Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-26 Thread Vlastimil Babka

On 08/26/2015 09:20 AM, Michal Hocko wrote:
> On Tue 25-08-15 15:03:00, Eric B Munson wrote:
> [...]
> > Would you drop your objections to the VMA flag if I drop the portions of
> > the patch that expose it to userspace?
> > 
> > The rework to not use the VMA flag is pretty sizeable and is much uglier
> > IMO.  I know that you are not wild about using bit 30 of 32 for
> > this, but perhaps we can settle on not exporting it to userspace so we
> > can reclaim it if we really need it in the future?
> 
> Yes, that would be definitely more acceptable for me. I do understand
> that you are not wild about changing mremap behavior.

+1


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-26 Thread Michal Hocko
On Tue 25-08-15 15:03:00, Eric B Munson wrote:
[...]
> Would you drop your objections to the VMA flag if I drop the portions of
> the patch that expose it to userspace?
> 
> The rework to not use the VMA flag is pretty sizeable and is much uglier
> IMO.  I know that you are not wild about using bit 30 of 32 for
> this, but perhaps we can settle on not exporting it to userspace so we
> can reclaim it if we really need it in the future?

Yes, that would be definitely more acceptable for me. I do understand
that you are not wild about changing mremap behavior.

Anyway, I would really prefer if the vma flag were used in only a
few places - when we are clearing it along with VM_LOCKED (which could
be hidden in VM_LOCKED_CLEAR_MASK or something like that) and when we
decide whether to populate or not (this should be __mm_populate). But
maybe I am missing some call paths where gup is called unconditionally;
I haven't checked that.
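
A minimal sketch of what that could look like, assuming the VM_LOCKONFAULT
vma flag from this patch (the define and the call sites below are only
illustrative, not the actual patch):

    /* clear lock-on-fault wherever VM_LOCKED is cleared today */
    #define VM_LOCKED_CLEAR_MASK	(~(VM_LOCKED | VM_LOCKONFAULT))

    /* munlock-style paths: */
    vma->vm_flags &= VM_LOCKED_CLEAR_MASK;

    /* __mm_populate()-style decision: skip ranges that are only
     * locked on fault */
    if (vma->vm_flags & VM_LOCKONFAULT)
            continue;
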
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-25 Thread Eric B Munson
On Tue, 25 Aug 2015, Michal Hocko wrote:

> On Tue 25-08-15 10:29:02, Eric B Munson wrote:
> > On Tue, 25 Aug 2015, Michal Hocko wrote:
> [...]
> > > Considering the current behavior I do not think it would be a terrible
> > > thing to do what Konstantin was suggesting and populate only the full
> > > ranges in a best effort mode (it is done so anyway) and document the
> > > behavior properly.
> > > "
> > >If the memory segment specified by old_address and old_size is
> > >locked (using mlock(2) or similar), then this lock is maintained
> > >when the segment is resized and/or relocated. As a consequence,
> > >the amount of memory locked by the process may change.
> > > 
> > >If the range is already fully populated and the range is
> > >enlarged the new range is attempted to be fully populated
> > >as well to preserve the full mlock semantic but there is no
> > >guarantee this will succeed. Partially populated (e.g. created by
> > >mlock(MLOCK_ONFAULT)) ranges do not have the full mlock semantic
> > >so they are not populated on resize.
> > > "
> > 
> > You are proposing that mremap would scan the PTEs as Vlastimil has
> > suggested?
> 
> As Vlastimil pointed out this would be unnecessarily costly. But I
> am wondering whether we should populate at all during mremap considering
> the full mlock semantic is not guaranteed anyway. The man page mentions only
> that the lock is maintained, which will be true without population as
> well.
> 
> If somebody really depends on the current (and broken) implementation we
> can offer MREMAP_POPULATE which would do a best effort population. This
> would be independent of the locked state and would be usable for other
> mappings as well (the use case would be to save page fault overhead by
> batching them).
> 
> If this is seen as an unacceptable user-visible change of behavior
> then we can go with the VMA flag, but I would still prefer not to export
> it to userspace so that we have a way to change this in the future.

Would you drop your objections to the VMA flag if I drop the portions of
the patch that expose it to userspace?

The rework to not use the VMA flag is pretty sizeable and is much uglier
IMO.  I know that you are not wild about using bit 30 of 32 for
this, but perhaps we can settle on not exporting it to userspace so we
can reclaim it if we really need it in the future?  I can teach the
folks here to check for size vs RSS of the locked mappings for stats on
lock on fault usage, so from my point of view, the proc changes are not
necessary.
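
A sketch of that check, for reference: walk /proc/<pid>/smaps and, for
every mapping whose VmFlags contain "lo" (VM_LOCKED), compare Size against
Rss; a locked mapping whose Rss is below its Size has not been fully
populated, which is what lock on fault usage looks like. This assumes the
usual smaps field order (Size before Rss before VmFlags) and a kernel new
enough to emit VmFlags:

    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
            char path[64], line[256];
            long size = 0, rss = 0;
            FILE *f;

            snprintf(path, sizeof(path), "/proc/%s/smaps",
                     argc > 1 ? argv[1] : "self");
            f = fopen(path, "r");
            if (!f)
                    return 1;
            while (fgets(line, sizeof(line), f)) {
                    if (sscanf(line, "Size: %ld kB", &size) == 1)
                            continue;
                    if (sscanf(line, "Rss: %ld kB", &rss) == 1)
                            continue;
                    /* VmFlags is the last line of a mapping's entry */
                    if (!strncmp(line, "VmFlags:", 8) &&
                        strstr(line, " lo") && rss < size)
                            printf("locked, not fully populated: "
                                   "Size=%ld kB Rss=%ld kB\n", size, rss);
            }
            fclose(f);
            return 0;
    }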

> -- 
> Michal Hocko
> SUSE Labs




Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-25 Thread Michal Hocko
On Tue 25-08-15 10:29:02, Eric B Munson wrote:
> On Tue, 25 Aug 2015, Michal Hocko wrote:
[...]
> > Considering the current behavior I do not think it would be a terrible
> > thing to do what Konstantin was suggesting and populate only the full
> > ranges in a best effort mode (it is done so anyway) and document the
> > behavior properly.
> > "
> >If the memory segment specified by old_address and old_size is
> >locked (using mlock(2) or similar), then this lock is maintained
> >when the segment is resized and/or relocated. As a consequence,
> >the amount of memory locked by the process may change.
> > 
> >If the range is already fully populated and the range is
> >enlarged the new range is attempted to be fully populated
> >as well to preserve the full mlock semantic but there is no
> >guarantee this will succeed. Partially populated (e.g. created by
> >mlock(MLOCK_ONFAULT)) ranges do not have the full mlock semantic
> >so they are not populated on resize.
> > "
> 
> You are proposing that mremap would scan the PTEs as Vlastimil has
> suggested?

As Vlastimil pointed out this would be unnecessarily costly. But I
am wondering whether we should populate at all during mremap considering
the full mlock semantic is not guaranteed anyway. The man page mentions only
that the lock is maintained, which will be true without population as
well.

If somebody really depends on the current (and broken) implementation we
can offer MREMAP_POPULATE which would do a best effort population. This
would be independent of the locked state and would be usable for other
mappings as well (the use case would be to save page fault overhead by
batching them).
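
To be clear, MREMAP_POPULATE does not exist; if it were added, a call
would presumably look something like the sketch below, saving one page
fault per page compared to touching the new tail by hand:

    /* hypothetical flag, for illustration only: grow the mapping and
     * ask the kernel to prefault the new part in one go */
    new = mremap(addr, len, 2 * len, MREMAP_MAYMOVE | MREMAP_POPULATE);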

If this is seen as an unacceptable user-visible change of behavior
then we can go with the VMA flag, but I would still prefer not to export
it to userspace so that we have a way to change this in the future.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-25 Thread Michal Hocko
On Tue 25-08-15 15:55:46, Vlastimil Babka wrote:
> On 08/25/2015 03:41 PM, Michal Hocko wrote:
[...]
> >So what we have as a result is that partially populated ranges are
> >preserved and fully populated ones work in the best effort mode the same
> >way as they are now.
> >
> >Does that sound at least remotely reasonable?
> 
> I'll basically repeat what I said earlier:
> 
> - mremap scanning existing PTEs to figure out the population would slow it
> down for no good reason

So do we really need to populate the enlarged range? All the man page is
saying is that the lock is maintained, which will still be the case. It
is true that the failure is unlikely (unless you are running in a
memcg), but you cannot rely on the full mlock semantic, so what would the
problem be?

> - it would be unreliable anyway:
>   - example: was the area completely populated because MLOCK_ONFAULT was not
> used, or because the process faulted it already?

OK, I see this as being a problem. Especially if the buffer is increased
to 2*original_len.

>   - example: was the area not completely populated because MLOCK_ONFAULT was
> used, or because mmap(MAP_LOCKED) failed to populate it fully?

What would be the difference? Both are ONFAULT now.

> I think the first point is a pointless regression for workloads that use
> just plain mlock() and don't want the onfault semantics. Unless there's some
> shortcut? Does vma have a counter of how much is populated? (I don't think
> so?)

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-25 Thread Eric B Munson
On Tue, 25 Aug 2015, Michal Hocko wrote:

> On Fri 21-08-15 14:31:32, Eric B Munson wrote:
> [...]
> > I am in the middle of implementing lock on fault this way, but I cannot
> > see how we will handle mremap of a lock on fault region.  Say we have
> > the following:
> > 
> > addr = mmap(len, MAP_ANONYMOUS, ...);
> > mlock(addr, len, MLOCK_ONFAULT);
> > ...
> > mremap(addr, len, 2 * len, ...)
> > 
> > There is no way for mremap to know that the area being remapped was lock
> > on fault so it will be locked and prefaulted by remap.  How can we avoid
> > this without tracking per vma if it was locked with lock or lock on
> > fault?
> 
> Yes mremap is a problem and it is very much similar to mmap(MAP_LOCKED).
> It doesn't guarantee the full mlock semantic because it leaves partially
> populated ranges behind without reporting any error.

This was not my concern.  Instead, I was wondering how to keep lock on
fault semantics with mremap if we do not have a VMA flag.  As a user, it
would surprise me if a region I mlocked with lock on fault and then
remapped to a larger size was fully populated and locked by the mremap
call.

> 
> Considering the current behavior I do not think it would be a terrible
> thing to do what Konstantin was suggesting and populate only the full
> ranges in a best effort mode (it is done so anyway) and document the
> behavior properly.
> "
>If the memory segment specified by old_address and old_size is
>locked (using mlock(2) or similar), then this lock is maintained
>when the segment is resized and/or relocated. As a consequence,
>the amount of memory locked by the process may change.
> 
>If the range is already fully populated and the range is
>enlarged the new range is attempted to be fully populated
>as well to preserve the full mlock semantic but there is no
>guarantee this will succeed. Partially populated (e.g. created by
>mlock(MLOCK_ONFAULT)) ranges do not have the full mlock semantic
>so they are not populated on resize.
> "

You are proposing that mremap would scan the PTEs as Vlastimil has
suggested?

> 
> So what we have as a result is that partially populated ranges are
> preserved and fully populated ones work in the best effort mode the same
> way as they are now.
> 
> Does that sound at least remotely reasonable?
> 
> 
> -- 
> Michal Hocko
> SUSE Labs




Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-25 Thread Konstantin Khlebnikov
On Tue, Aug 25, 2015 at 4:41 PM, Michal Hocko  wrote:
> On Fri 21-08-15 14:31:32, Eric B Munson wrote:
> [...]
>> I am in the middle of implementing lock on fault this way, but I cannot
>> see how we will handle mremap of a lock on fault region.  Say we have
>> the following:
>>
>> addr = mmap(len, MAP_ANONYMOUS, ...);
>> mlock(addr, len, MLOCK_ONFAULT);
>> ...
>> mremap(addr, len, 2 * len, ...)
>>
>> There is no way for mremap to know that the area being remapped was lock
>> on fault so it will be locked and prefaulted by remap.  How can we avoid
>> this without tracking per vma if it was locked with lock or lock on
>> fault?
>
> Yes mremap is a problem and it is very much similar to mmap(MAP_LOCKED).
> It doesn't guarantee the full mlock semantic because it leaves partially
> populated ranges behind without reporting any error.
>
> Considering the current behavior I do not think it would be a terrible
> thing to do what Konstantin was suggesting and populate only the full
> ranges in a best effort mode (it is done so anyway) and document the
> behavior properly.
> "
>If the memory segment specified by old_address and old_size is
>locked (using mlock(2) or similar), then this lock is maintained
>when the segment is resized and/or relocated. As a consequence,
>the amount of memory locked by the process may change.
>
>If the range is already fully populated and the range is
>enlarged the new range is attempted to be fully populated
>as well to preserve the full mlock semantic but there is no
>guarantee this will succeed. Partially populated (e.g. created by
>mlock(MLOCK_ONFAULT)) ranges do not have the full mlock semantic
>so they are not populated on resize.
> "
>
> So what we have as a result is that partially populated ranges are
> preserved and fully populated ones work in the best effort mode the same
> way as they are now.
>
> Does that sound at least remotely reasonable?

The problem is that mremap has to scan the PTEs to detect that, and the old
behaviour becomes very fragile: one failure and mremap will never populate
that vma again. For now I think a new flag "MREMAP_NOPOPULATE" is a better
option.

>
>
> --
> Michal Hocko
> SUSE Labs


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-25 Thread Vlastimil Babka

On 08/25/2015 03:41 PM, Michal Hocko wrote:
> On Fri 21-08-15 14:31:32, Eric B Munson wrote:
> [...]
> > I am in the middle of implementing lock on fault this way, but I cannot
> > see how we will handle mremap of a lock on fault region.  Say we have
> > the following:
> > 
> >  addr = mmap(len, MAP_ANONYMOUS, ...);
> >  mlock(addr, len, MLOCK_ONFAULT);
> >  ...
> >  mremap(addr, len, 2 * len, ...)
> > 
> > There is no way for mremap to know that the area being remapped was lock
> > on fault so it will be locked and prefaulted by remap.  How can we avoid
> > this without tracking per vma if it was locked with lock or lock on
> > fault?
> 
> Yes mremap is a problem and it is very much similar to mmap(MAP_LOCKED).
> It doesn't guarantee the full mlock semantic because it leaves partially
> populated ranges behind without reporting any error.

Hm, that's right.

> Considering the current behavior I do not think it would be a terrible
> thing to do what Konstantin was suggesting and populate only the full
> ranges in a best effort mode (it is done so anyway) and document the
> behavior properly.
> "
> If the memory segment specified by old_address and old_size is
> locked (using mlock(2) or similar), then this lock is maintained
> when the segment is resized and/or relocated. As a consequence,
> the amount of memory locked by the process may change.
> 
> If the range is already fully populated and the range is
> enlarged the new range is attempted to be fully populated
> as well to preserve the full mlock semantic but there is no
> guarantee this will succeed. Partially populated (e.g. created by
> mlock(MLOCK_ONFAULT)) ranges do not have the full mlock semantic
> so they are not populated on resize.
> "
> 
> So what we have as a result is that partially populated ranges are
> preserved and fully populated ones work in the best effort mode the same
> way as they are now.
> 
> Does that sound at least remotely reasonable?

I'll basically repeat what I said earlier:

- mremap scanning existing PTEs to figure out the population would slow
it down for no good reason

- it would be unreliable anyway:
  - example: was the area completely populated because MLOCK_ONFAULT
was not used, or because the process faulted it already?
  - example: was the area not completely populated because
MLOCK_ONFAULT was used, or because mmap(MAP_LOCKED) failed to populate
it fully?

I think the first point is a pointless regression for workloads that use
just plain mlock() and don't want the onfault semantics. Unless there's
some shortcut? Does the vma have a counter of how much is populated? (I
don't think so?)



Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-25 Thread Michal Hocko
On Fri 21-08-15 14:31:32, Eric B Munson wrote:
[...]
> I am in the middle of implementing lock on fault this way, but I cannot
> see how we will handle mremap of a lock on fault region.  Say we have
> the following:
> 
> addr = mmap(len, MAP_ANONYMOUS, ...);
> mlock(addr, len, MLOCK_ONFAULT);
> ...
> mremap(addr, len, 2 * len, ...)
> 
> There is no way for mremap to know that the area being remapped was lock
> on fault so it will be locked and prefaulted by remap.  How can we avoid
> this without tracking per vma if it was locked with lock or lock on
> fault?

Yes mremap is a problem and it is very much similar to mmap(MAP_LOCKED).
It doesn't guarantee the full mlock semantic because it leaves partially
populated ranges behind without reporting any error.

Considering the current behavior I do not think it would be a terrible
thing to do what Konstantin was suggesting and populate only the full
ranges in a best effort mode (it is done so anyway) and document the
behavior properly.
"
   If the memory segment specified by old_address and old_size is
   locked (using mlock(2) or similar), then this lock is maintained
   when the segment is resized and/or relocated. As a consequence,
   the amount of memory locked by the process may change.

   If the range is already fully populated and the range is
   enlarged the new range is attempted to be fully populated
   as well to preserve the full mlock semantic but there is no
   guarantee this will succeed. Partially populated (e.g. created by
   mlock(MLOCK_ONFAULT)) ranges do not have the full mlock semantic
   so they are not populated on resize.
"

So what we have as a result is that partially populated ranges are
preserved and fully populated ones work in the best effort mode the same
way as they are now.

Does that sound at least remotely reasonable?
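
To make the sequence under discussion concrete, here is a self-contained
sketch of the scenario. It assumes the mlock2() syscall and MLOCK_ONFAULT
flag proposed by this series (no glibc wrapper, hence the raw syscall; the
syscall number and flag value below are placeholders to adjust for the
actual series and architecture):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #ifndef __NR_mlock2
    #define __NR_mlock2	325	/* x86_64; adjust as needed */
    #endif
    #ifndef MLOCK_ONFAULT
    #define MLOCK_ONFAULT	0x01	/* proposed flag value */
    #endif

    int main(void)
    {
            size_t len = 4UL << 20;	/* 4MB */
            char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (addr == MAP_FAILED)
                    return 1;
            /* lock the range, but fault pages in only as touched */
            if (syscall(__NR_mlock2, addr, len, MLOCK_ONFAULT))
                    perror("mlock2");
            addr[0] = 1;	/* populate a single page */

            /* without VM_LOCKONFAULT the vma only carries VM_LOCKED,
             * so mremap() prefaults the whole enlarged range here */
            addr = mremap(addr, len, 2 * len, MREMAP_MAYMOVE);
            if (addr == MAP_FAILED)
                    return 1;
            printf("compare Size vs Rss for %p in /proc/self/smaps\n",
                   (void *)addr);
            pause();	/* keep the mapping alive for inspection */
            return 0;
    }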


-- 
Michal Hocko
SUSE Labs


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Eric B Munson
On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:

> On Mon, Aug 24, 2015 at 8:00 PM, Eric B Munson  wrote:
> > On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:
> >
> >> On Mon, Aug 24, 2015 at 6:55 PM, Eric B Munson  wrote:
> >> > On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:
> >> >
> >> >> On Mon, Aug 24, 2015 at 6:09 PM, Eric B Munson  
> >> >> wrote:
> >> >> > On Mon, 24 Aug 2015, Vlastimil Babka wrote:
> >> >> >
> >> >> >> On 08/24/2015 03:50 PM, Konstantin Khlebnikov wrote:
> >> >> >> >On Mon, Aug 24, 2015 at 4:30 PM, Vlastimil Babka  
> >> >> >> >wrote:
> >> >> >> >>On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
> >> >> >> 
> >> >> >> 
> >> >> >> I am in the middle of implementing lock on fault this way, but I 
> >> >> >> cannot
> >> >> >> see how we will handle mremap of a lock on fault region.  Say we 
> >> >> >> have
> >> >> >> the following:
> >> >> >> 
> >> >> >>   addr = mmap(len, MAP_ANONYMOUS, ...);
> >> >> >>   mlock(addr, len, MLOCK_ONFAULT);
> >> >> >>   ...
> >> >> >>   mremap(addr, len, 2 * len, ...)
> >> >> >> 
> >> >> >> There is no way for mremap to know that the area being remapped 
> >> >> >> was lock
> >> >> >> on fault so it will be locked and prefaulted by remap.  How can 
> >> >> >> we avoid
> >> >> >> this without tracking per vma if it was locked with lock or lock 
> >> >> >> on
> >> >> >> fault?
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>>remap can count filled ptes and prefault only completely 
> >> >> >> >>>populated areas.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>Does (and should) mremap really prefault non-present pages? 
> >> >> >> >>Shouldn't it
> >> >> >> >>just prepare the page tables and that's it?
> >> >> >> >
> >> >> >> >As I see mremap prefaults pages when it extends mlocked area.
> >> >> >> >
> >> >> >> >Also quote from manpage
> >> >> >> >: If  the memory segment specified by old_address and old_size is 
> >> >> >> >locked
> >> >> >> >: (using mlock(2) or similar), then this lock is maintained when 
> >> >> >> >the segment is
> >> >> >> >: resized and/or relocated.  As a  consequence, the amount of 
> >> >> >> >memory locked
> >> >> >> >: by the process may change.
> >> >> >>
> >> >> >> Oh, right... Well that looks like a convincing argument for having a
> >> >> >> sticky VM_LOCKONFAULT after all. Having mremap guess by scanning
> >> >> >> existing pte's would slow it down, and be unreliable (was the area
> >> >> >> completely populated because MLOCK_ONFAULT was not used or because
> >> >> >> the process faulted it already? Was it not populated because
> >> >> >> MLOCK_ONFAULT was used, or because mmap(MAP_LOCKED) failed to
> >> >> >> populate it all?).
> >> >> >
> >> >> > Given this, I am going to stop working on v8 and leave the vma flag in
> >> >> > place.
> >> >> >
> >> >> >>
> >> >> >> The only sane alternative is to populate always for mremap() of
> >> >> >> VM_LOCKED areas, and document this loss of MLOCK_ONFAULT information
> >> >> >> as a limitation of mlock2(MLOCK_ONFAULT). Which might or might not
> >> >> >> be enough for Eric's usecase, but it's somewhat ugly.
> >> >> >>
> >> >> >
> >> >> > I don't think that this is the right solution, I would be really
> >> >> > surprised as a user if an area I locked with MLOCK_ONFAULT was then
> >> >> > fully locked and prepopulated after mremap().
> >> >>
> >> >> If mremap is the only problem then we can add opposite flag for it:
> >> >>
> >> >> "MREMAP_NOPOPULATE"
> >> >> - do not populate new segment of locked areas
> >> >> - do not copy normal areas if possible (anonymous/special must be 
> >> >> copied)
> >> >>
> >> >> addr = mmap(len, MAP_ANONYMOUS, ...);
> >> >> mlock(addr, len, MLOCK_ONFAULT);
> >> >> ...
> >> >> addr2 = mremap(addr, len, 2 * len, MREMAP_NOPOPULATE);
> >> >> ...
> >> >>
> >> >
> >> > But with this, the user must remember what areas are locked with
> >> > MLOCK_ONFAULT and which are locked with prepopulation so the
> >> > correct mremap flags can be used.
> >> >
> >>
> >> Yep. Shouldn't be hard. You anyway have to do some changes in user-space.
> >>
> >
> > Sorry if I wasn't clear enough in my last reply, I think forcing
> > userspace to track this is the wrong choice.  The VM system is
> > responsible for tracking these attributes and should continue to be.
> 
> Userspace tracks addresses and sizes of these areas. Plus mremap obviously
> works only with page granularity, so the memory allocator in userspace has to know
> a lot about these structures. So keeping one more bit isn't rocket science.
> 

Fair enough, however, my current implementation does not require that
userspace keep track of any extra information.  With the VM_LOCKONFAULT
flag mremap() keeps the properties that were set with mlock() or
equivalent across remaps.

> >
> >>
> >> Much simpler for user-space is a mm-wide flag which turns all further
> >> mlocks and MAP_LOCKED into lock-on-fault. Something like
> >> mlockall(MCL_NOPOPULATE_LOCKED).

Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Konstantin Khlebnikov
On Mon, Aug 24, 2015 at 8:00 PM, Eric B Munson  wrote:
> On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:
>
>> On Mon, Aug 24, 2015 at 6:55 PM, Eric B Munson  wrote:
>> > On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:
>> >
>> >> On Mon, Aug 24, 2015 at 6:09 PM, Eric B Munson  wrote:
>> >> > On Mon, 24 Aug 2015, Vlastimil Babka wrote:
>> >> >
>> >> >> On 08/24/2015 03:50 PM, Konstantin Khlebnikov wrote:
>> >> >> >On Mon, Aug 24, 2015 at 4:30 PM, Vlastimil Babka  
>> >> >> >wrote:
>> >> >> >>On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
>> >> >> 
>> >> >> 
>> >> >> I am in the middle of implementing lock on fault this way, but I 
>> >> >> cannot
>> >> see how we will handle mremap of a lock on fault region.  Say we 
>> >> >> have
>> >> >> the following:
>> >> >> 
>> >> >>   addr = mmap(len, MAP_ANONYMOUS, ...);
>> >> >>   mlock(addr, len, MLOCK_ONFAULT);
>> >> >>   ...
>> >> >>   mremap(addr, len, 2 * len, ...)
>> >> >> 
>> >> >> There is no way for mremap to know that the area being remapped 
>> >> >> was lock
>> >> >> on fault so it will be locked and prefaulted by remap.  How can we 
>> >> >> avoid
>> >> >> this without tracking per vma if it was locked with lock or lock on
>> >> >> fault?
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>remap can count filled ptes and prefault only completely populated 
>> >> >> >>>areas.
>> >> >> >>
>> >> >> >>
>> >> >> >>Does (and should) mremap really prefault non-present pages? 
>> >> >> >>Shouldn't it
>> >> >> >>just prepare the page tables and that's it?
>> >> >> >
>> >> >> >As I see mremap prefaults pages when it extends mlocked area.
>> >> >> >
>> >> >> >Also quote from manpage
>> >> >> >: If  the memory segment specified by old_address and old_size is 
>> >> >> >locked
>> >> >> >: (using mlock(2) or similar), then this lock is maintained when the 
>> >> >> >segment is
>> >> >> >: resized and/or relocated.  As a  consequence, the amount of memory 
>> >> >> >locked
>> >> >> >: by the process may change.
>> >> >>
>> >> >> Oh, right... Well that looks like a convincing argument for having a
>> >> >> sticky VM_LOCKONFAULT after all. Having mremap guess by scanning
>> >> >> existing pte's would slow it down, and be unreliable (was the area
>> >> >> completely populated because MLOCK_ONFAULT was not used or because
>> >> the process faulted it already? Was it not populated because
>> >> >> MLOCK_ONFAULT was used, or because mmap(MAP_LOCKED) failed to
>> >> >> populate it all?).
>> >> >
>> > Given this, I am going to stop working on v8 and leave the vma flag in
>> >> > place.
>> >> >
>> >> >>
>> >> >> The only sane alternative is to populate always for mremap() of
>> >> >> VM_LOCKED areas, and document this loss of MLOCK_ONFAULT information
>> >> >> as a limitation of mlock2(MLOCK_ONFAULT). Which might or might not
>> >> >> be enough for Eric's usecase, but it's somewhat ugly.
>> >> >>
>> >> >
>> >> > I don't think that this is the right solution, I would be really
>> >> > surprised as a user if an area I locked with MLOCK_ONFAULT was then
>> >> > fully locked and prepopulated after mremap().
>> >>
>> >> If mremap is the only problem then we can add opposite flag for it:
>> >>
>> >> "MREMAP_NOPOPULATE"
>> >> - do not populate new segment of locked areas
>> >> - do not copy normal areas if possible (anonymous/special must be copied)
>> >>
>> >> addr = mmap(len, MAP_ANONYMOUS, ...);
>> >> mlock(addr, len, MLOCK_ONFAULT);
>> >> ...
>> >> addr2 = mremap(addr, len, 2 * len, MREMAP_NOPOPULATE);
>> >> ...
>> >>
>> >
>> > But with this, the user must remember what areas are locked with
>> > MLOCK_ONFAULT and which are locked with prepopulation so the
>> > correct mremap flags can be used.
>> >
>>
>> Yep. Shouldn't be hard. You anyway have to do some changes in user-space.
>>
>
> Sorry if I wasn't clear enough in my last reply, I think forcing
> userspace to track this is the wrong choice.  The VM system is
> responsible for tracking these attributes and should continue to be.

Userspace tracks addresses and sizes of these areas. Plus mremap obviously
works only with page granularity, so the memory allocator in userspace has to know
a lot about these structures. So keeping one more bit isn't rocket science.

>
>>
>> Much simpler for user-space is a mm-wide flag which turns all
>> further
>> mlocks and MAP_LOCKED into lock-on-fault. Something like
>> mlockall(MCL_NOPOPULATE_LOCKED).
>
> This set certainly adds the foundation for such a change if you think it
> would be useful.  That particular behavior was not part of my initial use
> case though.
>

This looks like a much easier solution: you don't need a new syscall, and
after enabling that lock-on-fault mode userspace can still get the old
behaviour simply by touching the newly locked area.
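
A sketch of the suggested mode; MCL_NOPOPULATE_LOCKED is hypothetical and
exists only to illustrate the proposal:

    /* hypothetical: make every later mlock()/MAP_LOCKED in this
     * process behave as lock-on-fault instead of lock-and-populate */
    mlockall(MCL_FUTURE | MCL_NOPOPULATE_LOCKED);

    mlock(addr, len);	/* locks, but would no longer prefault */

    /* old behaviour on demand: touch the pages yourself */
    for (off = 0; off < len; off += page_size)
            ((volatile char *)addr)[off];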

Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Eric B Munson
On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:

> On Mon, Aug 24, 2015 at 6:55 PM, Eric B Munson  wrote:
> > On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:
> >
> >> On Mon, Aug 24, 2015 at 6:09 PM, Eric B Munson  wrote:
> >> > On Mon, 24 Aug 2015, Vlastimil Babka wrote:
> >> >
> >> >> On 08/24/2015 03:50 PM, Konstantin Khlebnikov wrote:
> >> >> >On Mon, Aug 24, 2015 at 4:30 PM, Vlastimil Babka  
> >> >> >wrote:
> >> >> >>On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
> >> >> 
> >> >> 
> >> >> I am in the middle of implementing lock on fault this way, but I 
> >> >> cannot
> >> >> see how we will handle mremap of a lock on fault region.  Say we 
> >> >> have
> >> >> the following:
> >> >> 
> >> >>   addr = mmap(len, MAP_ANONYMOUS, ...);
> >> >>   mlock(addr, len, MLOCK_ONFAULT);
> >> >>   ...
> >> >>   mremap(addr, len, 2 * len, ...)
> >> >> 
> >> >> There is no way for mremap to know that the area being remapped was 
> >> >> lock
> >> >> on fault so it will be locked and prefaulted by remap.  How can we 
> >> >> avoid
> >> >> this without tracking per vma if it was locked with lock or lock on
> >> >> fault?
> >> >> >>>
> >> >> >>>
> >> >> >>>remap can count filled ptes and prefault only completely populated 
> >> >> >>>areas.
> >> >> >>
> >> >> >>
> >> >> >>Does (and should) mremap really prefault non-present pages? Shouldn't 
> >> >> >>it
> >> >> >>just prepare the page tables and that's it?
> >> >> >
> >> >> >As I see mremap prefaults pages when it extends mlocked area.
> >> >> >
> >> >> >Also quote from manpage
> >> >> >: If  the memory segment specified by old_address and old_size is 
> >> >> >locked
> >> >> >: (using mlock(2) or similar), then this lock is maintained when the 
> >> >> >segment is
> >> >> >: resized and/or relocated.  As a  consequence, the amount of memory 
> >> >> >locked
> >> >> >: by the process may change.
> >> >>
> >> >> Oh, right... Well that looks like a convincing argument for having a
> >> >> sticky VM_LOCKONFAULT after all. Having mremap guess by scanning
> >> >> existing pte's would slow it down, and be unreliable (was the area
> >> >> completely populated because MLOCK_ONFAULT was not used or because
> >> >> the process faulted it already? Was it not populated because
> >> >> MLOCK_ONFAULT was used, or because mmap(MAP_LOCKED) failed to
> >> >> populate it all?).
> >> >
> >> > Given this, I am going to stop working on v8 and leave the vma flag in
> >> > place.
> >> >
> >> >>
> >> >> The only sane alternative is to populate always for mremap() of
> >> >> VM_LOCKED areas, and document this loss of MLOCK_ONFAULT information
> >> >> as a limitation of mlock2(MLOCK_ONFAULT). Which might or might not
> >> >> be enough for Eric's usecase, but it's somewhat ugly.
> >> >>
> >> >
> >> > I don't think that this is the right solution, I would be really
> >> > surprised as a user if an area I locked with MLOCK_ONFAULT was then
> >> > fully locked and prepopulated after mremap().
> >>
> >> If mremap is the only problem then we can add opposite flag for it:
> >>
> >> "MREMAP_NOPOPULATE"
> >> - do not populate new segment of locked areas
> >> - do not copy normal areas if possible (anonymous/special must be copied)
> >>
> >> addr = mmap(len, MAP_ANONYMOUS, ...);
> >> mlock(addr, len, MLOCK_ONFAULT);
> >> ...
> >> addr2 = mremap(addr, len, 2 * len, MREMAP_NOPOPULATE);
> >> ...
> >>
> >
> > But with this, the user must remember what areas are locked with
> > MLOCK_ONFAULT and which are locked with prepopulation so the
> > correct mremap flags can be used.
> >
> 
> Yep. Shouldn't be hard. You anyway have to do some changes in user-space.
> 

Sorry if I wasn't clear enough in my last reply, I think forcing
userspace to track this is the wrong choice.  The VM system is
responsible for tracking these attributes and should continue to be.

> 
> Much simpler for user-space is a mm-wide flag which turns all further
> mlocks and MAP_LOCKED into lock-on-fault. Something like
> mlockall(MCL_NOPOPULATE_LOCKED).

This set certainly adds the foundation for such a change if you think it
would be useful.  That particular behavior was not part of my initial use
case though.





Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Konstantin Khlebnikov
On Mon, Aug 24, 2015 at 6:55 PM, Eric B Munson  wrote:
> On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:
>
>> On Mon, Aug 24, 2015 at 6:09 PM, Eric B Munson  wrote:
>> > On Mon, 24 Aug 2015, Vlastimil Babka wrote:
>> >
>> >> On 08/24/2015 03:50 PM, Konstantin Khlebnikov wrote:
>> >> >On Mon, Aug 24, 2015 at 4:30 PM, Vlastimil Babka  wrote:
>> >> >>On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
>> >> 
>> >> 
>> >> I am in the middle of implementing lock on fault this way, but I 
>> >> cannot
>> >> see how we will handle mremap of a lock on fault region.  Say we have
>> >> the following:
>> >> 
>> >>   addr = mmap(len, MAP_ANONYMOUS, ...);
>> >>   mlock(addr, len, MLOCK_ONFAULT);
>> >>   ...
>> >>   mremap(addr, len, 2 * len, ...)
>> >> 
>> >> There is no way for mremap to know that the area being remapped was 
>> >> lock
>> >> on fault so it will be locked and prefaulted by remap.  How can we 
>> >> avoid
>> >> this without tracking per vma if it was locked with lock or lock on
>> >> fault?
>> >> >>>
>> >> >>>
>> >> >>>remap can count filled ptes and prefault only completely populated 
>> >> >>>areas.
>> >> >>
>> >> >>
>> >> >>Does (and should) mremap really prefault non-present pages? Shouldn't it
>> >> >>just prepare the page tables and that's it?
>> >> >
>> >> >As I see mremap prefaults pages when it extends mlocked area.
>> >> >
>> >> >Also quote from manpage
>> >> >: If  the memory segment specified by old_address and old_size is locked
>> >> >: (using mlock(2) or similar), then this lock is maintained when the 
>> >> >segment is
>> >> >: resized and/or relocated.  As a  consequence, the amount of memory 
>> >> >locked
>> >> >: by the process may change.
>> >>
>> >> Oh, right... Well that looks like a convincing argument for having a
>> >> sticky VM_LOCKONFAULT after all. Having mremap guess by scanning
>> >> existing pte's would slow it down, and be unreliable (was the area
>> >> completely populated because MLOCK_ONFAULT was not used or because
>> >> the process faulted it already? Was it not populated because
>> >> MLOCK_ONFAULT was used, or because mmap(MAP_LOCKED) failed to
>> >> populate it all?).
>> >
>> > Given this, I am going to stop working on v8 and leave the vma flag in
>> > place.
>> >
>> >>
>> >> The only sane alternative is to populate always for mremap() of
>> >> VM_LOCKED areas, and document this loss of MLOCK_ONFAULT information
>> >> as a limitation of mlock2(MLOCK_ONFAULT). Which might or might not
>> >> be enough for Eric's usecase, but it's somewhat ugly.
>> >>
>> >
>> > I don't think that this is the right solution, I would be really
>> > surprised as a user if an area I locked with MLOCK_ONFAULT was then
>> > fully locked and prepopulated after mremap().
>>
>> If mremap is the only problem then we can add opposite flag for it:
>>
>> "MREMAP_NOPOPULATE"
>> - do not populate new segment of locked areas
>> - do not copy normal areas if possible (anonymous/special must be copied)
>>
>> addr = mmap(len, MAP_ANONYMOUS, ...);
>> mlock(addr, len, MLOCK_ONFAULT);
>> ...
>> addr2 = mremap(addr, len, 2 * len, MREMAP_NOPOPULATE);
>> ...
>>
>
> But with this, the user must remember what areas are locked with
>> > MLOCK_ONFAULT and which are locked with prepopulation so the
> correct mremap flags can be used.
>

Yep. Shouldn't be hard. You anyway have to do some changes in user-space.


Much simpler for user-space is a mm-wide flag which turns all further
mlocks and MAP_LOCKED into lock-on-fault. Something like
mlockall(MCL_NOPOPULATE_LOCKED).


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Eric B Munson
On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:

> On Mon, Aug 24, 2015 at 6:09 PM, Eric B Munson  wrote:
> > On Mon, 24 Aug 2015, Vlastimil Babka wrote:
> >
> >> On 08/24/2015 03:50 PM, Konstantin Khlebnikov wrote:
> >> >On Mon, Aug 24, 2015 at 4:30 PM, Vlastimil Babka  wrote:
> >> >>On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
> >> 
> >> 
> >> I am in the middle of implementing lock on fault this way, but I cannot
> >> see how we will handle mremap of a lock on fault region.  Say we have
> >> the following:
> >> 
> >>   addr = mmap(len, MAP_ANONYMOUS, ...);
> >>   mlock(addr, len, MLOCK_ONFAULT);
> >>   ...
> >>   mremap(addr, len, 2 * len, ...)
> >> 
> >> There is no way for mremap to know that the area being remapped was 
> >> lock
> >> on fault so it will be locked and prefaulted by remap.  How can we 
> >> avoid
> >> this without tracking per vma if it was locked with lock or lock on
> >> fault?
> >> >>>
> >> >>>
> >> >>>remap can count filled ptes and prefault only completely populated 
> >> >>>areas.
> >> >>
> >> >>
> >> >>Does (and should) mremap really prefault non-present pages? Shouldn't it
> >> >>just prepare the page tables and that's it?
> >> >
> >> >As I see mremap prefaults pages when it extends mlocked area.
> >> >
> >> >Also quote from manpage
> >> >: If  the memory segment specified by old_address and old_size is locked
> >> >: (using mlock(2) or similar), then this lock is maintained when the 
> >> >segment is
> >> >: resized and/or relocated.  As a  consequence, the amount of memory 
> >> >locked
> >> >: by the process may change.
> >>
> >> Oh, right... Well that looks like a convincing argument for having a
> >> sticky VM_LOCKONFAULT after all. Having mremap guess by scanning
> >> existing pte's would slow it down, and be unreliable (was the area
> >> completely populated because MLOCK_ONFAULT was not used or because
> >> the process faulted it already? Was it not populated because
> >> MLOCK_ONFAULT was used, or because mmap(MAP_LOCKED) failed to
> >> populate it all?).
> >
> > Given this, I am going to stop working on v8 and leave the vma flag in
> > place.
> >
> >>
> >> The only sane alternative is to populate always for mremap() of
> >> VM_LOCKED areas, and document this loss of MLOCK_ONFAULT information
> >> as a limitation of mlock2(MLOCK_ONFAULT). Which might or might not
> >> be enough for Eric's usecase, but it's somewhat ugly.
> >>
> >
> > I don't think that this is the right solution, I would be really
> > surprised as a user if an area I locked with MLOCK_ONFAULT was then
> > fully locked and prepopulated after mremap().
> 
> If mremap is the only problem then we can add an opposite flag for it:
> 
> "MREMAP_NOPOPULATE"
> - do not populate new segment of locked areas
> - do not copy normal areas if possible (anonymous/special must be copied)
> 
> addr = mmap(len, MAP_ANONYMOUS, ...);
> mlock(addr, len, MLOCK_ONFAULT);
> ...
> addr2 = mremap(addr, len, 2 * len, MREMAP_NOPOPULATE);
> ...
> 

But with this, the user must remember which areas are locked with
MLOCK_ONFAULT and which are locked with prepopulation so the
correct mremap flags can be used.
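
For illustration, the user-side bookkeeping being objected to might look like
this.  struct region_desc is invented for this sketch, and MREMAP_NOPOPULATE
is only the flag proposed above, with a made-up value; neither exists in any
kernel.

/* Hypothetical user-side bookkeeping for the scheme discussed above. */
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#define MREMAP_NOPOPULATE 0x8     /* hypothetical, proposed in this thread */

struct region_desc {
    void  *addr;
    size_t len;
    bool   lock_on_fault;         /* recorded when the area was mlock2()ed */
};

/* Grow a tracked region, choosing mremap flags from our own record so a
 * lock-on-fault area is not prefaulted behind our back. */
static void *grow_region(struct region_desc *r, size_t new_len)
{
    int flags = MREMAP_MAYMOVE | (r->lock_on_fault ? MREMAP_NOPOPULATE : 0);
    void *p = mremap(r->addr, r->len, new_len, flags);

    if (p != MAP_FAILED) {
        r->addr = p;
        r->len  = new_len;
    }
    return p;
}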



signature.asc
Description: Digital signature


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Konstantin Khlebnikov
On Mon, Aug 24, 2015 at 6:09 PM, Eric B Munson  wrote:
> On Mon, 24 Aug 2015, Vlastimil Babka wrote:
>
>> On 08/24/2015 03:50 PM, Konstantin Khlebnikov wrote:
>> >On Mon, Aug 24, 2015 at 4:30 PM, Vlastimil Babka  wrote:
>> >>On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
>> >>>>
>> >>>>
>> >>>>I am in the middle of implementing lock on fault this way, but I cannot
>> >>>>see how we will handle mremap of a lock on fault region.  Say we have
>> >>>>the following:
>> >>>>
>> >>>>  addr = mmap(len, MAP_ANONYMOUS, ...);
>> >>>>  mlock(addr, len, MLOCK_ONFAULT);
>> >>>>  ...
>> >>>>  mremap(addr, len, 2 * len, ...)
>> >>>>
>> >>>>There is no way for mremap to know that the area being remapped was lock
>> >>>>on fault so it will be locked and prefaulted by remap.  How can we avoid
>> >>>>this without tracking per vma if it was locked with lock or lock on
>> >>>>fault?
>> >>>
>> >>>
>> >>>remap can count filled ptes and prefault only completely populated areas.
>> >>
>> >>
>> >>Does (and should) mremap really prefault non-present pages? Shouldn't it
>> >>just prepare the page tables and that's it?
>> >
>> >As I see mremap prefaults pages when it extends mlocked area.
>> >
>> >Also quote from manpage
>> >: If  the memory segment specified by old_address and old_size is locked
>> >: (using mlock(2) or similar), then this lock is maintained when the 
>> >segment is
>> >: resized and/or relocated.  As a  consequence, the amount of memory locked
>> >: by the process may change.
>>
>> Oh, right... Well that looks like a convincing argument for having a
>> sticky VM_LOCKONFAULT after all. Having mremap guess by scanning
>> existing pte's would slow it down, and be unreliable (was the area
>> completely populated because MLOCK_ONFAULT was not used or because
>> >the process faulted it already? Was it not populated because
>> MLOCK_ONFAULT was used, or because mmap(MAP_LOCKED) failed to
>> populate it all?).
>
> Given this, I am going to stop working on v8 and leave the vma flag in
> place.
>
>>
>> The only sane alternative is to populate always for mremap() of
>> VM_LOCKED areas, and document this loss of MLOCK_ONFAULT information
>> as a limitation of mlock2(MLOCK_ONFAULT). Which might or might not
>> be enough for Eric's usecase, but it's somewhat ugly.
>>
>
> I don't think that this is the right solution, I would be really
> surprised as a user if an area I locked with MLOCK_ONFAULT was then
> fully locked and prepopulated after mremap().

If mremap is the only problem then we can add an opposite flag for it:

"MREMAP_NOPOPULATE"
- do not populate new segment of locked areas
- do not copy normal areas if possible (anonymous/special must be copied)

addr = mmap(len, MAP_ANONYMOUS, ...);
mlock(addr, len, MLOCK_ONFAULT);
...
addr2 = mremap(addr, len, 2 * len, MREMAP_NOPOPULATE);
...

>
>> >>
>> >>>There might be a problem after failed populate: remap will handle them
>> >>>as lock on fault. In this case we can fill ptes with swap-like non-present
>> >>>entries to remember that fact and count them as should-be-locked pages.
>> >>
>> >>
>> >>I don't think we should strive to have mremap try to fix the inherent
>> >>unreliability of mmap (MAP_POPULATE)?
>> >
>> >I don't think so. MAP_POPULATE works only when mmap happens.
>> >Flag MREMAP_POPULATE might be a good idea. Just for symmetry.
>>
>> Maybe, but please do it as a separate series.
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majord...@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: em...@kvack.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Eric B Munson
On Mon, 24 Aug 2015, Vlastimil Babka wrote:

> On 08/24/2015 03:50 PM, Konstantin Khlebnikov wrote:
> >On Mon, Aug 24, 2015 at 4:30 PM, Vlastimil Babka  wrote:
> >>On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
> >>>>
> >>>>
> >>>>I am in the middle of implementing lock on fault this way, but I cannot
> >>>>see how we will handle mremap of a lock on fault region.  Say we have
> >>>>the following:
> >>>>
> >>>>  addr = mmap(len, MAP_ANONYMOUS, ...);
> >>>>  mlock(addr, len, MLOCK_ONFAULT);
> >>>>  ...
> >>>>  mremap(addr, len, 2 * len, ...)
> >>>>
> >>>>There is no way for mremap to know that the area being remapped was lock
> >>>>on fault so it will be locked and prefaulted by remap.  How can we avoid
> >>>>this without tracking per vma if it was locked with lock or lock on
> >>>>fault?
> >>>
> >>>
> >>>remap can count filled ptes and prefault only completely populated areas.
> >>
> >>
> >>Does (and should) mremap really prefault non-present pages? Shouldn't it
> >>just prepare the page tables and that's it?
> >
> >As I see mremap prefaults pages when it extends mlocked area.
> >
> >Also quote from manpage
> >: If  the memory segment specified by old_address and old_size is locked
> >: (using mlock(2) or similar), then this lock is maintained when the segment 
> >is
> >: resized and/or relocated.  As a  consequence, the amount of memory locked
> >: by the process may change.
> 
> Oh, right... Well that looks like a convincing argument for having a
> sticky VM_LOCKONFAULT after all. Having mremap guess by scanning
> existing pte's would slow it down, and be unreliable (was the area
> completely populated because MLOCK_ONFAULT was not used or because
> the process faulted it already? Was it not populated because
> MLOCK_ONFAULT was used, or because mmap(MAP_LOCKED) failed to
> populate it all?).

Given this, I am going to stop working on v8 and leave the vma flag in
place.

> 
> The only sane alternative is to populate always for mremap() of
> VM_LOCKED areas, and document this loss of MLOCK_ONFAULT information
> as a limitation of mlock2(MLOCK_ONFAULT). Which might or might not
> be enough for Eric's usecase, but it's somewhat ugly.
> 

I don't think that this is the right solution, I would be really
surprised as a user if an area I locked with MLOCK_ONFAULT was then
fully locked and prepopulated after mremap().

> >>
> >>>There might be a problem after failed populate: remap will handle them
> >>>as lock on fault. In this case we can fill ptes with swap-like non-present
> >>>entries to remember that fact and count them as should-be-locked pages.
> >>
> >>
> >>I don't think we should strive to have mremap try to fix the inherent
> >>unreliability of mmap (MAP_POPULATE)?
> >
> >I don't think so. MAP_POPULATE works only when mmap happens.
> >Flag MREMAP_POPULATE might be a good idea. Just for symmetry.
> 
> Maybe, but please do it as a separate series.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org


signature.asc
Description: Digital signature


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Vlastimil Babka

On 08/24/2015 03:50 PM, Konstantin Khlebnikov wrote:
> On Mon, Aug 24, 2015 at 4:30 PM, Vlastimil Babka wrote:
>> On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
>>>>
>>>> I am in the middle of implementing lock on fault this way, but I cannot
>>>> see how we will handle mremap of a lock on fault region.  Say we have
>>>> the following:
>>>>
>>>>   addr = mmap(len, MAP_ANONYMOUS, ...);
>>>>   mlock(addr, len, MLOCK_ONFAULT);
>>>>   ...
>>>>   mremap(addr, len, 2 * len, ...)
>>>>
>>>> There is no way for mremap to know that the area being remapped was lock
>>>> on fault so it will be locked and prefaulted by remap.  How can we avoid
>>>> this without tracking per vma if it was locked with lock or lock on
>>>> fault?
>>>
>>> remap can count filled ptes and prefault only completely populated areas.
>>
>> Does (and should) mremap really prefault non-present pages? Shouldn't it
>> just prepare the page tables and that's it?
>
> As I see mremap prefaults pages when it extends an mlocked area.
>
> Also quote from manpage
> : If the memory segment specified by old_address and old_size is locked
> : (using mlock(2) or similar), then this lock is maintained when the
> : segment is resized and/or relocated.  As a consequence, the amount of
> : memory locked by the process may change.

Oh, right... Well that looks like a convincing argument for having a
sticky VM_LOCKONFAULT after all. Having mremap guess by scanning
existing pte's would slow it down, and be unreliable (was the area
completely populated because MLOCK_ONFAULT was not used or because the
process faulted it already? Was it not populated because MLOCK_ONFAULT
was used, or because mmap(MAP_LOCKED) failed to populate it all?).

The only sane alternative is to populate always for mremap() of
VM_LOCKED areas, and document this loss of MLOCK_ONFAULT information as
a limitation of mlock2(MLOCK_ONFAULT). Which might or might not be
enough for Eric's usecase, but it's somewhat ugly.

>>> There might be a problem after failed populate: remap will handle them
>>> as lock on fault. In this case we can fill ptes with swap-like non-present
>>> entries to remember that fact and count them as should-be-locked pages.
>>
>> I don't think we should strive to have mremap try to fix the inherent
>> unreliability of mmap (MAP_POPULATE)?
>
> I don't think so. MAP_POPULATE works only when mmap happens.
> Flag MREMAP_POPULATE might be a good idea. Just for symmetry.

Maybe, but please do it as a separate series.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Konstantin Khlebnikov
On Mon, Aug 24, 2015 at 4:30 PM, Vlastimil Babka  wrote:
> On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
>>>
>>>
>>> I am in the middle of implementing lock on fault this way, but I cannot
>>> see how we will handle mremap of a lock on fault region.  Say we have
>>> the following:
>>>
>>>  addr = mmap(len, MAP_ANONYMOUS, ...);
>>>  mlock(addr, len, MLOCK_ONFAULT);
>>>  ...
>>>  mremap(addr, len, 2 * len, ...)
>>>
>>> There is no way for mremap to know that the area being remapped was lock
>>> on fault so it will be locked and prefaulted by remap.  How can we avoid
>>> this without tracking per vma if it was locked with lock or lock on
>>> fault?
>>
>>
>> remap can count filled ptes and prefault only completely populated areas.
>
>
> Does (and should) mremap really prefault non-present pages? Shouldn't it
> just prepare the page tables and that's it?

As I see mremap prefaults pages when it extends mlocked area.

Also quote from manpage
: If  the memory segment specified by old_address and old_size is locked
: (using mlock(2) or similar), then this lock is maintained when the segment is
: resized and/or relocated.  As a  consequence, the amount of memory locked
: by the process may change.

>
>> There might be a problem after failed populate: remap will handle them
>> as lock on fault. In this case we can fill ptes with swap-like non-present
>> entries to remember that fact and count them as should-be-locked pages.
>
>
> I don't think we should strive to have mremap try to fix the inherent
> unreliability of mmap (MAP_POPULATE)?

I don't think so. MAP_POPULATE works only when mmap happens.
Flag MREMAP_POPULATE might be a good idea. Just for symmetry.
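
The mremap(2) behaviour quoted above is easy to observe from user space.  A
minimal sketch, not from the thread, assuming Linux, _GNU_SOURCE for mremap(),
and an RLIMIT_MEMLOCK of at least 8 MiB; error handling is trimmed:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void print_vmlck(const char *tag)
{
    char line[128];
    FILE *f = fopen("/proc/self/status", "r");

    while (fgets(line, sizeof(line), f))
        if (!strncmp(line, "VmLck:", 6))
            printf("%s: %s", tag, line);
    fclose(f);
}

int main(void)
{
    size_t len = 4 << 20;   /* 4 MiB */
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    mlock(addr, len);
    print_vmlck("after mlock ");    /* VmLck: ~4096 kB */

    /* The lock is maintained and the new tail is populated by the kernel */
    addr = mremap(addr, len, 2 * len, MREMAP_MAYMOVE);
    print_vmlck("after mremap");    /* VmLck: ~8192 kB */
    return 0;
}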
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Vlastimil Babka

On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
>>
>> I am in the middle of implementing lock on fault this way, but I cannot
>> see how we will handle mremap of a lock on fault region.  Say we have
>> the following:
>>
>>  addr = mmap(len, MAP_ANONYMOUS, ...);
>>  mlock(addr, len, MLOCK_ONFAULT);
>>  ...
>>  mremap(addr, len, 2 * len, ...)
>>
>> There is no way for mremap to know that the area being remapped was lock
>> on fault so it will be locked and prefaulted by remap.  How can we avoid
>> this without tracking per vma if it was locked with lock or lock on
>> fault?
>
> remap can count filled ptes and prefault only completely populated areas.

Does (and should) mremap really prefault non-present pages? Shouldn't it
just prepare the page tables and that's it?

> There might be a problem after failed populate: remap will handle them
> as lock on fault. In this case we can fill ptes with swap-like non-present
> entries to remember that fact and count them as should-be-locked pages.

I don't think we should strive to have mremap try to fix the inherent
unreliability of mmap (MAP_POPULATE)?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Konstantin Khlebnikov
On Fri, Aug 21, 2015 at 9:31 PM, Eric B Munson  wrote:
> On Fri, 21 Aug 2015, Michal Hocko wrote:
>
>> On Thu 20-08-15 13:03:09, Eric B Munson wrote:
>> > On Thu, 20 Aug 2015, Michal Hocko wrote:
>> >
>> > > On Wed 19-08-15 17:33:45, Eric B Munson wrote:
>> > > [...]
>> > > > The group which asked for this feature here
>> > > > wants the ability to distinguish between LOCKED and LOCKONFAULT regions
>> > > > and without the VMA flag there isn't a way to do that.
>> > >
>> > > Could you be more specific on why this is needed?
>> >
>> > They want to keep metrics on the amount of memory used in a LOCKONFAULT
>> > region versus the address space of the region.
>>
>> /proc//smaps already exports that information AFAICS. It exports
>> VMA flags including VM_LOCKED and if rss < size then this is clearly
>> LOCKONFAULT because the standard mlock semantic is to populate. Would
>> that be sufficient?
>>
>> Now, it is true that LOCKONFAULT wouldn't be distinguishable from
>> MAP_LOCKED which failed to populate but does that really matter? It is
>> LOCKONFAULT in a way as well.
>
> Does that matter to my users?  No, they do not use MAP_LOCKED at all so
> any VMA with VM_LOCKED set and rss < size is lock on fault.  Will it
> matter to others?  I suspect so, but these are likely to be the same
> group of users which will be surprised to learn that MAP_LOCKED does not
> guarantee that the entire range is faulted in on return from mmap.
>
>>
>> > > > Do we know that these last two open flags are needed right now or is
>> > > > this speculation that they will be and that none of the other VMA flags
>> > > > can be reclaimed?
>> > >
>> > > I do not think they are needed by anybody right now but that is not a
>> > > reason why it should be used without a really strong justification.
>> > > If the discoverability is really needed then fair enough but I haven't
>> > > seen any justification for that yet.
>> >
>> > To be completely clear you believe that if the metrics collection is
>> > not a strong enough justification, it is better to expand the mm_struct
>> > by another unsigned long than to use one of these bits right?
>>
>> A simple bool is sufficient for that. And yes I think we should go with
>> per mm_struct flag rather than the additional vma flag if it has only
>> the global (whole address space) scope - which would be the case if the
>> LOCKONFAULT is always an mlock modifier and the persistence is needed
>> only for MCL_FUTURE. Which is imho a sane semantic.
>
> I am in the middle of implementing lock on fault this way, but I cannot
> see how we will handle mremap of a lock on fault region.  Say we have
> the following:
>
> addr = mmap(len, MAP_ANONYMOUS, ...);
> mlock(addr, len, MLOCK_ONFAULT);
> ...
> mremap(addr, len, 2 * len, ...)
>
> There is no way for mremap to know that the area being remapped was lock
> on fault so it will be locked and prefaulted by remap.  How can we avoid
> this without tracking per vma if it was locked with lock or lock on
> fault?

remap can count filled ptes and prefault only completely populated areas.

There might be a problem after failed populate: remap will handle them
as lock on fault. In this case we can fill ptes with swap-like non-present
entries to remember that fact and count them as should-be-locked pages.
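
In rough pseudocode, the decision rule sketched here would be the following.
count_present_ptes() and populate_range() are hypothetical stand-ins for mm/
internals, not existing kernel functions; only the logic is the point:

/*
 * Prefault the grown area iff the old range was completely populated,
 * i.e. it behaved like a normally mlock()ed area.  Hypothetical helpers.
 */
static void mremap_maybe_prefault(struct vm_area_struct *vma,
                                  unsigned long old_addr, unsigned long old_len,
                                  unsigned long new_addr, unsigned long new_len)
{
    unsigned long nr_present = count_present_ptes(vma, old_addr, old_len);

    if (nr_present == old_len >> PAGE_SHIFT)
        populate_range(vma, new_addr, new_len);  /* was fully populated */
    /* otherwise treat it as lock-on-fault and let page faults fill it */
}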
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Eric B Munson
On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:

> On Mon, Aug 24, 2015 at 6:55 PM, Eric B Munson emun...@akamai.com wrote:
> > On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:
> > [...]
> > > If mremap is the only problem then we can add an opposite flag for it:
> > >
> > > "MREMAP_NOPOPULATE"
> > > - do not populate the new segment of locked areas
> > > - do not copy normal areas if possible (anonymous/special must be copied)
> > >
> > > addr = mmap(len, MAP_ANONYMOUS, ...);
> > > mlock(addr, len, MLOCK_ONFAULT);
> > > ...
> > > addr2 = mremap(addr, len, 2 * len, MREMAP_NOPOPULATE);
> > > ...
> >
> > But with this, the user must remember which areas are locked with
> > MLOCK_ONFAULT and which are locked with prepopulation so the correct
> > mremap flags can be used.
>
> Yep. Shouldn't be hard. You anyway have to do some changes in user-space.

Sorry if I wasn't clear enough in my last reply, I think forcing
userspace to track this is the wrong choice.  The VM system is
responsible for tracking these attributes and should continue to be.

> Much simpler for user-space is an mm-wide flag which turns all further
> mlocks and MAP_LOCKED into lock-on-fault. Something like
> mlockall(MCL_NOPOPULATE_LOCKED).

This set certainly adds the foundation for such a change if you think it
would be useful.  That particular behavior was not part of my initial use
case though.



signature.asc
Description: Digital signature


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Konstantin Khlebnikov
On Mon, Aug 24, 2015 at 8:00 PM, Eric B Munson emun...@akamai.com wrote:
> On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:
> [...]
>
> > Yep. Shouldn't be hard. You anyway have to do some changes in user-space.
>
> Sorry if I wasn't clear enough in my last reply, I think forcing
> userspace to track this is the wrong choice.  The VM system is
> responsible for tracking these attributes and should continue to be.

Userspace tracks addresses and sizes of these areas. Plus mremap
obviously works only with page granularity, so a memory allocator in
userspace has to know a lot about these structures. So keeping one more
bit isn't rocket science.

> > Much simpler for user-space is an mm-wide flag which turns all further
> > mlocks and MAP_LOCKED into lock-on-fault. Something like
> > mlockall(MCL_NOPOPULATE_LOCKED).
>
> This set certainly adds the foundation for such a change if you think it
> would be useful.  That particular behavior was not part of my initial use
> case though.

This looks like a much easier solution: you don't need a new syscall, and
after enabling that lock-on-fault mode userspace can still get the old
behaviour simply by touching the newly locked area.
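
The "touch it yourself" fallback described here is a few lines of ordinary
user-space code.  A sketch, not from the patch set:

#include <stddef.h>
#include <unistd.h>

/* Read-fault every page of a lock-on-fault region to recover the old
 * populate semantics; once faulted, a locked page stays resident. */
static void prefault(void *addr, size_t len)
{
    volatile const char *p = addr;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    for (size_t off = 0; off < len; off += page)
        (void)p[off];   /* fault the page in */
}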
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-24 Thread Eric B Munson
On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:

> On Mon, Aug 24, 2015 at 8:00 PM, Eric B Munson emun...@akamai.com wrote:
> > On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:
> > [...]
> >
> > Sorry if I wasn't clear enough in my last reply, I think forcing
> > userspace to track this is the wrong choice.  The VM system is
> > responsible for tracking these attributes and should continue to be.
>
> Userspace tracks addresses and sizes of these areas. Plus mremap
> obviously works only with page granularity, so a memory allocator in
> userspace has to know a lot about these structures. So keeping one more
> bit isn't rocket science.

Fair enough, however, my current implementation does not require that
userspace keep track of any extra information.  With the VM_LOCKONFAULT
flag mremap() keeps the properties that were set with mlock() or
equivalent across remaps.

> > > Much simpler for user-space is an mm-wide flag which turns all further
> > > mlocks and MAP_LOCKED into lock-on-fault. Something like
> > > mlockall(MCL_NOPOPULATE_LOCKED).
> >
> > This set certainly adds the foundation for such a change if you think it
> > would be useful.  That particular behavior was not part of my initial use
> > case though.
>
> This looks like a much easier solution: you don't need a new syscall, and
> after enabling that lock-on-fault mode userspace can still get the old
> behaviour simply by touching the newly locked area.

Again, this suggestion requires that userspace know more about VM than
with my implementation and will require it to walk an entire mapping
before use to fault it in if required.  With the current implementation,
mlock continues to function as it has, with the 

Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-21 Thread Eric B Munson
On Fri, 21 Aug 2015, Michal Hocko wrote:

> On Thu 20-08-15 13:03:09, Eric B Munson wrote:
> > On Thu, 20 Aug 2015, Michal Hocko wrote:
> > 
> > > On Wed 19-08-15 17:33:45, Eric B Munson wrote:
> > > [...]
> > > > The group which asked for this feature here
> > > > wants the ability to distinguish between LOCKED and LOCKONFAULT regions
> > > > and without the VMA flag there isn't a way to do that.
> > > 
> > > Could you be more specific on why this is needed?
> > 
> > They want to keep metrics on the amount of memory used in a LOCKONFAULT
> > region versus the address space of the region.
> 
> /proc//smaps already exports that information AFAICS. It exports
> VMA flags including VM_LOCKED and if rss < size then this is clearly
> LOCKONFAULT because the standard mlock semantic is to populate. Would
> that be sufficient?
> 
> Now, it is true that LOCKONFAULT wouldn't be distinguishable from
> MAP_LOCKED which failed to populate but does that really matter? It is
> LOCKONFAULT in a way as well.

Does that matter to my users?  No, they do not use MAP_LOCKED at all so
any VMA with VM_LOCKED set and rss < size is lock on fault.  Will it
matter to others?  I suspect so, but these are likely to be the same
group of users which will be surprised to learn that MAP_LOCKED does not
guarantee that the entire range is faulted in on return from mmap.
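
For illustration, a rough sketch of the smaps-based accounting described
above; it assumes the usual /proc/<pid>/smaps layout, where the VmFlags
line includes "lo" for VM_LOCKED:

#include <stdio.h>
#include <string.h>

/*
 * Scan /proc/<pid>/smaps and report VM_LOCKED mappings whose Rss is
 * below their Size; per the heuristic above these are lock-on-fault
 * areas (or MAP_LOCKED areas that failed to populate).
 */
int main(int argc, char **argv)
{
	char path[64], line[256], first[128], range[128] = "";
	long size = 0, rss = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%s/smaps",
		 argc > 1 ? argv[1] : "self");
	f = fopen(path, "r");
	if (!f)
		return 1;

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%127s", first) != 1)
			continue;
		if (strchr(first, '-') && !strchr(first, ':')) {
			strcpy(range, first);	/* vma header: "start-end ..." */
		} else if (!strcmp(first, "Size:")) {
			sscanf(line + 5, "%ld", &size);
		} else if (!strcmp(first, "Rss:")) {
			sscanf(line + 4, "%ld", &rss);
		} else if (!strcmp(first, "VmFlags:")) {
			/* "lo" in VmFlags means VM_LOCKED */
			if (strstr(line, " lo") && rss < size)
				printf("%s: %ld of %ld kB resident\n",
				       range, rss, size);
		}
	}
	fclose(f);
	return 0;
}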

> 
> > > > Do we know that these last two open flags are needed right now or is
> > > > this speculation that they will be and that none of the other VMA flags
> > > > can be reclaimed?
> > > 
> > > I do not think they are needed by anybody right now but that is not a
> > > reason why it should be used without a really strong justification.
> > > If the discoverability is really needed then fair enough but I haven't
> > > seen any justification for that yet.
> > 
> > To be completely clear you believe that if the metrics collection is
> > not a strong enough justification, it is better to expand the mm_struct
> > by another unsigned long than to use one of these bits right?
> 
> A simple bool is sufficient for that. And yes I think we should go with
> per mm_struct flag rather than the additional vma flag if it has only
> the global (whole address space) scope - which would be the case if the
> LOCKONFAULT is always an mlock modifier and the persistence is needed
> only for MCL_FUTURE. Which is imho a sane semantic.
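
For concreteness, a sketch of how that per-mm approach might look; the
names below (lock_on_fault and the helpers) are illustrative
assumptions, not code from this series:

#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's __mm_populate(): prefault a range. */
static long populate_range(unsigned long start, size_t len)
{
	(void)start;
	(void)len;
	return 0;
}

struct mm_sketch {
	bool lock_on_fault;	/* set by an MCL_FUTURE-style mlockall mode */
};

/*
 * mlock path under the per-mm scheme: vmas get VM_LOCKED either way;
 * the only decision the flag drives is whether to populate now or let
 * the pages be locked as they are faulted in.
 */
static long mlock_sketch(struct mm_sketch *mm, unsigned long start,
			 size_t len, bool onfault)
{
	if (onfault || mm->lock_on_fault)
		return 0;			/* lock on fault: no prefault */
	return populate_range(start, len);	/* classic mlock: prefault */
}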

I am in the middle of implementing lock on fault this way, but I cannot
see how we will handle mremap of a lock on fault region.  Say we have
the following:

addr = mmap(len, MAP_ANONYMOUS, ...);
mlock(addr, len, MLOCK_ONFAULT);
...
mremap(addr, len, 2 * len, ...)

There is no way for mremap to know that the area being remapped was lock
on fault so it will be locked and prefaulted by remap.  How can we avoid
this without tracking per vma if it was locked with lock or lock on
fault?


signature.asc
Description: Digital signature


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-21 Thread Michal Hocko
On Thu 20-08-15 13:03:09, Eric B Munson wrote:
> On Thu, 20 Aug 2015, Michal Hocko wrote:
> 
> > On Wed 19-08-15 17:33:45, Eric B Munson wrote:
> > [...]
> > > The group which asked for this feature here
> > > wants the ability to distinguish between LOCKED and LOCKONFAULT regions
> > > and without the VMA flag there isn't a way to do that.
> > 
> > Could you be more specific on why this is needed?
> 
> They want to keep metrics on the amount of memory used in a LOCKONFAULT
> region versus the address space of the region.

/proc/<pid>/smaps already exports that information AFAICS. It exports
VMA flags including VM_LOCKED and if rss < size then this is clearly
LOCKONFAULT because the standard mlock semantic is to populate. Would
that be sufficient?

Now, it is true that LOCKONFAULT wouldn't be distinguishable from
MAP_LOCKED which failed to populate but does that really matter? It is
LOCKONFAULT in a way as well.

> > > Do we know that these last two open flags are needed right now or is
> > > this speculation that they will be and that none of the other VMA flags
> > > can be reclaimed?
> > 
> > I do not think they are needed by anybody right now but that is not a
> > reason why it should be used without a really strong justification.
> > If the discoverability is really needed then fair enough but I haven't
> > seen any justification for that yet.
> 
> To be completely clear you believe that if the metrics collection is
> not a strong enough justification, it is better to expand the mm_struct
> by another unsigned long than to use one of these bits right?

A simple bool is sufficient for that. And yes I think we should go with
per mm_struct flag rather than the additional vma flag if it has only
the global (whole address space) scope - which would be the case if the
LOCKONFAULT is always an mlock modifier and the persistence is needed
only for MCL_FUTURE. Which is imho a sane semantic.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-20 Thread Eric B Munson
On Thu, 20 Aug 2015, Michal Hocko wrote:

> On Wed 19-08-15 17:33:45, Eric B Munson wrote:
> [...]
> > The group which asked for this feature here
> > wants the ability to distinguish between LOCKED and LOCKONFAULT regions
> > and without the VMA flag there isn't a way to do that.
> 
> Could you be more specific on why this is needed?

They want to keep metrics on the amount of memory used in a LOCKONFAULT
region versus the address space of the region.
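
Such a metric is also obtainable from userspace today; as a sketch, a
caller that knows it locked a region with lock-on-fault can use
mincore() to measure how much of it is actually resident (addr and len
are assumed inputs):

#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Return how many bytes of [addr, addr + len) are resident, or -1. */
static long resident_bytes(void *addr, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t pages = (len + page - 1) / page;
	unsigned char vec[pages];
	long n = 0;

	if (mincore(addr, len, vec))
		return -1;
	for (size_t i = 0; i < pages; i++)
		n += vec[i] & 1;	/* low bit: page is resident */
	return n * page;
}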

> 
> > Do we know that these last two open flags are needed right now or is
> > this speculation that they will be and that none of the other VMA flags
> > can be reclaimed?
> 
> I do not think they are needed by anybody right now but that is not a
> reason why it should be used without a really strong justification.
> If the discoverability is really needed then fair enough but I haven't
> seen any justification for that yet.

To be completely clear you believe that if the metrics collection is
not a strong enough justification, it is better to expand the mm_struct
by another unsigned long than to use one of these bits right?



signature.asc
Description: Digital signature


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-20 Thread Michal Hocko
On Wed 19-08-15 17:33:45, Eric B Munson wrote:
[...]
> The group which asked for this feature here
> wants the ability to distinguish between LOCKED and LOCKONFAULT regions
> and without the VMA flag there isn't a way to do that.

Could you be more specific on why this is needed?

> Do we know that these last two open flags are needed right now or is
> this speculation that they will be and that none of the other VMA flags
> can be reclaimed?

I do not think they are needed by anybody right now but that is not a
reason why it should be used without a really strong justification.
If the discoverability is really needed then fair enough but I haven't
seen any justification for that yet.

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-20 Thread Vlastimil Babka

On 08/19/2015 11:33 PM, Eric B Munson wrote:
> On Wed, 12 Aug 2015, Michal Hocko wrote:
> > On Sun 09-08-15 01:22:53, Eric B Munson wrote:
> > 
> > I do not like this very much to be honest. We have only a few bits
> > left there and it seems this is not really necessary. I thought that
> > LOCKONFAULT acts as a modifier to the mlock call to tell whether to
> > populate or not. The only place we have to persist it is
> > mlockall(MCL_FUTURE) AFAICS. And this can be handled by an additional
> > field in the mm_struct. This could be handled at the __mm_populate
> > level. So unless I am missing something this would be much easier, and
> > in the end no new bit in the VM flags would be necessary.
> > 
> > This would obviously mean that the LOCKONFAULT couldn't be exported to
> > the userspace, but is this really necessary?
> 
> Sorry for the latency here, I was on vacation and am now at plumbers.
> 
> I am not sure that growing the mm_struct by another flags field instead
> of using available bits in the vm_flags is the right choice.

I was making the same objection on one of the earlier versions, and since
you stuck with a new vm flag, I thought it didn't matter, as we could
change it later if we run out of bits. But now I realize that since you
export this difference to userspace (and below you say that it's by
request), we won't be able to change it later. So it's a more difficult
choice.

> After this
> patch, we still have 3 free bits on 32 bit architectures (2 after the
> userfaultfd set IIRC).  The group which asked for this feature here
> wants the ability to distinguish between LOCKED and LOCKONFAULT regions
> and without the VMA flag there isn't a way to do that.
> 
> Do we know that these last two open flags are needed right now or is
> this speculation that they will be and that none of the other VMA flags
> can be reclaimed?

I think it's the latter; we can expect that flags will be added rather
than removed, as removal is hard or impossible.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-19 Thread Eric B Munson
On Wed, 12 Aug 2015, Michal Hocko wrote:

> On Sun 09-08-15 01:22:53, Eric B Munson wrote:
> > The cost of faulting in all memory to be locked can be very high when
> > working with large mappings.  If only portions of the mapping will be
> > used this can incur a high penalty for locking.
> > 
> > For the example of a large file, this is the usage pattern for a large
> > statistical language model (probably applies to other statistical or graphical
> > models as well).  For the security example, any application transacting
> > in data that cannot be swapped out (credit card data, medical records,
> > etc).
> > 
> > This patch introduces the ability to request that pages are not
> > pre-faulted, but are placed on the unevictable LRU when they are finally
> > faulted in.  The VM_LOCKONFAULT flag will be used together with
> > VM_LOCKED and has no effect when set without VM_LOCKED.
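
As an aside, a sketch of the large-file pattern quoted above, using the
interface this series proposes; the MLOCK_ONFAULT value and SYS_mlock2
number are assumptions, since the series is not merged and no libc
wrapper exists:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MLOCK_ONFAULT 0x01	/* value assumed from this series */

/* Map a large model file; lock pages only as they are faulted in. */
static void *map_model_lockonfault(const char *path, size_t *lenp)
{
	struct stat st;
	void *m;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return NULL;
	}
	m = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);		/* the mapping stays valid after close */
	if (m == MAP_FAILED)
		return NULL;
	if (syscall(SYS_mlock2, m, st.st_size, MLOCK_ONFAULT)) {
		munmap(m, st.st_size);
		return NULL;
	}
	*lenp = st.st_size;
	return m;
}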
> 
> I do not like this very much to be honest. We have only a few bits
> left there and it seems this is not really necessary. I thought that
> LOCKONFAULT acts as a modifier to the mlock call to tell whether to
> populate or not. The only place we have to persist it is
> mlockall(MCL_FUTURE) AFAICS. And this can be handled by an additional
> field in the mm_struct. This could be handled at the __mm_populate
> level. So unless I am missing something this would be much easier, and
> in the end no new bit in the VM flags would be necessary.
> 
> This would obviously mean that the LOCKONFAULT couldn't be exported to
> the userspace but is this really necessary?

Sorry for the latency here, I was on vacation and am now at plumbers.

I am not sure that growing the mm_struct by another flags field instead
of using available bits in the vm_flags is the right choice.  After this
patch, we still have 3 free bits on 32 bit architectures (2 after the
userfaultfd set IIRC).  The group which asked for this feature here
wants the ability to distinguish between LOCKED and LOCKONFAULT regions
and without the VMA flag there isn't a way to do that.

Do we know that these last two open flags are needed right now or is
this speculation that they will be and that none of the other VMA flags
can be reclaimed?



signature.asc
Description: Digital signature


Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

2015-08-12 Thread Michal Hocko
On Sun 09-08-15 01:22:53, Eric B Munson wrote:
> The cost of faulting in all memory to be locked can be very high when
> working with large mappings.  If only portions of the mapping will be
> used this can incur a high penalty for locking.
> 
> For the example of a large file, this is the usage pattern for a large
> statistical language model (probably applies to other statistical or graphical
> models as well).  For the security example, any application transacting
> in data that cannot be swapped out (credit card data, medical records,
> etc).
> 
> This patch introduces the ability to request that pages are not
> pre-faulted, but are placed on the unevictable LRU when they are finally
> faulted in.  The VM_LOCKONFAULT flag will be used together with
> VM_LOCKED and has no effect when set without VM_LOCKED.

I do not like this very much to be honest. We have only a few bits
left there and it seems this is not really necessary. I thought that
LOCKONFAULT acts as a modifier to the mlock call to tell whether to
populate or not. The only place we have to persist it is
mlockall(MCL_FUTURE) AFAICS. And this can be handled by an additional
field in the mm_struct. This could be handled at the __mm_populate
level. So unless I am missing something this would be much easier, and
in the end no new bit in the VM flags would be necessary.

This would obviously mean that the LOCKONFAULT couldn't be exported to
the userspace but is this really necessary?
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

