Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-25 Thread Eric B Munson
On Tue, 23 Jun 2015, Vlastimil Babka wrote:

 On 06/15/2015 04:43 PM, Eric B Munson wrote:
 Note that the semantic of MAP_LOCKED can be subtly surprising:
 
 mlock(2) fails if the memory range cannot get populated to guarantee
 that no future major faults will happen on the range.
 mmap(MAP_LOCKED) on the other hand silently succeeds even if the
 range was populated only
 partially.
 
 ( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )
 
 So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While
 MAP_LOCKONFAULT doesn't suffer from such problem, I wonder if that's
 sufficient reason not to extend mmap by new mlock() flags that can
 be instead applied to the VMA after mmapping, using the proposed
 mlock2() with flags. So I think instead we could deprecate
 MAP_LOCKED more prominently. I doubt the overhead of calling the
 extra syscall matters here?
 
 We could talk about retiring the MAP_LOCKED flag but I suspect that
 would get significantly more pushback than adding a new mmap flag.
 
 Oh no we can't retire as in remove the flag, ever. Just not
 continue the way of mmap() flags related to mlock().
 
 Likely that the overhead does not matter in most cases, but presumably
 there are cases where it does (as we have a MAP_LOCKED flag today).
 Even with the proposed new system calls I think we should have the
 MAP_LOCKONFAULT for parity with MAP_LOCKED.
 
 I'm not convinced, but it's not a major issue.
 
 
 - mlock() takes a `flags' argument.  Presently that's
MLOCK_LOCKED|MLOCK_LOCKONFAULT.
 
 - munlock() takes a `flags' argument.  MLOCK_LOCKED|MLOCK_LOCKONFAULT
to specify which flags are being cleared.
 
 - mlockall() and munlockall() ditto.
 
 
 IOW, LOCKED and LOCKONFAULT are treated identically and independently.
 
 Now, that's how we would have designed all this on day one.  And I
 think we can do this now, by adding new mlock2() and munlock2()
 syscalls.  And we may as well deprecate the old mlock() and munlock(),
 not that this matters much.
 
 *should* we do this?  I'm thinking yes - it's all pretty simple
 boilerplate and wrappers and such, and it gets the interface correct,
 and extensible.
 
 If the new LOCKONFAULT functionality is indeed desired (I haven't
 still decided myself) then I agree that would be the cleanest way.
 
 Do you disagree with the use cases I have listed or do you think there
 is a better way of addressing those cases?
 
 I'm somewhat sceptical about the security one. Are security
 sensitive buffers that large to matter? The performance one is more
 convincing and I don't see a better way, so OK.

They can be, the two that come to mind are medical images and high
resolution sensor data.

 
 
 
 What do others think?
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-25 Thread Andy Lutomirski
On Thu, Jun 25, 2015 at 7:16 AM, Eric B Munson emun...@akamai.com wrote:
 On Tue, 23 Jun 2015, Vlastimil Babka wrote:

 On 06/15/2015 04:43 PM, Eric B Munson wrote:
 
 If the new LOCKONFAULT functionality is indeed desired (I haven't
 still decided myself) then I agree that would be the cleanest way.
 
 Do you disagree with the use cases I have listed or do you think there
 is a better way of addressing those cases?

 I'm somewhat sceptical about the security one. Are security
 sensitive buffers that large to matter? The performance one is more
 convincing and I don't see a better way, so OK.

 They can be, the two that come to mind are medical images and high
 resolution sensor data.

I think we've been handling sensitive memory pages wrong forever.  We
shouldn't lock them into memory; we should flag them as sensitive and
encrypt them if they're ever written out to disk.

--Andy

Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-23 Thread Vlastimil Babka

On 06/15/2015 04:43 PM, Eric B Munson wrote:

Note that the semantic of MAP_LOCKED can be subtly surprising:

mlock(2) fails if the memory range cannot get populated to guarantee
that no future major faults will happen on the range.
mmap(MAP_LOCKED) on the other hand silently succeeds even if the
range was populated only
partially.

( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )

So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While
MAP_LOCKONFAULT doesn't suffer from such problem, I wonder if that's
sufficient reason not to extend mmap by new mlock() flags that can
be instead applied to the VMA after mmapping, using the proposed
mlock2() with flags. So I think instead we could deprecate
MAP_LOCKED more prominently. I doubt the overhead of calling the
extra syscall matters here?


We could talk about retiring the MAP_LOCKED flag but I suspect that
would get significantly more pushback than adding a new mmap flag.


Oh no we can't retire as in remove the flag, ever. Just not continue 
the way of mmap() flags related to mlock().



Likely that the overhead does not matter in most cases, but presumably
there are cases where it does (as we have a MAP_LOCKED flag today).
Even with the proposed new system calls I think we should have the
MAP_LOCKONFAULT for parity with MAP_LOCKED.


I'm not convinced, but it's not a major issue.




- mlock() takes a `flags' argument.  Presently that's
   MLOCK_LOCKED|MLOCK_LOCKONFAULT.

- munlock() takes a `flags' argument.  MLOCK_LOCKED|MLOCK_LOCKONFAULT
   to specify which flags are being cleared.

- mlockall() and munlockall() ditto.


IOW, LOCKED and LOCKONFAULT are treated identically and independently.

Now, that's how we would have designed all this on day one.  And I
think we can do this now, by adding new mlock2() and munlock2()
syscalls.  And we may as well deprecate the old mlock() and munlock(),
not that this matters much.

*should* we do this?  I'm thinking yes - it's all pretty simple
boilerplate and wrappers and such, and it gets the interface correct,
and extensible.


If the new LOCKONFAULT functionality is indeed desired (I haven't
still decided myself) then I agree that would be the cleanest way.


Do you disagree with the use cases I have listed or do you think there
is a better way of addressing those cases?


I'm somewhat sceptical about the security one. Are security sensitive 
buffers that large to matter? The performance one is more convincing and 
I don't see a better way, so OK.







What do others think?



Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-15 Thread Eric B Munson
On Thu, 11 Jun 2015, Andrew Morton wrote:

 On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson emun...@akamai.com wrote:
 
   Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not sure
   that even makes sense but the behaviour should be understood and
   tested.
 
  I have extended the kselftest for lock-on-fault to try both of these
  scenarios and they work as expected.  The VMA is split and the VM
  flags are set appropriately for the resulting VMAs.
 
 munlock() should do vma merging as well.  I *think* we implemented
 that.  More tests for you to add ;)
 
 How are you testing the vma merging and splitting, btw?  Parsing
 the procfs files?

The lock-on-fault test now covers VMA splitting and merging by parsing
/proc/self/maps.  VMA splitting and merging works as it should with both
MAP_LOCKONFAULT and MCL_ONFAULT.
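
As an illustration of that kind of check (a minimal sketch, not the
kselftest code itself), the lines of /proc/self/maps can be parsed into
address ranges and the VMAs overlapping a region counted before and
after munlock().  The helper names and the sample text below are made
up for the example.

```python
def parse_maps(text):
    """Return a list of (start, end, perms) tuples, one per VMA line."""
    vmas = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        vmas.append((start, end, fields[1]))
    return vmas


def vmas_covering(vmas, start, end):
    """Return the VMAs that overlap the half-open range [start, end)."""
    return [v for v in vmas if v[0] < end and v[1] > start]


# Illustrative sample only: one region that shows up as two adjacent
# VMAs, as it would after being split.
SAMPLE = """\
7f0000000000-7f0000002000 rw-p 00000000 00:00 0
7f0000002000-7f0000004000 rw-p 00000000 00:00 0
"""
```

On Linux, parse_maps(open("/proc/self/maps").read()) gives the live
layout; a region that has been split shows up as more than one
overlapping entry, and a merged one as a single entry again.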

 
   What's missing here is a syscall to set VM_LOCKONFAULT on an
   arbitrary range of memory - mlock() for lock-on-fault.  It's a
   shame that mlock() didn't take a `mode' argument.  Perhaps we
   should add such a syscall - that would make the mmap flag unneeded
   but I suppose it should be kept for symmetry.
  
  Do you want such a system call as part of this set?  I would need some
  time to make sure I had thought through all the possible corners one
  could get into with such a call, so it would delay a V3 quite a bit.
  Otherwise I can send a V3 out immediately.
 
 I think the way to look at this is to pretend that mm/mlock.c doesn't
 exist and ask how should we design these features.
 
 And that would be:
 
 - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.
 
 - mlock() takes a `flags' argument.  Presently that's
   MLOCK_LOCKED|MLOCK_LOCKONFAULT.
 
 - munlock() takes a `flags' argument.  MLOCK_LOCKED|MLOCK_LOCKONFAULT
   to specify which flags are being cleared.
 
 - mlockall() and munlockall() ditto.
 
 
 IOW, LOCKED and LOCKONFAULT are treated identically and independently.
 
 Now, that's how we would have designed all this on day one.  And I
 think we can do this now, by adding new mlock2() and munlock2()
 syscalls.  And we may as well deprecate the old mlock() and munlock(),
 not that this matters much.
 
 *should* we do this?  I'm thinking yes - it's all pretty simple
 boilerplate and wrappers and such, and it gets the interface correct,
 and extensible.
 
 What do others think?

I am working on V3 which will introduce the new system calls.



Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-15 Thread Eric B Munson
On Fri, 12 Jun 2015, Vlastimil Babka wrote:

 On 06/11/2015 09:34 PM, Andrew Morton wrote:
 On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson emun...@akamai.com wrote:
 
 Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not sure
 that even makes sense but the behaviour should be understood and
 tested.
 
 I have extended the kselftest for lock-on-fault to try both of these
 scenarios and they work as expected.  The VMA is split and the VM
 flags are set appropriately for the resulting VMAs.
 
 munlock() should do vma merging as well.  I *think* we implemented
 that.  More tests for you to add ;)
 
 How are you testing the vma merging and splitting, btw?  Parsing
 the procfs files?
 
 What's missing here is a syscall to set VM_LOCKONFAULT on an
 arbitrary range of memory - mlock() for lock-on-fault.  It's a
 shame that mlock() didn't take a `mode' argument.  Perhaps we
 should add such a syscall - that would make the mmap flag unneeded
 but I suppose it should be kept for symmetry.
 
 Do you want such a system call as part of this set?  I would need some
 time to make sure I had thought through all the possible corners one
 could get into with such a call, so it would delay a V3 quite a bit.
 Otherwise I can send a V3 out immediately.
 
 I think the way to look at this is to pretend that mm/mlock.c doesn't
 exist and ask how should we design these features.
 
 And that would be:
 
 - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.
 
 Note that the semantic of MAP_LOCKED can be subtly surprising:
 
 mlock(2) fails if the memory range cannot get populated to guarantee
 that no future major faults will happen on the range.
 mmap(MAP_LOCKED) on the other hand silently succeeds even if the
 range was populated only
 partially.
 
 ( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )
 
 So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While
 MAP_LOCKONFAULT doesn't suffer from such problem, I wonder if that's
 sufficient reason not to extend mmap by new mlock() flags that can
 be instead applied to the VMA after mmapping, using the proposed
 mlock2() with flags. So I think instead we could deprecate
 MAP_LOCKED more prominently. I doubt the overhead of calling the
 extra syscall matters here?

We could talk about retiring the MAP_LOCKED flag but I suspect that
would get significantly more pushback than adding a new mmap flag.

Likely that the overhead does not matter in most cases, but presumably
there are cases where it does (as we have a MAP_LOCKED flag today).
Even with the proposed new system calls I think we should have the
MAP_LOCKONFAULT for parity with MAP_LOCKED.

 
 - mlock() takes a `flags' argument.  Presently that's
MLOCK_LOCKED|MLOCK_LOCKONFAULT.
 
 - munlock() takes a `flags' argument.  MLOCK_LOCKED|MLOCK_LOCKONFAULT
to specify which flags are being cleared.
 
 - mlockall() and munlockall() ditto.
 
 
 IOW, LOCKED and LOCKONFAULT are treated identically and independently.
 
 Now, that's how we would have designed all this on day one.  And I
 think we can do this now, by adding new mlock2() and munlock2()
 syscalls.  And we may as well deprecate the old mlock() and munlock(),
 not that this matters much.
 
 *should* we do this?  I'm thinking yes - it's all pretty simple
 boilerplate and wrappers and such, and it gets the interface correct,
 and extensible.
 
 If the new LOCKONFAULT functionality is indeed desired (I haven't
 still decided myself) then I agree that would be the cleanest way.

Do you disagree with the use cases I have listed or do you think there
is a better way of addressing those cases?

 
 What do others think?



Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-12 Thread Vlastimil Babka

On 06/11/2015 09:34 PM, Andrew Morton wrote:

On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson emun...@akamai.com wrote:


Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not sure
that even makes sense but the behaviour should be understood and
tested.


I have extended the kselftest for lock-on-fault to try both of these
scenarios and they work as expected.  The VMA is split and the VM
flags are set appropriately for the resulting VMAs.


munlock() should do vma merging as well.  I *think* we implemented
that.  More tests for you to add ;)

How are you testing the vma merging and splitting, btw?  Parsing
the procfs files?


What's missing here is a syscall to set VM_LOCKONFAULT on an
arbitrary range of memory - mlock() for lock-on-fault.  It's a
shame that mlock() didn't take a `mode' argument.  Perhaps we
should add such a syscall - that would make the mmap flag unneeded
but I suppose it should be kept for symmetry.


Do you want such a system call as part of this set?  I would need some
time to make sure I had thought through all the possible corners one
could get into with such a call, so it would delay a V3 quite a bit.
Otherwise I can send a V3 out immediately.


I think the way to look at this is to pretend that mm/mlock.c doesn't
exist and ask how should we design these features.

And that would be:

- mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.


Note that the semantic of MAP_LOCKED can be subtly surprising:

mlock(2) fails if the memory range cannot get populated to guarantee
that no future major faults will happen on the range. mmap(MAP_LOCKED) 
on the other hand silently succeeds even if the range was populated only

partially.

( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )

So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While 
MAP_LOCKONFAULT doesn't suffer from such problem, I wonder if that's 
sufficient reason not to extend mmap by new mlock() flags that can be 
instead applied to the VMA after mmapping, using the proposed mlock2() 
with flags. So I think instead we could deprecate MAP_LOCKED more 
prominently. I doubt the overhead of calling the extra syscall matters here?



- mlock() takes a `flags' argument.  Presently that's
   MLOCK_LOCKED|MLOCK_LOCKONFAULT.

- munlock() takes a `flags' argument.  MLOCK_LOCKED|MLOCK_LOCKONFAULT
   to specify which flags are being cleared.

- mlockall() and munlockall() ditto.


IOW, LOCKED and LOCKONFAULT are treated identically and independently.

Now, that's how we would have designed all this on day one.  And I
think we can do this now, by adding new mlock2() and munlock2()
syscalls.  And we may as well deprecate the old mlock() and munlock(),
not that this matters much.

*should* we do this?  I'm thinking yes - it's all pretty simple
boilerplate and wrappers and such, and it gets the interface correct,
and extensible.


If the new LOCKONFAULT functionality is indeed desired (I haven't still 
decided myself) then I agree that would be the cleanest way.



What do others think?





Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-11 Thread Eric B Munson

On 06/10/2015 05:59 PM, Andrew Morton wrote:
 On Wed, 10 Jun 2015 09:26:47 -0400 Eric B Munson
 emun...@akamai.com wrote:
 
 mlock() allows a user to control page out of program memory, but
 this comes at the cost of faulting in the entire mapping when it
 is
 
 s/mapping/locked area/

Done.

 
 allocated.  For large mappings where the entire area is not
 necessary this is not ideal.
 
 This series introduces new flags for mmap() and mlockall() that
 allow a user to specify that the covered area should not be paged
 out, but only after the memory has been used the first time.
 
 The comparison with MCL_FUTURE is hiding over in the 2/3 changelog.
  It's important so let's copy it here.
 
 : MCL_ONFAULT is preferable to MCL_FUTURE for the use cases enumerated
 : in the previous patch because MCL_FUTURE will behave as if each mapping
 : was made with MAP_LOCKED, causing the entire mapping to be faulted in
 : when new space is allocated or mapped.  MCL_ONFAULT allows the user to
 : delay the fault in cost of any given page until it is actually needed,
 : but then guarantees that that page will always be resident.

Done

 
 I *think* it all looks OK.  I'd like someone else to go over it
 also if poss.
 
 
 I guess the 2/3 changelog should have something like
 
 : munlockall() will clear MCL_ONFAULT on all vma's in the process's
 VM.

Done

 
 It's pretty obvious, but the manpage delta should make this clear
 also.

Done

 
 
 Also the changelog(s) and manpage delta should explain that
 munlock() clears MCL_ONFAULT.

Done

 
 And now I'm wondering what happens if userspace does 
 mmap(MAP_LOCKONFAULT) and later does munlock() on just part of
 that region.  Does the vma get split?  Is this tested?  Should also
 be in the changelogs and manpage.
 
 Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not sure
 that even makes sense but the behaviour should be understood and
 tested.

I have extended the kselftest for lock-on-fault to try both of these
scenarios and they work as expected.  The VMA is split and the VM
flags are set appropriately for the resulting VMAs.

 
 
 What's missing here is a syscall to set VM_LOCKONFAULT on an
 arbitrary range of memory - mlock() for lock-on-fault.  It's a
 shame that mlock() didn't take a `mode' argument.  Perhaps we
 should add such a syscall - that would make the mmap flag unneeded
 but I suppose it should be kept for symmetry.

Do you want such a system call as part of this set?  I would need some
time to make sure I had thought through all the possible corners one
could get into with such a call, so it would delay a V3 quite a bit.
Otherwise I can send a V3 out immediately.


Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-11 Thread Andrew Morton
On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson emun...@akamai.com wrote:

  Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not sure
  that even makes sense but the behaviour should be understood and
  tested.

 I have extended the kselftest for lock-on-fault to try both of these
 scenarios and they work as expected.  The VMA is split and the VM
 flags are set appropriately for the resulting VMAs.

munlock() should do vma merging as well.  I *think* we implemented
that.  More tests for you to add ;)

How are you testing the vma merging and splitting, btw?  Parsing
the procfs files?

  What's missing here is a syscall to set VM_LOCKONFAULT on an
  arbitrary range of memory - mlock() for lock-on-fault.  It's a
  shame that mlock() didn't take a `mode' argument.  Perhaps we
  should add such a syscall - that would make the mmap flag unneeded
  but I suppose it should be kept for symmetry.
 
 Do you want such a system call as part of this set?  I would need some
 time to make sure I had thought through all the possible corners one
 could get into with such a call, so it would delay a V3 quite a bit.
 Otherwise I can send a V3 out immediately.

I think the way to look at this is to pretend that mm/mlock.c doesn't
exist and ask how should we design these features.

And that would be:

- mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.

- mlock() takes a `flags' argument.  Presently that's
  MLOCK_LOCKED|MLOCK_LOCKONFAULT.

- munlock() takes a `flags' argument.  MLOCK_LOCKED|MLOCK_LOCKONFAULT
  to specify which flags are being cleared.

- mlockall() and munlockall() ditto.


IOW, LOCKED and LOCKONFAULT are treated identically and independently.
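
A toy model of that design, assuming nothing about the eventual kernel
implementation: each range carries a word of lock flags, a lock call
sets the requested ones, and an unlock call clears only the ones named.
The flag values and the names mlock2_model/munlock2_model are
hypothetical and exist only for this sketch.

```python
MLOCK_LOCKED = 0x1       # hypothetical value, for illustration only
MLOCK_LOCKONFAULT = 0x2  # hypothetical value, for illustration only


def mlock2_model(vma_flags, flags):
    """Set the requested lock flags on a range's flag word."""
    return vma_flags | flags


def munlock2_model(vma_flags, flags):
    """Clear only the lock flags named in `flags`; leave the rest."""
    return vma_flags & ~flags


# Lock a range both ways, then clear only the LOCKED flag.
state = mlock2_model(0, MLOCK_LOCKED | MLOCK_LOCKONFAULT)
state = munlock2_model(state, MLOCK_LOCKED)
# The range is still marked lock-on-fault at this point.
```

The point of the model is that neither flag is special: clearing one
does not disturb the other, which is what lets the interface grow more
flags later.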

Now, that's how we would have designed all this on day one.  And I
think we can do this now, by adding new mlock2() and munlock2()
syscalls.  And we may as well deprecate the old mlock() and munlock(),
not that this matters much.

*should* we do this?  I'm thinking yes - it's all pretty simple
boilerplate and wrappers and such, and it gets the interface correct,
and extensible.

What do others think?

Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-11 Thread Eric B Munson

On 06/11/2015 03:34 PM, Andrew Morton wrote:
 On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson
 emun...@akamai.com wrote:
 
 Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not
 sure that even makes sense but the behaviour should be
 understood and tested.
 
 I have extended the kselftest for lock-on-fault to try both of
 these scenarios and they work as expected.  The VMA is split and
 the VM flags are set appropriately for the resulting VMAs.
 
 munlock() should do vma merging as well.  I *think* we implemented 
 that.  More tests for you to add ;)

I will add a test for this as well.  But the code is in place to merge
VMAs IIRC.

 
 How are you testing the vma merging and splitting, btw?  Parsing 
 the procfs files?

To show the VMA split happened, I dropped a printk in mlock_fixup()
and the user space test simply checks that unlocked pages are not
marked as unevictable.  The test does not parse maps or smaps for
actual VMA layout.  Given that we want to check the merging of VMAs as
well I will add this.

 
 What's missing here is a syscall to set VM_LOCKONFAULT on an 
 arbitrary range of memory - mlock() for lock-on-fault.  It's a 
 shame that mlock() didn't take a `mode' argument.  Perhaps we 
 should add such a syscall - that would make the mmap flag
 unneeded but I suppose it should be kept for symmetry.
 
 Do you want such a system call as part of this set?  I would need
 some time to make sure I had thought through all the possible
 corners one could get into with such a call, so it would delay a
 V3 quite a bit. Otherwise I can send a V3 out immediately.
 
 I think the way to look at this is to pretend that mm/mlock.c
 doesn't exist and ask how should we design these features.
 
 And that would be:
 
 - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.
 
 - mlock() takes a `flags' argument.  Presently that's 
 MLOCK_LOCKED|MLOCK_LOCKONFAULT.
 
 - munlock() takes a `flags' argument.
 MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being
 cleared.
 
 - mlockall() and munlockall() ditto.
 
 
 IOW, LOCKED and LOCKONFAULT are treated identically and
 independently.
 
 Now, that's how we would have designed all this on day one.  And I 
 think we can do this now, by adding new mlock2() and munlock2() 
 syscalls.  And we may as well deprecate the old mlock() and
 munlock(), not that this matters much.
 
 *should* we do this?  I'm thinking yes - it's all pretty simple 
 boilerplate and wrappers and such, and it gets the interface
 correct, and extensible.
 
 What do others think?
 


[RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-10 Thread Eric B Munson
mlock() allows a user to control page out of program memory, but this
comes at the cost of faulting in the entire mapping when it is
allocated.  For large mappings where the entire area is not necessary
this is not ideal.

This series introduces new flags for mmap() and mlockall() that allow a
user to specify that the covered area should not be paged out, but only
after the memory has been used the first time.

There are two main use cases that this set covers.  The first is the
security focussed mlock case.  A buffer is needed that cannot be written
to swap.  The maximum size is known, but on average the memory used is
significantly less than this maximum.  With lock on fault, the buffer
is guaranteed to never be paged out without consuming the maximum size
every time such a buffer is created.

The second use case is focussed on performance.  Portions of a large
file are needed and we want to keep the used portions in memory once
accessed.  This is the case for large graphical models where the path
through the graph is not known until run time.  The entire graph is
unlikely to be used in a given invocation, but once a node has been
used it needs to stay resident for further processing.  Given these
constraints we have a number of options.  We can potentially waste a
large amount of memory by mlocking the entire region (this can also
cause a significant stall at startup as the entire file is read in).
We can mlock every page as we access them without tracking if the page
is already resident but this introduces large overhead for each access.
The third option is mapping the entire region with PROT_NONE and using
a signal handler for SIGSEGV to mprotect(PROT_READ) and mlock() the
needed page.  Doing this page at a time adds a significant performance
penalty.  Batching can be used to mitigate this overhead, but in order
to safely avoid trying to mprotect pages outside of the mapping, the
boundaries of each mapping to be used in this way must be tracked and
available to the signal handler.  This is precisely what the mm system
in the kernel should already be doing.

For mmap(MAP_LOCKONFAULT) the user is charged against RLIMIT_MEMLOCK
as if MAP_LOCKED was used, that is, when the VMA is created rather than
when the pages are faulted in.  For mlockall(MCL_ONFAULT) the user is
charged as if MCL_FUTURE was used.  This decision was made to keep the
accounting checks out of the page fault path.
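
One consequence of charging at VMA creation is that the whole mapping
length has to fit under the limit up front, however few pages end up
being touched.  A rough sketch of that check from user space, using
Python's standard resource module; the helper name is hypothetical, and
the sketch ignores privileges such as CAP_IPC_LOCK that lift the limit.

```python
import resource


def fits_memlock_limit(length, already_locked=0):
    """Return True if locking `length` more bytes stays within the
    soft RLIMIT_MEMLOCK, counting the full length immediately."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft == resource.RLIM_INFINITY:
        return True
    return already_locked + length <= soft
```

A caller would pass the full size of the mapping it intends to create,
not the amount of memory it expects to fault in.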

To illustrate the benefit of this patch I wrote a test program that
mmaps a 5 GB file filled with random data and then makes 15,000,000
accesses to random addresses in that mapping.  The test program was run
20 times for each setup.  Results are reported for two program portions,
setup and execution.  The setup phase is calling mmap and optionally
mlock on the entire region.  For most experiments this is trivial, but
it highlights the cost of faulting in the entire region.  Results are
averages across the 20 runs in milliseconds.

mmap with MAP_LOCKED:
Setup avg:  11821.193
Processing avg: 3404.286

mmap with mlock() before each access:
Setup avg:  0.054
Processing avg: 34263.201

mmap with PROT_NONE and signal handler and batch size of 1 page:
With the default value of max_map_count, this gets ENOMEM when I attempt
to change the permissions.  After raising the sysctl significantly I get:
Setup avg:  0.050
Processing avg: 67690.625

mmap with PROT_NONE and signal handler and batch size of 8 pages:
Setup avg:  0.098
Processing avg: 37344.197

mmap with PROT_NONE and signal handler and batch size of 16 pages:
Setup avg:  0.0548
Processing avg: 29295.669

mmap with MAP_LOCKONFAULT:
Setup avg:  0.073
Processing avg: 18392.136

The signal handler in the batch cases faulted in memory in two steps to
avoid having to know the start and end of the faulting mapping.  The
first step covers the page that caused the fault as we know that it will
be possible to lock.  The second step speculatively tries to mlock and
mprotect the batch size - 1 pages that follow.  There may be a clever
way to avoid this without having the program track each mapping to be
covered by this handler in a globally accessible structure, but I could
not find it.  It should be noted that with a large enough batch size
this two step fault handler can still cause the program to crash if it
reaches far beyond the end of the mapping.

These results show that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise MAP_LOCKONFAULT is significantly faster.
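
The totals behind that comparison can be recomputed from the averages
reported above.  This is only arithmetic on the posted numbers; the
dictionary keys are shorthand labels chosen for the example.

```python
# (setup_ms, processing_ms) averages copied from the results above
RESULTS = {
    "MAP_LOCKED": (11821.193, 3404.286),
    "mlock per access": (0.054, 34263.201),
    "SIGSEGV batch 1": (0.050, 67690.625),
    "SIGSEGV batch 8": (0.098, 37344.197),
    "SIGSEGV batch 16": (0.0548, 29295.669),
    "MAP_LOCKONFAULT": (0.073, 18392.136),
}

# Total time per strategy: setup plus processing
totals = {name: setup + proc for name, (setup, proc) in RESULTS.items()}

# Strategies that lock pages only as they are touched
on_demand = {k: v for k, v in totals.items() if k != "MAP_LOCKED"}
fastest_on_demand = min(on_demand, key=on_demand.get)

# Print all strategies from fastest to slowest total
for name, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {total:12.3f} ms")
```

Separating setup from processing matters here: MAP_LOCKED pays almost
all of its cost during setup, while the on-demand strategies pay it
during processing.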

The performance cost of these patches is minimal on the two benchmarks
I have tested (stream and kernbench).  The following are the average
values across 20 runs of each benchmark after a warmup run whose
results were discarded.

Avg throughput in MB/s from stream using 100 element arrays
Test     4.1-rc2      4.1-rc2+lock-on-fault
Copy:    10,979.08    10,917.34
Scale:   11,094.45    11,023.01
Add:     12,487.29    12,388.65
Triad:   

Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-10 Thread Andrew Morton
On Wed, 10 Jun 2015 09:26:47 -0400 Eric B Munson emun...@akamai.com wrote:

 mlock() allows a user to control page out of program memory, but this
 comes at the cost of faulting in the entire mapping when it is

s/mapping/locked area/

 allocated.  For large mappings where the entire area is not necessary
 this is not ideal.
 
 This series introduces new flags for mmap() and mlockall() that allow a
 user to specify that the covered area should not be paged out, but only
 after the memory has been used the first time.

The comparison with MCL_FUTURE is hiding over in the 2/3 changelog. 
It's important so let's copy it here.

: MCL_ONFAULT is preferable to MCL_FUTURE for the use cases enumerated
: in the previous patch because MCL_FUTURE will behave as if each mapping
: was made with MAP_LOCKED, causing the entire mapping to be faulted in
: when new space is allocated or mapped.  MCL_ONFAULT allows the user to
: delay the fault in cost of any given page until it is actually needed,
: but then guarantees that that page will always be resident.

I *think* it all looks OK.  I'd like someone else to go over it also if
poss.


I guess the 2/3 changelog should have something like

: munlockall() will clear MCL_ONFAULT on all vma's in the process's VM.

It's pretty obvious, but the manpage delta should make this clear also.


Also the changelog(s) and manpage delta should explain that munlock()
clears MCL_ONFAULT.

And now I'm wondering what happens if userspace does
mmap(MAP_LOCKONFAULT) and later does munlock() on just part of that
region.  Does the vma get split?  Is this tested?  Should also be in
the changelogs and manpage.

Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not sure that
even makes sense but the behaviour should be understood and tested.


What's missing here is a syscall to set VM_LOCKONFAULT on an arbitrary
range of memory - mlock() for lock-on-fault.  It's a shame that mlock()
didn't take a `mode' argument.  Perhaps we should add such a syscall -
that would make the mmap flag unneeded but I suppose it should be kept
for symmetry.

