Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On Tue, 23 Jun 2015, Vlastimil Babka wrote:

> On 06/15/2015 04:43 PM, Eric B Munson wrote:
>>> Note that the semantics of MAP_LOCKED can be subtly surprising:
>>>
>>> "mlock(2) fails if the memory range cannot get populated to guarantee that no future major faults will happen on the range. mmap(MAP_LOCKED) on the other hand silently succeeds even if the range was populated only partially."
>>>
>>> ( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )
>>>
>>> So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While MAP_LOCKONFAULT doesn't suffer from such a problem, I wonder if that's sufficient reason not to extend mmap by new mlock() flags that can be instead applied to the VMA after mmapping, using the proposed mlock2() with flags. So I think instead we could deprecate MAP_LOCKED more prominently. I doubt the overhead of calling the extra syscall matters here?
>>
>> We could talk about retiring the MAP_LOCKED flag but I suspect that would get significantly more pushback than adding a new mmap flag.
>
> Oh no we can't retire as in remove the flag, ever. Just not continue the way of mmap() flags related to mlock().
>
>> Likely that the overhead does not matter in most cases, but presumably there are cases where it does (as we have a MAP_LOCKED flag today). Even with the proposed new system calls I think we should have the MAP_LOCKONFAULT for parity with MAP_LOCKED.
>
> I'm not convinced, but it's not a major issue.
>
>>>> - mlock() takes a `flags' argument. Presently that's MLOCK_LOCKED|MLOCK_LOCKONFAULT.
>>>> - munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being cleared.
>>>> - mlockall() and munlockall() ditto.
>>>>
>>>> IOW, LOCKED and LOCKONFAULT are treated identically and independently.
>>>>
>>>> Now, that's how we would have designed all this on day one. And I think we can do this now, by adding new mlock2() and munlock2() syscalls. And we may as well deprecate the old mlock() and munlock(), not that this matters much.
>>>>
>>>> *should* we do this? I'm thinking yes - it's all pretty simple boilerplate and wrappers and such, and it gets the interface correct, and extensible.
>>>
>>> If the new LOCKONFAULT functionality is indeed desired (I still haven't decided myself) then I agree that would be the cleanest way.
>>
>> Do you disagree with the use cases I have listed or do you think there is a better way of addressing those cases?
>
> I'm somewhat sceptical about the security one. Are security sensitive buffers that large to matter? The performance one is more convincing and I don't see a better way, so OK.

They can be, the two that come to mind are medical images and high resolution sensor data.

>>>> What do others think?

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On Thu, Jun 25, 2015 at 7:16 AM, Eric B Munson <emun...@akamai.com> wrote:

> On Tue, 23 Jun 2015, Vlastimil Babka wrote:
>> On 06/15/2015 04:43 PM, Eric B Munson wrote:
>>>> If the new LOCKONFAULT functionality is indeed desired (I haven't still decided myself) then I agree that would be the cleanest way.
>>>
>>> Do you disagree with the use cases I have listed or do you think there is a better way of addressing those cases?
>>
>> I'm somewhat sceptical about the security one. Are security sensitive buffers that large to matter? The performance one is more convincing and I don't see a better way, so OK.
>
> They can be, the two that come to mind are medical images and high resolution sensor data.

I think we've been handling sensitive memory pages wrong forever. We shouldn't lock them into memory; we should flag them as sensitive and encrypt them if they're ever written out to disk.

--Andy
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On 06/15/2015 04:43 PM, Eric B Munson wrote:

>> Note that the semantics of MAP_LOCKED can be subtly surprising:
>>
>> "mlock(2) fails if the memory range cannot get populated to guarantee that no future major faults will happen on the range. mmap(MAP_LOCKED) on the other hand silently succeeds even if the range was populated only partially."
>>
>> ( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )
>>
>> So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While MAP_LOCKONFAULT doesn't suffer from such a problem, I wonder if that's sufficient reason not to extend mmap by new mlock() flags that can be instead applied to the VMA after mmapping, using the proposed mlock2() with flags. So I think instead we could deprecate MAP_LOCKED more prominently. I doubt the overhead of calling the extra syscall matters here?
>
> We could talk about retiring the MAP_LOCKED flag but I suspect that would get significantly more pushback than adding a new mmap flag.

Oh no we can't retire as in remove the flag, ever. Just not continue the way of mmap() flags related to mlock().

> Likely that the overhead does not matter in most cases, but presumably there are cases where it does (as we have a MAP_LOCKED flag today). Even with the proposed new system calls I think we should have the MAP_LOCKONFAULT for parity with MAP_LOCKED.

I'm not convinced, but it's not a major issue.

>>> - mlock() takes a `flags' argument. Presently that's MLOCK_LOCKED|MLOCK_LOCKONFAULT.
>>> - munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being cleared.
>>> - mlockall() and munlockall() ditto.
>>>
>>> IOW, LOCKED and LOCKONFAULT are treated identically and independently.
>>>
>>> Now, that's how we would have designed all this on day one. And I think we can do this now, by adding new mlock2() and munlock2() syscalls. And we may as well deprecate the old mlock() and munlock(), not that this matters much.
>>>
>>> *should* we do this? I'm thinking yes - it's all pretty simple boilerplate and wrappers and such, and it gets the interface correct, and extensible.
>>
>> If the new LOCKONFAULT functionality is indeed desired (I still haven't decided myself) then I agree that would be the cleanest way.
>
> Do you disagree with the use cases I have listed or do you think there is a better way of addressing those cases?

I'm somewhat sceptical about the security one. Are security sensitive buffers that large to matter? The performance one is more convincing and I don't see a better way, so OK.

>>> What do others think?
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On Thu, 11 Jun 2015, Andrew Morton wrote:

> On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson <emun...@akamai.com> wrote:
>>> Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure that even makes sense but the behaviour should be understood and tested.
>>
>> I have extended the kselftest for lock-on-fault to try both of these scenarios and they work as expected. The VMA is split and the VM flags are set appropriately for the resulting VMAs.
>
> munlock() should do vma merging as well. I *think* we implemented that. More tests for you to add ;)
>
> How are you testing the vma merging and splitting, btw? Parsing the procfs files?

The lock-on-fault test now covers VMA splitting and merging by parsing /proc/self/maps. VMA splitting and merging works as it should with both MAP_LOCKONFAULT and MCL_ONFAULT.

>>> What's missing here is a syscall to set VM_LOCKONFAULT on an arbitrary range of memory - mlock() for lock-on-fault. It's a shame that mlock() didn't take a `mode' argument. Perhaps we should add such a syscall - that would make the mmap flag unneeded but I suppose it should be kept for symmetry.
>>
>> Do you want such a system call as part of this set? I would need some time to make sure I had thought through all the possible corners one could get into with such a call, so it would delay a V3 quite a bit. Otherwise I can send a V3 out immediately.
>
> I think the way to look at this is to pretend that mm/mlock.c doesn't exist and ask "how should we design these features". And that would be:
>
> - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.
> - mlock() takes a `flags' argument. Presently that's MLOCK_LOCKED|MLOCK_LOCKONFAULT.
> - munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being cleared.
> - mlockall() and munlockall() ditto.
>
> IOW, LOCKED and LOCKONFAULT are treated identically and independently.
>
> Now, that's how we would have designed all this on day one. And I think we can do this now, by adding new mlock2() and munlock2() syscalls. And we may as well deprecate the old mlock() and munlock(), not that this matters much.
>
> *should* we do this? I'm thinking yes - it's all pretty simple boilerplate and wrappers and such, and it gets the interface correct, and extensible.
>
> What do others think?

I am working on V3 which will introduce the new system calls.
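As an illustration of the "parse /proc/self/maps" style of check mentioned above, here is a minimal sketch. It is not the kselftest from the series: count_vmas_in_range() and vma_split_demo() are hypothetical helpers, and the split is provoked with mprotect() rather than munlock() so that the example runs without any RLIMIT_MEMLOCK headroom. The idea is the same either way: a VMA whose attributes change over only part of its range must be split, and each resulting VMA shows up as its own line in /proc/self/maps.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count the /proc/self/maps entries (one per VMA) overlapping [start, end).
 * Returns -1 if the file cannot be opened. Hypothetical helper.
 */
static int count_vmas_in_range(uintptr_t start, uintptr_t end)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    int count = 0;

    assert(start < end);
    if (!maps)
        return -1;

    while (fgets(line, sizeof(line), maps)) {
        uintptr_t lo, hi;

        /* Each entry starts with "<start>-<end>" in hex. */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) == 2 &&
            lo < end && hi > start)
            count++;

        /* Skip the tail of an over-long line so it is not parsed again. */
        if (!strchr(line, '\n')) {
            int c;
            while ((c = fgetc(maps)) != EOF && c != '\n')
                ;
        }
    }

    fclose(maps);
    return count;
}

/*
 * Map three pages, change the protection of the middle one, and report
 * how many VMAs cover the range before and after. Returns 0 on success.
 */
static int vma_split_demo(int *before, int *after)
{
    long pg = sysconf(_SC_PAGESIZE);
    char *p;

    if (pg <= 0)
        return -1;
    p = mmap(NULL, 3 * (size_t)pg, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = count_vmas_in_range((uintptr_t)p, (uintptr_t)p + 3 * pg);
    if (mprotect(p + pg, (size_t)pg, PROT_NONE) != 0) {
        munmap(p, 3 * (size_t)pg);
        return -1;
    }
    *after = count_vmas_in_range((uintptr_t)p, (uintptr_t)p + 3 * pg);

    munmap(p, 3 * (size_t)pg);
    return 0;
}
```

A merge test would do the reverse: restore the original attributes on the middle page and check that the count drops again.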
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On Fri, 12 Jun 2015, Vlastimil Babka wrote:

> On 06/11/2015 09:34 PM, Andrew Morton wrote:
>> On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson <emun...@akamai.com> wrote:
>>>> Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure that even makes sense but the behaviour should be understood and tested.
>>>
>>> I have extended the kselftest for lock-on-fault to try both of these scenarios and they work as expected. The VMA is split and the VM flags are set appropriately for the resulting VMAs.
>>
>> munlock() should do vma merging as well. I *think* we implemented that. More tests for you to add ;)
>>
>> How are you testing the vma merging and splitting, btw? Parsing the procfs files?
>>
>>>> What's missing here is a syscall to set VM_LOCKONFAULT on an arbitrary range of memory - mlock() for lock-on-fault. It's a shame that mlock() didn't take a `mode' argument. Perhaps we should add such a syscall - that would make the mmap flag unneeded but I suppose it should be kept for symmetry.
>>>
>>> Do you want such a system call as part of this set? I would need some time to make sure I had thought through all the possible corners one could get into with such a call, so it would delay a V3 quite a bit. Otherwise I can send a V3 out immediately.
>>
>> I think the way to look at this is to pretend that mm/mlock.c doesn't exist and ask "how should we design these features". And that would be:
>>
>> - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.
>
> Note that the semantics of MAP_LOCKED can be subtly surprising:
>
> "mlock(2) fails if the memory range cannot get populated to guarantee that no future major faults will happen on the range. mmap(MAP_LOCKED) on the other hand silently succeeds even if the range was populated only partially."
>
> ( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )
>
> So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While MAP_LOCKONFAULT doesn't suffer from such a problem, I wonder if that's sufficient reason not to extend mmap by new mlock() flags that can be instead applied to the VMA after mmapping, using the proposed mlock2() with flags. So I think instead we could deprecate MAP_LOCKED more prominently. I doubt the overhead of calling the extra syscall matters here?

We could talk about retiring the MAP_LOCKED flag but I suspect that would get significantly more pushback than adding a new mmap flag.

Likely that the overhead does not matter in most cases, but presumably there are cases where it does (as we have a MAP_LOCKED flag today). Even with the proposed new system calls I think we should have the MAP_LOCKONFAULT for parity with MAP_LOCKED.

>> - mlock() takes a `flags' argument. Presently that's MLOCK_LOCKED|MLOCK_LOCKONFAULT.
>> - munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being cleared.
>> - mlockall() and munlockall() ditto.
>>
>> IOW, LOCKED and LOCKONFAULT are treated identically and independently.
>>
>> Now, that's how we would have designed all this on day one. And I think we can do this now, by adding new mlock2() and munlock2() syscalls. And we may as well deprecate the old mlock() and munlock(), not that this matters much.
>>
>> *should* we do this? I'm thinking yes - it's all pretty simple boilerplate and wrappers and such, and it gets the interface correct, and extensible.
>
> If the new LOCKONFAULT functionality is indeed desired (I still haven't decided myself) then I agree that would be the cleanest way.

Do you disagree with the use cases I have listed or do you think there is a better way of addressing those cases?

>> What do others think?
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On 06/11/2015 09:34 PM, Andrew Morton wrote:

> On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson <emun...@akamai.com> wrote:
>>> Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure that even makes sense but the behaviour should be understood and tested.
>>
>> I have extended the kselftest for lock-on-fault to try both of these scenarios and they work as expected. The VMA is split and the VM flags are set appropriately for the resulting VMAs.
>
> munlock() should do vma merging as well. I *think* we implemented that. More tests for you to add ;)
>
> How are you testing the vma merging and splitting, btw? Parsing the procfs files?
>
>>> What's missing here is a syscall to set VM_LOCKONFAULT on an arbitrary range of memory - mlock() for lock-on-fault. It's a shame that mlock() didn't take a `mode' argument. Perhaps we should add such a syscall - that would make the mmap flag unneeded but I suppose it should be kept for symmetry.
>>
>> Do you want such a system call as part of this set? I would need some time to make sure I had thought through all the possible corners one could get into with such a call, so it would delay a V3 quite a bit. Otherwise I can send a V3 out immediately.
>
> I think the way to look at this is to pretend that mm/mlock.c doesn't exist and ask "how should we design these features". And that would be:
>
> - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.

Note that the semantics of MAP_LOCKED can be subtly surprising:

"mlock(2) fails if the memory range cannot get populated to guarantee that no future major faults will happen on the range. mmap(MAP_LOCKED) on the other hand silently succeeds even if the range was populated only partially."

( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )

So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While MAP_LOCKONFAULT doesn't suffer from such a problem, I wonder if that's sufficient reason not to extend mmap by new mlock() flags that can be instead applied to the VMA after mmapping, using the proposed mlock2() with flags. So I think instead we could deprecate MAP_LOCKED more prominently. I doubt the overhead of calling the extra syscall matters here?

> - mlock() takes a `flags' argument. Presently that's MLOCK_LOCKED|MLOCK_LOCKONFAULT.
> - munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being cleared.
> - mlockall() and munlockall() ditto.
>
> IOW, LOCKED and LOCKONFAULT are treated identically and independently.
>
> Now, that's how we would have designed all this on day one. And I think we can do this now, by adding new mlock2() and munlock2() syscalls. And we may as well deprecate the old mlock() and munlock(), not that this matters much.
>
> *should* we do this? I'm thinking yes - it's all pretty simple boilerplate and wrappers and such, and it gets the interface correct, and extensible.

If the new LOCKONFAULT functionality is indeed desired (I still haven't decided myself) then I agree that would be the cleanest way.

> What do others think?
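To make the MAP_LOCKED point above concrete: a caller that needs to know whether the whole range really was locked can map first and then call mlock() explicitly, because mlock() reports failure through its return value. The following is only a sketch under that assumption; map_and_lock() is a hypothetical helper name, and it costs the extra syscall whose overhead is being debated in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map `len` bytes of anonymous memory and lock them with an explicit
 * mlock() call instead of MAP_LOCKED, so that a failure to populate or
 * lock the range is visible to the caller.
 * Returns the mapping, or NULL with errno set. Hypothetical helper.
 */
static void *map_and_lock(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (mlock(p, len) != 0) {
        int saved = errno;      /* munmap() may overwrite errno */
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

Whether the call succeeds depends on RLIMIT_MEMLOCK and available memory, so callers should be prepared for a NULL return with errno set (typically ENOMEM, EPERM or EAGAIN).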
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On 06/10/2015 05:59 PM, Andrew Morton wrote:

> On Wed, 10 Jun 2015 09:26:47 -0400 Eric B Munson <emun...@akamai.com> wrote:
>> mlock() allows a user to control page out of program memory, but this comes at the cost of faulting in the entire mapping when it is
>
> s/mapping/locked area/

Done.

>> allocated. For large mappings where the entire area is not necessary this is not ideal. This series introduces new flags for mmap() and mlockall() that allow a user to specify that the covered area should not be paged out, but only after the memory has been used the first time.
>
> The comparison with MCL_FUTURE is hiding over in the 2/3 changelog. It's important so let's copy it here.
>
> : MCL_ONFAULT is preferable to MCL_FUTURE for the use cases enumerated
> : in the previous patch because MCL_FUTURE will behave as if each mapping
> : was made with MAP_LOCKED, causing the entire mapping to be faulted in
> : when new space is allocated or mapped. MCL_ONFAULT allows the user to
> : delay the fault-in cost of any given page until it is actually needed,
> : but then guarantees that that page will always be resident.

Done

> I *think* it all looks OK. I'd like someone else to go over it also if poss.
>
> I guess the 2/3 changelog should have something like
>
> : munlockall() will clear MCL_ONFAULT on all vma's in the process's VM.

Done

> It's pretty obvious, but the manpage delta should make this clear also.

Done

> Also the changelog(s) and manpage delta should explain that munlock() clears MCL_ONFAULT.

Done

> And now I'm wondering what happens if userspace does mmap(MAP_LOCKONFAULT) and later does munlock() on just part of that region. Does the vma get split? Is this tested? Should also be in the changelogs and manpage.
>
> Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure that even makes sense but the behaviour should be understood and tested.

I have extended the kselftest for lock-on-fault to try both of these scenarios and they work as expected. The VMA is split and the VM flags are set appropriately for the resulting VMAs.

> What's missing here is a syscall to set VM_LOCKONFAULT on an arbitrary range of memory - mlock() for lock-on-fault. It's a shame that mlock() didn't take a `mode' argument. Perhaps we should add such a syscall - that would make the mmap flag unneeded but I suppose it should be kept for symmetry.

Do you want such a system call as part of this set? I would need some time to make sure I had thought through all the possible corners one could get into with such a call, so it would delay a V3 quite a bit. Otherwise I can send a V3 out immediately.
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson <emun...@akamai.com> wrote:

>> Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure that even makes sense but the behaviour should be understood and tested.
>
> I have extended the kselftest for lock-on-fault to try both of these scenarios and they work as expected. The VMA is split and the VM flags are set appropriately for the resulting VMAs.

munlock() should do vma merging as well. I *think* we implemented that. More tests for you to add ;)

How are you testing the vma merging and splitting, btw? Parsing the procfs files?

>> What's missing here is a syscall to set VM_LOCKONFAULT on an arbitrary range of memory - mlock() for lock-on-fault. It's a shame that mlock() didn't take a `mode' argument. Perhaps we should add such a syscall - that would make the mmap flag unneeded but I suppose it should be kept for symmetry.
>
> Do you want such a system call as part of this set? I would need some time to make sure I had thought through all the possible corners one could get into with such a call, so it would delay a V3 quite a bit. Otherwise I can send a V3 out immediately.

I think the way to look at this is to pretend that mm/mlock.c doesn't exist and ask "how should we design these features". And that would be:

- mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.
- mlock() takes a `flags' argument. Presently that's MLOCK_LOCKED|MLOCK_LOCKONFAULT.
- munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being cleared.
- mlockall() and munlockall() ditto.

IOW, LOCKED and LOCKONFAULT are treated identically and independently.

Now, that's how we would have designed all this on day one. And I think we can do this now, by adding new mlock2() and munlock2() syscalls. And we may as well deprecate the old mlock() and munlock(), not that this matters much.

*should* we do this? I'm thinking yes - it's all pretty simple boilerplate and wrappers and such, and it gets the interface correct, and extensible.

What do others think?
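For readers coming to this thread later: nothing in the messages here settles the final interface, and the flag names above (MLOCK_LOCKED, MLOCK_LOCKONFAULT) are the ones being proposed at this point. To the best of my knowledge, mainline Linux (4.4 and later) ended up with an mlock2(addr, len, flags) system call whose lock-on-fault flag is spelled MLOCK_ONFAULT, and glibc gained a wrapper in 2.27. The sketch below assumes that later interface and calls it through syscall(2) so it also builds against older C libraries; lock_on_fault() is a hypothetical wrapper name and the fallback value of MLOCK_ONFAULT is taken from the kernel's uapi headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01      /* assumed: value from the Linux uapi headers */
#endif

/*
 * Ask the kernel to lock [addr, addr + len) lazily: pages become
 * unevictable once they are faulted in, not at call time.
 * Returns 0 on success, -1 with errno set otherwise (ENOSYS when the
 * build or the running kernel has no mlock2()). Hypothetical wrapper.
 */
static int lock_on_fault(void *addr, size_t len)
{
#ifdef SYS_mlock2
    return (int)syscall(SYS_mlock2, addr, len, MLOCK_ONFAULT);
#else
    (void)addr;
    (void)len;
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would typically mmap() the region without MAP_LOCKED and then call lock_on_fault() on it, which is the "extra syscall after mmap" pattern discussed elsewhere in this thread.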
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On 06/11/2015 03:34 PM, Andrew Morton wrote:

> On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson <emun...@akamai.com> wrote:
>>> Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure that even makes sense but the behaviour should be understood and tested.
>>
>> I have extended the kselftest for lock-on-fault to try both of these scenarios and they work as expected. The VMA is split and the VM flags are set appropriately for the resulting VMAs.
>
> munlock() should do vma merging as well. I *think* we implemented that. More tests for you to add ;)

I will add a test for this as well. But the code is in place to merge VMAs IIRC.

> How are you testing the vma merging and splitting, btw? Parsing the procfs files?

To show the VMA split happened, I dropped a printk in mlock_fixup() and the user space test simply checks that unlocked pages are not marked as unevictable. The test does not parse maps or smaps for actual VMA layout. Given that we want to check the merging of VMAs as well I will add this.

>>> What's missing here is a syscall to set VM_LOCKONFAULT on an arbitrary range of memory - mlock() for lock-on-fault. It's a shame that mlock() didn't take a `mode' argument. Perhaps we should add such a syscall - that would make the mmap flag unneeded but I suppose it should be kept for symmetry.
>>
>> Do you want such a system call as part of this set? I would need some time to make sure I had thought through all the possible corners one could get into with such a call, so it would delay a V3 quite a bit. Otherwise I can send a V3 out immediately.
>
> I think the way to look at this is to pretend that mm/mlock.c doesn't exist and ask "how should we design these features". And that would be:
>
> - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.
> - mlock() takes a `flags' argument. Presently that's MLOCK_LOCKED|MLOCK_LOCKONFAULT.
> - munlock() takes a `flags' argument. MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being cleared.
> - mlockall() and munlockall() ditto.
>
> IOW, LOCKED and LOCKONFAULT are treated identically and independently.
>
> Now, that's how we would have designed all this on day one. And I think we can do this now, by adding new mlock2() and munlock2() syscalls. And we may as well deprecate the old mlock() and munlock(), not that this matters much.
>
> *should* we do this? I'm thinking yes - it's all pretty simple boilerplate and wrappers and such, and it gets the interface correct, and extensible.
>
> What do others think?
[RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
mlock() allows a user to control page out of program memory, but this comes at the cost of faulting in the entire mapping when it is allocated. For large mappings where the entire area is not necessary this is not ideal. This series introduces new flags for mmap() and mlockall() that allow a user to specify that the covered area should not be paged out, but only after the memory has been used the first time.

There are two main use cases that this set covers. The first is the security focussed mlock case. A buffer is needed that cannot be written to swap. The maximum size is known, but on average the memory used is significantly less than this maximum. With lock on fault, the buffer is guaranteed to never be paged out without consuming the maximum size every time such a buffer is created.

The second use case is focussed on performance. Portions of a large file are needed and we want to keep the used portions in memory once accessed. This is the case for large graphical models where the path through the graph is not known until run time. The entire graph is unlikely to be used in a given invocation, but once a node has been used it needs to stay resident for further processing.

Given these constraints we have a number of options. We can potentially waste a large amount of memory by mlocking the entire region (this can also cause a significant stall at startup as the entire file is read in). We can mlock every page as we access them without tracking if the page is already resident, but this introduces large overhead for each access. The third option is mapping the entire region with PROT_NONE and using a signal handler for SIGSEGV to mprotect(PROT_READ) and mlock() the needed page. Doing this a page at a time adds a significant performance penalty. Batching can be used to mitigate this overhead, but in order to safely avoid trying to mprotect pages outside of the mapping, the boundaries of each mapping to be used in this way must be tracked and available to the signal handler. This is precisely what the mm system in the kernel should already be doing.

For mmap(MAP_LOCKONFAULT) the user is charged against RLIMIT_MEMLOCK as if MAP_LOCKED was used, so the charge is taken when the VMA is created, not when the pages are faulted in. For mlockall(MCL_ONFAULT) the user is charged as if MCL_FUTURE was used. This decision was made to keep the accounting checks out of the page fault path.

To illustrate the benefit of this patch I wrote a test program that mmaps a 5 GB file filled with random data and then makes 15,000,000 accesses to random addresses in that mapping. The test program was run 20 times for each setup. Results are reported for two program portions, setup and execution. The setup phase is calling mmap and optionally mlock on the entire region. For most experiments this is trivial, but it highlights the cost of faulting in the entire region. Results are averages across the 20 runs in milliseconds.

mmap with MAP_LOCKED:
Setup avg:      11821.193
Processing avg: 3404.286

mmap with mlock() before each access:
Setup avg:      0.054
Processing avg: 34263.201

mmap with PROT_NONE and signal handler and batch size of 1 page:
With the default value in max_map_count, this gets ENOMEM as I attempt to change the permissions; after upping the sysctl significantly I get:
Setup avg:      0.050
Processing avg: 67690.625

mmap with PROT_NONE and signal handler and batch size of 8 pages:
Setup avg:      0.098
Processing avg: 37344.197

mmap with PROT_NONE and signal handler and batch size of 16 pages:
Setup avg:      0.0548
Processing avg: 29295.669

mmap with MAP_LOCKONFAULT:
Setup avg:      0.073
Processing avg: 18392.136

The signal handler in the batch cases faulted in memory in two steps to avoid having to know the start and end of the faulting mapping. The first step covers the page that caused the fault as we know that it will be possible to lock. The second step speculatively tries to mlock and mprotect the batch size - 1 pages that follow. There may be a clever way to avoid this without having the program track each mapping to be covered by this handler in a globally accessible structure, but I could not find it. It should be noted that with a large enough batch size this two step fault handler can still cause the program to crash if it reaches far beyond the end of the mapping.

These results show that if the developer knows that a majority of the mapping will be used, it is better to try and fault it in at once, otherwise MAP_LOCKONFAULT is significantly faster.

The performance cost of these patches is minimal on the two benchmarks I have tested (stream and kernbench). The following are the average values across 20 runs of each benchmark after a warmup run whose results were discarded.

Avg throughput in MB/s from stream using 100 element arrays
Test     4.1-rc2      4.1-rc2+lock-on-fault
Copy:    10,979.08    10,917.34
Scale:   11,094.45    11,023.01
Add:     12,487.29    12,388.65
Triad:
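Going back to the second option in the cover letter above (mlock() every page just before it is accessed), a minimal sketch of what that access path could look like follows. This is not the benchmark program from the series; page_base() and read_locked() are hypothetical helpers. It shows where the per-access cost comes from: every read pays for an mlock() system call, whether or not the page is already resident and locked.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round an address down to the start of its page (page_size is a power of two). */
static void *page_base(const void *addr, size_t page_size)
{
    return (void *)((uintptr_t)addr & ~((uintptr_t)page_size - 1));
}

/*
 * Lock the page containing `addr`, then read one byte from it.
 * Returns 0 on success, -1 with errno set if the page could not be locked.
 * The mlock() call is issued on every access, with no tracking of
 * which pages are already locked.
 */
static int read_locked(const unsigned char *addr, unsigned char *out)
{
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0)
        return -1;
    if (mlock(page_base(addr, (size_t)page_size), (size_t)page_size) != 0)
        return -1;

    *out = *addr;
    return 0;
}
```

With lock-on-fault the explicit mlock() in read_locked() disappears and the read becomes a plain memory access, since the kernel keeps the page resident after its first fault.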
Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault
On Wed, 10 Jun 2015 09:26:47 -0400 Eric B Munson <emun...@akamai.com> wrote:

> mlock() allows a user to control page out of program memory, but this comes at the cost of faulting in the entire mapping when it is

s/mapping/locked area/

> allocated. For large mappings where the entire area is not necessary this is not ideal. This series introduces new flags for mmap() and mlockall() that allow a user to specify that the covered area should not be paged out, but only after the memory has been used the first time.

The comparison with MCL_FUTURE is hiding over in the 2/3 changelog. It's important so let's copy it here.

: MCL_ONFAULT is preferable to MCL_FUTURE for the use cases enumerated
: in the previous patch because MCL_FUTURE will behave as if each mapping
: was made with MAP_LOCKED, causing the entire mapping to be faulted in
: when new space is allocated or mapped. MCL_ONFAULT allows the user to
: delay the fault-in cost of any given page until it is actually needed,
: but then guarantees that that page will always be resident.

I *think* it all looks OK. I'd like someone else to go over it also if poss.

I guess the 2/3 changelog should have something like

: munlockall() will clear MCL_ONFAULT on all vma's in the process's VM.

It's pretty obvious, but the manpage delta should make this clear also.

Also the changelog(s) and manpage delta should explain that munlock() clears MCL_ONFAULT.

And now I'm wondering what happens if userspace does mmap(MAP_LOCKONFAULT) and later does munlock() on just part of that region. Does the vma get split? Is this tested? Should also be in the changelogs and manpage.

Ditto mlockall(MCL_ONFAULT) followed by munlock(). I'm not sure that even makes sense but the behaviour should be understood and tested.

What's missing here is a syscall to set VM_LOCKONFAULT on an arbitrary range of memory - mlock() for lock-on-fault. It's a shame that mlock() didn't take a `mode' argument. Perhaps we should add such a syscall - that would make the mmap flag unneeded but I suppose it should be kept for symmetry.