Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-25 Thread Eric B Munson
On Wed, 24 Jun 2015, Michal Hocko wrote:

 On Mon 22-06-15 10:18:06, Eric B Munson wrote:
  On Mon, 22 Jun 2015, Michal Hocko wrote:
  
   On Fri 19-06-15 12:43:33, Eric B Munson wrote:
 [...]
Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
new MAP_LOCKONFAULT flag (or both)? 
   
   I thought the MAP_FAULTPOPULATE (or any other better name) would
   directly translate into VM_FAULTPOPULATE and wouldn't be tied to the
   locked semantic. We already have VM_LOCKED for that. The direct effect
   of the flag would be to prevent any population other than the direct
   page fault - including any speculative actions like fault around or
   read-ahead.
  
  I like the ability to control other speculative population, but I am not
  sure about overloading it with the VM_LOCKONFAULT case.  Here is my
  concern.  If we are using VM_FAULTPOPULATE | VM_LOCKED to denote
  LOCKONFAULT, how can we tell the difference between someone that wants
  to avoid read-ahead and wants to use mlock()?
 
 Not sure I understand. Something like?
 addr = mmap(VM_FAULTPOPULATE) # To prevent speculative mappings into the vma
 [...]
 mlock(addr, len) # Now I want the full mlock semantic

So this leaves us without the LOCKONFAULT semantics?  That is not at all
what I am looking for.  What I want is a way to express 3 possible
states of a VMA WRT locking, locked (populated and all pages on the
unevictable LRU), lock on fault (populated by page fault, pages that are
present are on the unevictable LRU, newly faulted pages are added to
same), and not locked.

 
 and the latter to have the full mlock semantic and populate the given
 area regardless of VM_FAULTPOPULATE being set on the vma? This would
 be an interesting question because mlock man page clearly states the
 semantic and that is to _always_ populate or fail. So I originally
 thought that it would obey VM_FAULTPOPULATE but this needs more
 thinking.
 
  This might lead to some
  interesting states with mlock() and munlock() that take flags.  For
  instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by
  munlock(MLOCK_LOCKED) leaves the VMAs in the same state with
  VM_LOCKONFAULT set. 
 
 This is really confusing. Let me try to rephrase that. So you have
 mlock(addr, len, MLOCK_ONFAULT)
 munlock(addr, len, MLOCK_LOCKED)
 
 IIUC you would expect the vma still being MLOCK_ONFAULT, right? Isn't
 that behavior strange and unexpected? First of all, munlock has
 traditionally dropped the lock on the address range (e.g. what should
 happen if you did plain old munlock(addr, len)). But even without
 that. You are trying to unlock something that hasn't been locked the
 same way. So I would expect -EINVAL at least, if the two modes should be
 really represented by different flags.

I would expect it to remain MLOCK_ONFAULT because the user requested
munlock(addr, len, MLOCK_LOCKED).  It is not currently an error to
unlock memory that is not locked.  We do this because we do not require
the user track what areas are locked.  It is acceptable to have a mostly
locked area with holes unlocked with a single call to munlock that spans
the entire area.  The same semantics should hold for munlock with flags.
If I have an area with MLOCK_LOCKED and MLOCK_ONFAULT interleaved, it
should be acceptable to clear the MLOCK_ONFAULT flag from those areas
with a single munlock call that spans the area.
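
The existing behaviour described above (munlock() over a range with
unlocked holes is not an error) can be checked with a small sketch. The
helper name munlock_across_holes() is made up for illustration. The two
mlock() calls may fail under a tight RLIMIT_MEMLOCK; the sketch ignores
that on purpose, since only the munlock() result is of interest here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map four anonymous pages, lock only pages 0 and 2,
 * then unlock the whole range with one call. Returns munlock()'s result,
 * or -1 if the mapping could not be created.
 */
int munlock_across_holes(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    assert(psz > 0);

    size_t len = 4 * (size_t)psz;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Lock pages 0 and 2; pages 1 and 3 stay unlocked "holes". */
    (void)mlock(p, (size_t)psz);
    (void)mlock(p + 2 * psz, (size_t)psz);

    /* A single munlock() spanning locked and unlocked pages. */
    int ret = munlock(p, len);

    munmap(p, len);
    return ret;
}
```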

On top of continuing with munlock semantics, the implementation would
need the ability to roll back a munlock call if it failed after altering
VMAs.  If we have the same interleaved area as before and we go to
return -EINVAL the first time we hit an area that was MLOCK_LOCKED, how
do we restore the state of the VMAs we have already processed, and
possibly merged/split?
 
 Or did you mean both types of lock, like:
 mlock(addr, len, MLOCK_ONFAULT) | mmap(MAP_LOCKONFAULT)
 mlock(addr, len, MLOCK_LOCKED)
 munlock(addr, len, MLOCK_LOCKED)
 
 and that should keep MLOCK_ONFAULT?
 This sounds even more weird to me because that means that the vma in
 question would be locked by two different mechanisms. MLOCK_LOCKED with
 the always populate semantic would rule out MLOCK_ONFAULT so what
 would be the meaning of the other flag then? Also what should regular
 munlock(addr, len) without flags unlock? Both?

This is indeed confusing and not what I was trying to illustrate, but
since you bring it up.  mlockall() currently clears all flags and then
sets the new flags with each subsequent call.  mlock2 would use that
same behavior, if LOCKED was specified for a ONFAULT region, that region
would become LOCKED and vice versa.
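
To make the semantics I am describing concrete, here is a toy model of the
three VMA states and of mlock/munlock with flags. It is only a sketch of
the behaviour described in this mail, not kernel code: the enum and
function names are invented for illustration, and the real patches may
differ.

```c
#include <assert.h>

/* The three VMA locking states described in this mail. */
enum lock_state { NOT_LOCKED, LOCK_ON_FAULT, LOCKED };

/* Hypothetical stand-ins for the MLOCK_LOCKED / MLOCK_ONFAULT flags. */
enum lock_flag { FLAG_LOCKED, FLAG_ONFAULT };

/* mlock with flags: the requested mode replaces whatever was set before. */
enum lock_state model_mlock(enum lock_state cur, enum lock_flag flag)
{
    assert(flag == FLAG_LOCKED || flag == FLAG_ONFAULT);
    (void)cur; /* the previous state is overwritten, as with mlockall() */
    return flag == FLAG_ONFAULT ? LOCK_ON_FAULT : LOCKED;
}

/*
 * munlock with flags: clear only the mode that was named. Naming a mode
 * that is not set is a no-op rather than an error, matching today's
 * munlock() on memory that is not locked.
 */
enum lock_state model_munlock(enum lock_state cur, enum lock_flag flag)
{
    assert(flag == FLAG_LOCKED || flag == FLAG_ONFAULT);
    if (cur == LOCKED && flag == FLAG_LOCKED)
        return NOT_LOCKED;
    if (cur == LOCK_ON_FAULT && flag == FLAG_ONFAULT)
        return NOT_LOCKED;
    return cur;
}
```

In this model, mlock(MLOCK_ONFAULT) followed by munlock(MLOCK_LOCKED)
leaves the region in the lock-on-fault state, and mlock(MLOCK_LOCKED) on a
lock-on-fault region turns it into a fully locked one.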

I have the new system call set ready, I am waiting to post for rc1 so I
can run the benchmarks again on a base more stable than the middle of a
merge window.  We should wait to hash out implementations until the code
is up rather than talk past each other here.

 
  If we use VM_FAULTPOPULATE, the same pair of calls
  would clear VM_LOCKED, but leave 

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-24 Thread Michal Hocko
On Mon 22-06-15 10:18:06, Eric B Munson wrote:
 On Mon, 22 Jun 2015, Michal Hocko wrote:
 
  On Fri 19-06-15 12:43:33, Eric B Munson wrote:
[...]
   Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
   new MAP_LOCKONFAULT flag (or both)? 
  
  I thought the MAP_FAULTPOPULATE (or any other better name) would
  directly translate into VM_FAULTPOPULATE and wouldn't be tied to the
  locked semantic. We already have VM_LOCKED for that. The direct effect
  of the flag would be to prevent any population other than the direct
  page fault - including any speculative actions like fault around or
  read-ahead.
 
 I like the ability to control other speculative population, but I am not
 sure about overloading it with the VM_LOCKONFAULT case.  Here is my
 concern.  If we are using VM_FAULTPOPULATE | VM_LOCKED to denote
 LOCKONFAULT, how can we tell the difference between someone that wants
 to avoid read-ahead and wants to use mlock()?

Not sure I understand. Something like?
addr = mmap(VM_FAULTPOPULATE) # To prevent speculative mappings into the vma
[...]
mlock(addr, len) # Now I want the full mlock semantic

and the latter to have the full mlock semantic and populate the given
area regardless of VM_FAULTPOPULATE being set on the vma? This would
be an interesting question because mlock man page clearly states the
semantic and that is to _always_ populate or fail. So I originally
thought that it would obey VM_FAULTPOPULATE but this needs more
thinking.

 This might lead to some
 interesting states with mlock() and munlock() that take flags.  For
 instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by
 munlock(MLOCK_LOCKED) leaves the VMAs in the same state with
 VM_LOCKONFAULT set. 

This is really confusing. Let me try to rephrase that. So you have
mlock(addr, len, MLOCK_ONFAULT)
munlock(addr, len, MLOCK_LOCKED)

IIUC you would expect the vma still being MLOCK_ONFAULT, right? Isn't
that behavior strange and unexpected? First of all, munlock has
traditionally dropped the lock on the address range (e.g. what should
happen if you did plain old munlock(addr, len)). But even without
that. You are trying to unlock something that hasn't been locked the
same way. So I would expect -EINVAL at least, if the two modes should be
really represented by different flags.

Or did you mean both types of lock, like:
mlock(addr, len, MLOCK_ONFAULT) | mmap(MAP_LOCKONFAULT)
mlock(addr, len, MLOCK_LOCKED)
munlock(addr, len, MLOCK_LOCKED)

and that should keep MLOCK_ONFAULT?
This sounds even more weird to me because that means that the vma in
question would be locked by two different mechanisms. MLOCK_LOCKED with
the always populate semantic would rule out MLOCK_ONFAULT so what
would be the meaning of the other flag then? Also what should regular
munlock(addr, len) without flags unlock? Both?

 If we use VM_FAULTPOPULATE, the same pair of calls
 would clear VM_LOCKED, but leave VM_FAULTPOPULATE.  It may not matter in
 the end, but I am concerned about the subtleties here.

This sounds like the proper behavior to me. munlock should simply always
drop VM_LOCKED and the VM_FAULTPOPULATE can live its separate life.

Btw. could you be more specific about semantic of m{un}lock(addr, len, flags)
you want to propose? The more I think about that the more I am unclear
about it, especially munlock behavior and possible flags.
-- 
Michal Hocko
SUSE Labs
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-24 Thread Michal Hocko
On Tue 23-06-15 14:45:17, Vlastimil Babka wrote:
 On 06/22/2015 04:18 PM, Eric B Munson wrote:
 On Mon, 22 Jun 2015, Michal Hocko wrote:
 
 On Fri 19-06-15 12:43:33, Eric B Munson wrote:
[...]
 My thought on detecting was that someone might want to know if they had
 a VMA that was VM_LOCKED but had not been made present because of a
 failure in mmap.  We don't have a way today, but adding VM_LOCKONFAULT
 is at least explicit about what is happening which would make detecting
 the VM_LOCKED but not present state easier.
 
 One could use /proc/pid/pagemap to query the residency.
 
 I think that's far too complex a scenario for little gain. If someone
 knows that mmap(MAP_LOCKED|MAP_POPULATE) is not perfect, he should either
 mlock() separately from mmap(), or fault the range manually with a for loop.
 Why try to detect if the corner case was hit?

No idea. I have just offered a way to do that. I do not think it is
anyhow useful but who knows... I do agree that the mlock should be used
for the full mlock semantic.

 This assumes that
 MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like
 it would have to.
 
 Yes, it would have to have a VM flag for the vma.
 
 So with your approach, VM_LOCKED flag is enough, right? The new MAP_ /
 MLOCK_ flags just cause setting VM_LOCKED to not fault the whole vma, but
 otherwise nothing changes.

VM_FAULTPOPULATE would have to be sticky to prevent other
speculative population of the mapping. I mean, is it OK to have a new
mlock semantic (on fault) which might still populate and lock memory which
hasn't been faulted directly? Who knows what kind of speculative things
we will do in the future and then find out that the semantic of
lock-on-fault is not usable anymore.

[...]

-- 
Michal Hocko
SUSE Labs

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-23 Thread Vlastimil Babka

On 06/22/2015 04:18 PM, Eric B Munson wrote:

On Mon, 22 Jun 2015, Michal Hocko wrote:


On Fri 19-06-15 12:43:33, Eric B Munson wrote:

On Fri, 19 Jun 2015, Michal Hocko wrote:


On Thu 18-06-15 16:30:48, Eric B Munson wrote:

On Thu, 18 Jun 2015, Michal Hocko wrote:

[...]

Wouldn't it be much more reasonable and straightforward to have
MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
explicitly disallow any form of pre-faulting? It would be usable for
other usecases than with MAP_LOCKED combination.


I don't see a clear case for it being more reasonable, it is one
possible way to solve the problem.


MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault
around is an all-or-nothing feature. Either all mappings (which support
this) fault around or none. There is no way to tell the kernel that
this particular mapping shouldn't fault around. I haven't seen such a
request yet but we have seen requests to have a way to opt out from
a global policy in the past (e.g. per-process opt out from THP). So
I can imagine somebody will come with a request to opt out from any
speculative operations on the mapped area in the future.


That sounds like something where a new madvise() flag would make more 
sense than a new mmap flag, and conflating it with locking behavior 
would lead to all kinds of weird corner cases as Eric mentioned.





But I think it leaves us in an even
more awkward state WRT VMA flags.  As you noted in your fix for the
mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
not present.  Having VM_LOCKONFAULT states that this was intentional, if
we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
longer set VM_LOCKONFAULT (unless we want to start mapping it to the
presence of two MAP_ flags).  This can make detecting the MAP_LOCKED +
populate failure state harder.


I am not sure I understand your point here. Could you be more specific
how would you check for that and what for?


My thought on detecting was that someone might want to know if they had
a VMA that was VM_LOCKED but had not been made present because of a
failure in mmap.  We don't have a way today, but adding VM_LOCKONFAULT
is at least explicit about what is happening which would make detecting
the VM_LOCKED but not present state easier.


One could use /proc/pid/pagemap to query the residency.


I think that's far too complex a scenario for little gain. If 
someone knows that mmap(MAP_LOCKED|MAP_POPULATE) is not perfect, he 
should either mlock() separately from mmap(), or fault the range 
manually with a for loop. Why try to detect if the corner case was hit?
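
For the "fault the range manually with a for loop" alternative, a minimal
sketch could look like the following. The helper names fault_in_range()
and resident_pages() are made up for illustration; the second one only
exists to check the result via mincore() and is capped at 64 pages to keep
the example short.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Touch one byte in every page of [addr, addr + len) so that each page is
 * faulted in. With write != 0 the byte is written back unchanged, which
 * forces a write fault as well; that is not safe if another thread may be
 * modifying the same bytes concurrently.
 */
void fault_in_range(void *addr, size_t len, int write)
{
    assert(addr != NULL);
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char *p = addr;

    for (size_t off = 0; off < len; off += psz) {
        unsigned char c = p[off];   /* read fault */
        if (write)
            p[off] = c;             /* write fault */
    }
}

/*
 * Count how many pages of a page-aligned range mincore() reports as
 * resident. Returns -1 if mincore() fails or the range is over 64 pages.
 */
long resident_pages(void *addr, size_t len)
{
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + psz - 1) / psz;
    unsigned char vec[64];
    long count = 0;

    if (n > sizeof(vec) || mincore(addr, len, vec) != 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        count += vec[i] & 1;        /* bit 0 set means resident */
    return count;
}
```

The other alternative, calling mlock() after a plain mmap(), has the
advantage that mlock() reports a population failure through its return
value instead of leaving the caller to check.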





This assumes that
MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like
it would have to.


Yes, it would have to have a VM flag for the vma.


So with your approach, VM_LOCKED flag is enough, right? The new MAP_ / 
MLOCK_ flags just cause setting VM_LOCKED to not fault the whole vma, 
but otherwise nothing changes.


If that's true, I think it's better than a new vma flag.




 From my understanding MAP_LOCKONFAULT is essentially
MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike
single MAP_LOCKED unfortunately). I would love to also have
MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really
skeptical considering how my previous attempt to make MAP_POPULATE
reasonable went.


Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
new MAP_LOCKONFAULT flag (or both)?


I thought the MAP_FAULTPOPULATE (or any other better name) would
directly translate into VM_FAULTPOPULATE and wouldn't be tied to the
locked semantic. We already have VM_LOCKED for that. The direct effect
of the flag would be to prevent any population other than the direct
page fault - including any speculative actions like fault around or
read-ahead.


I like the ability to control other speculative population, but I am not
sure about overloading it with the VM_LOCKONFAULT case.  Here is my
concern.  If we are using VM_FAULTPOPULATE | VM_LOCKED to denote
LOCKONFAULT, how can we tell the difference between someone that wants
to avoid read-ahead and wants to use mlock()?  This might lead to some
interesting states with mlock() and munlock() that take flags.  For
instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by
munlock(MLOCK_LOCKED) leaves the VMAs in the same state with
VM_LOCKONFAULT set.  If we use VM_FAULTPOPULATE, the same pair of calls
would clear VM_LOCKED, but leave VM_FAULTPOPULATE.  It may not matter in
the end, but I am concerned about the subtleties here.


Right.




If you prefer that MAP_LOCKED |
MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that
instead of introducing MAP_LOCKONFAULT.  I went with the new flag
because to date, we have a one to one mapping of MAP_* to VM_* flags.




If this is the preferred path for mmap(), I am fine with that.



However,
I would like to see the new system calls that Andrew mentioned (and that
I am 

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-22 Thread Michal Hocko
On Fri 19-06-15 12:43:33, Eric B Munson wrote:
 On Fri, 19 Jun 2015, Michal Hocko wrote:
 
  On Thu 18-06-15 16:30:48, Eric B Munson wrote:
   On Thu, 18 Jun 2015, Michal Hocko wrote:
  [...]
Wouldn't it be much more reasonable and straightforward to have
MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
explicitly disallow any form of pre-faulting? It would be usable for
other usecases than with MAP_LOCKED combination.
   
   I don't see a clear case for it being more reasonable, it is one
   possible way to solve the problem.
  
  MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault
  around is an all-or-nothing feature. Either all mappings (which support
  this) fault around or none. There is no way to tell the kernel that
  this particular mapping shouldn't fault around. I haven't seen such a
  request yet but we have seen requests to have a way to opt out from
  a global policy in the past (e.g. per-process opt out from THP). So
  I can imagine somebody will come with a request to opt out from any
  speculative operations on the mapped area in the future.
  
   But I think it leaves us in an even
   more awkward state WRT VMA flags.  As you noted in your fix for the
   mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
   not present.  Having VM_LOCKONFAULT states that this was intentional, if
   we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
   longer set VM_LOCKONFAULT (unless we want to start mapping it to the
   presence of two MAP_ flags).  This can make detecting the MAP_LOCKED +
   populate failure state harder.
  
  I am not sure I understand your point here. Could you be more specific
  how would you check for that and what for?
 
 My thought on detecting was that someone might want to know if they had
 a VMA that was VM_LOCKED but had not been made present because of a
 failure in mmap.  We don't have a way today, but adding VM_LOCKONFAULT
 is at least explicit about what is happening which would make detecting
 the VM_LOCKED but not present state easier. 

One could use /proc/pid/pagemap to query the residency.
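
A rough sketch of such a query for the calling process follows. The helper
name page_is_present() is invented for illustration. It relies on the
documented pagemap layout (one 64-bit entry per virtual page, bit 63 set
when the page is present in RAM) and assumes a 64-bit off_t.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Return 1 if the page containing addr is present in RAM, 0 if it is not,
 * and -1 if /proc/self/pagemap cannot be opened or read.
 */
int page_is_present(const void *addr)
{
    long psz = sysconf(_SC_PAGESIZE);
    assert(psz > 0);

    uint64_t entry;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)((uintptr_t)addr / (uintptr_t)psz) *
                (off_t)sizeof(entry);
    int fd = open("/proc/self/pagemap", O_RDONLY);

    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, &entry, sizeof(entry), off);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;

    return (int)((entry >> 63) & 1);   /* bit 63: page present */
}
```

Walking a VM_LOCKED range page by page with this and finding a 0 would
indicate the "locked but not present" state.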

 This assumes that
 MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like
 it would have to.

Yes, it would have to have a VM flag for the vma.

  From my understanding MAP_LOCKONFAULT is essentially
  MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike
  single MAP_LOCKED unfortunately). I would love to also have
  MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really
  skeptical considering how my previous attempt to make MAP_POPULATE
  reasonable went.
 
 Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
 new MAP_LOCKONFAULT flag (or both)? 

I thought the MAP_FAULTPOPULATE (or any other better name) would
directly translate into VM_FAULTPOPULATE and wouldn't be tied to the
locked semantic. We already have VM_LOCKED for that. The direct effect
of the flag would be to prevent any population other than the direct
page fault - including any speculative actions like fault around or
read-ahead.

 If you prefer that MAP_LOCKED |
 MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that
 instead of introducing MAP_LOCKONFAULT.  I went with the new flag
 because to date, we have a one to one mapping of MAP_* to VM_* flags.
 
  
   If this is the preferred path for mmap(), I am fine with that. 
  
   However,
   I would like to see the new system calls that Andrew mentioned (and that
   I am testing patches for) go in as well. 
  
  mlock with flags sounds like a good step but I am not sure it will make
  sense in the future. POSIX has screwed that and I am not sure how many
  applications would use it. This ship has sailed a long time ago.
 
 I don't know either, but the code is the question, right?  I know that
 we have at least one team that wants it here.
 
  
   That way we give users the
   ability to request VM_LOCKONFAULT for memory allocated using something
   other than mmap.
  
  mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even
  without changing mlock syscall.
 
 That is true as long as MAP_FAULTPOPULATE set a flag in the VMA(s).  It
 doesn't cover the actual case I was asking about, which is how do I get
 lock on fault on malloc'd memory?

OK I see your point now. We would indeed need a flag argument for mlock.
-- 
Michal Hocko
SUSE Labs

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-22 Thread Eric B Munson
On Mon, 22 Jun 2015, Michal Hocko wrote:

 On Fri 19-06-15 12:43:33, Eric B Munson wrote:
  On Fri, 19 Jun 2015, Michal Hocko wrote:
  
   On Thu 18-06-15 16:30:48, Eric B Munson wrote:
On Thu, 18 Jun 2015, Michal Hocko wrote:
   [...]
 Wouldn't it be much more reasonable and straightforward to have
 MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
 explicitly disallow any form of pre-faulting? It would be usable for
 other usecases than with MAP_LOCKED combination.

I don't see a clear case for it being more reasonable, it is one
possible way to solve the problem.
   
   MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault
   around is an all-or-nothing feature. Either all mappings (which support
   this) fault around or none. There is no way to tell the kernel that
   this particular mapping shouldn't fault around. I haven't seen such a
   request yet but we have seen requests to have a way to opt out from
   a global policy in the past (e.g. per-process opt out from THP). So
   I can imagine somebody will come with a request to opt out from any
   speculative operations on the mapped area in the future.
   
But I think it leaves us in an even
more awkward state WRT VMA flags.  As you noted in your fix for the
mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
not present.  Having VM_LOCKONFAULT states that this was intentional, if
we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
longer set VM_LOCKONFAULT (unless we want to start mapping it to the
presence of two MAP_ flags).  This can make detecting the MAP_LOCKED +
populate failure state harder.
   
   I am not sure I understand your point here. Could you be more specific
   how would you check for that and what for?
  
  My thought on detecting was that someone might want to know if they had
  a VMA that was VM_LOCKED but had not been made present because of a
  failure in mmap.  We don't have a way today, but adding VM_LOCKONFAULT
  is at least explicit about what is happening which would make detecting
  the VM_LOCKED but not present state easier. 
 
 One could use /proc/pid/pagemap to query the residency.
 
  This assumes that
  MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like
  it would have to.
 
 Yes, it would have to have a VM flag for the vma.
 
   From my understanding MAP_LOCKONFAULT is essentially
   MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike
   single MAP_LOCKED unfortunately). I would love to also have
   MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really
   skeptical considering how my previous attempt to make MAP_POPULATE
   reasonable went.
  
  Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
  new MAP_LOCKONFAULT flag (or both)? 
 
 I thought the MAP_FAULTPOPULATE (or any other better name) would
 directly translate into VM_FAULTPOPULATE and wouldn't be tied to the
 locked semantic. We already have VM_LOCKED for that. The direct effect
 of the flag would be to prevent any population other than the direct
 page fault - including any speculative actions like fault around or
 read-ahead.

I like the ability to control other speculative population, but I am not
sure about overloading it with the VM_LOCKONFAULT case.  Here is my
concern.  If we are using VM_FAULTPOPULATE | VM_LOCKED to denote
LOCKONFAULT, how can we tell the difference between someone that wants
to avoid read-ahead and wants to use mlock()?  This might lead to some
interesting states with mlock() and munlock() that take flags.  For
instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by
munlock(MLOCK_LOCKED) leaves the VMAs in the same state with
VM_LOCKONFAULT set.  If we use VM_FAULTPOPULATE, the same pair of calls
would clear VM_LOCKED, but leave VM_FAULTPOPULATE.  It may not matter in
the end, but I am concerned about the subtleties here.

 
  If you prefer that MAP_LOCKED |
  MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that
  instead of introducing MAP_LOCKONFAULT.  I went with the new flag
  because to date, we have a one to one mapping of MAP_* to VM_* flags.
  
   
If this is the preferred path for mmap(), I am fine with that. 
   
However,
I would like to see the new system calls that Andrew mentioned (and that
I am testing patches for) go in as well. 
   
   mlock with flags sounds like a good step but I am not sure it will make
   sense in the future. POSIX has screwed that and I am not sure how many
   applications would use it. This ship has sailed a long time ago.
  
  I don't know either, but the code is the question, right?  I know that
  we have at least one team that wants it here.
  
   
That way we give users the
ability to request VM_LOCKONFAULT for memory allocated using something
other than mmap.
   
   mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even
   

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-19 Thread Michal Hocko
On Thu 18-06-15 16:30:48, Eric B Munson wrote:
 On Thu, 18 Jun 2015, Michal Hocko wrote:
[...]
  Wouldn't it be much more reasonable and straightforward to have
  MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
  explicitly disallow any form of pre-faulting? It would be usable for
  other usecases than with MAP_LOCKED combination.
 
 I don't see a clear case for it being more reasonable, it is one
 possible way to solve the problem.

MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault
around is an all-or-nothing feature. Either all mappings (which support
this) fault around or none. There is no way to tell the kernel that
this particular mapping shouldn't fault around. I haven't seen such a
request yet but we have seen requests to have a way to opt out from
a global policy in the past (e.g. per-process opt out from THP). So
I can imagine somebody will come with a request to opt out from any
speculative operations on the mapped area in the future.

 But I think it leaves us in an even
 more awkward state WRT VMA flags.  As you noted in your fix for the
 mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
 not present.  Having VM_LOCKONFAULT states that this was intentional, if
 we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
 longer set VM_LOCKONFAULT (unless we want to start mapping it to the
 presence of two MAP_ flags).  This can make detecting the MAP_LOCKED +
 populate failure state harder.

I am not sure I understand your point here. Could you be more specific
how would you check for that and what for?

From my understanding MAP_LOCKONFAULT is essentially
MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike
single MAP_LOCKED unfortunately). I would love to also have
MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really
skeptical considering how my previous attempt to make MAP_POPULATE
reasonable went.

 If this is the preferred path for mmap(), I am fine with that. 

 However,
 I would like to see the new system calls that Andrew mentioned (and that
 I am testing patches for) go in as well. 

mlock with flags sounds like a good step but I am not sure it will make
sense in the future. POSIX has screwed that and I am not sure how many
applications would use it. This ship has sailed a long time ago.

 That way we give users the
 ability to request VM_LOCKONFAULT for memory allocated using something
 other than mmap.

mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even
without changing mlock syscall.
 
   This patch introduces the ability to request that pages are not
   pre-faulted, but are placed on the unevictable LRU when they are finally
   faulted in.
   
   To keep accounting checks out of the page fault path, users are billed
   for the entire mapping lock as if MAP_LOCKED was used.
   
   Signed-off-by: Eric B Munson emun...@akamai.com
   Cc: Michal Hocko mho...@suse.cz
   Cc: linux-al...@vger.kernel.org
   Cc: linux-ker...@vger.kernel.org
   Cc: linux-m...@linux-mips.org
   Cc: linux-par...@vger.kernel.org
   Cc: linuxppc-dev@lists.ozlabs.org
   Cc: sparcli...@vger.kernel.org
   Cc: linux-xte...@linux-xtensa.org
   Cc: linux...@kvack.org
   Cc: linux-a...@vger.kernel.org
   Cc: linux-...@vger.kernel.org
   ---
[...]
-- 
Michal Hocko
SUSE Labs

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-19 Thread Eric B Munson
On Fri, 19 Jun 2015, Michal Hocko wrote:

 On Thu 18-06-15 16:30:48, Eric B Munson wrote:
  On Thu, 18 Jun 2015, Michal Hocko wrote:
 [...]
   Wouldn't it be much more reasonable and straightforward to have
   MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
   explicitly disallow any form of pre-faulting? It would be usable for
   other usecases than with MAP_LOCKED combination.
  
  I don't see a clear case for it being more reasonable, it is one
  possible way to solve the problem.
 
 MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault
 around is an all-or-nothing feature. Either all mappings (which support
 this) fault around or none. There is no way to tell the kernel that
 this particular mapping shouldn't fault around. I haven't seen such a
 request yet but we have seen requests to have a way to opt out from
 a global policy in the past (e.g. per-process opt out from THP). So
 I can imagine somebody will come with a request to opt out from any
 speculative operations on the mapped area in the future.
 
  But I think it leaves us in an even
  more awkward state WRT VMA flags.  As you noted in your fix for the
  mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
  not present.  Having VM_LOCKONFAULT states that this was intentional, if
  we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
  longer set VM_LOCKONFAULT (unless we want to start mapping it to the
  presence of two MAP_ flags).  This can make detecting the MAP_LOCKED +
  populate failure state harder.
 
 I am not sure I understand your point here. Could you be more specific
 how would you check for that and what for?

My thought on detecting was that someone might want to know if they had
a VMA that was VM_LOCKED but had not been made present because of a
failure in mmap.  We don't have a way today, but adding VM_LOCKONFAULT
is at least explicit about what is happening which would make detecting
the VM_LOCKED but not present state easier.  This assumes that
MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like
it would have to.

 
 From my understanding MAP_LOCKONFAULT is essentially
 MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike
 single MAP_LOCKED unfortunately). I would love to also have
 MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really
 skeptical considering how my previous attempt to make MAP_POPULATE
 reasonable went.

Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
new MAP_LOCKONFAULT flag (or both)?  If you prefer that MAP_LOCKED |
MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that
instead of introducing MAP_LOCKONFAULT.  I went with the new flag
because to date, we have a one to one mapping of MAP_* to VM_* flags.

 
  If this is the preferred path for mmap(), I am fine with that. 
 
  However,
  I would like to see the new system calls that Andrew mentioned (and that
  I am testing patches for) go in as well. 
 
 mlock with flags sounds like a good step but I am not sure it will make
 sense in the future. POSIX has screwed that and I am not sure how many
 applications would use it. This ship has sailed long time ago.

I don't know either, but the code is the question, right?  I know that
we have at least one team that wants it here.

 
  That way we give users the
  ability to request VM_LOCKONFAULT for memory allocated using something
  other than mmap.
 
 mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even
 without changing mlock syscall.

That is true as long as MAP_FAULTPOPULATE set a flag in the VMA(s).  It
doesn't cover the actual case I was asking about, which is how do I get
lock on fault on malloc'd memory?
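
As a sketch of what the malloc case could look like from userspace: the
interface below is not part of this patch series as posted.  It uses the
mlock2() system call with MLOCK_ONFAULT, which is what Linux 4.4 and later
expose; the helper name lock_on_fault() is made up for illustration.  Note
that locking works on whole pages, so anything else sharing a page with the
buffer is covered as well.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01  /* value from the Linux uapi headers */
#endif

/*
 * Hypothetical helper: ask the kernel to lock pages in [buf, buf + len)
 * as they are faulted in, without populating them up front.
 * Returns 0 on success, -1 with errno set on failure (for example ENOSYS
 * on kernels without mlock2, or ENOMEM/EPERM when RLIMIT_MEMLOCK is hit).
 */
int lock_on_fault(void *buf, size_t len)
{
#ifdef SYS_mlock2
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    /* Round the start down to a page boundary and extend the span to match. */
    uintptr_t start = (uintptr_t)buf & ~(page - 1);
    size_t span = (size_t)(((uintptr_t)buf + len) - start);

    return (int)syscall(SYS_mlock2, (void *)start, span, MLOCK_ONFAULT);
#else
    (void)buf;
    (void)len;
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would malloc() the buffer as usual, call lock_on_fault(buf, len)
once, and treat a failure as "not locked" rather than as fatal.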

  
This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in.

To keep accounting checks out of the page fault path, users are billed
for the entire mapping lock as if MAP_LOCKED was used.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 [...]
 -- 
 Michal Hocko
 SUSE Labs


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-18 Thread Michal Hocko
[Sorry for the late reply - I meant to answer in the previous threads
 but something always preempted me from that]

On Wed 10-06-15 09:26:48, Eric B Munson wrote:
 The cost of faulting in all memory to be locked can be very high when
 working with large mappings.  If only portions of the mapping will be
 used this can incur a high penalty for locking.
 
 For the example of a large file, this is the usage pattern for a large
 statistical language model (probably applies to other statistical or graphical
 models as well).  For the security example, any application transacting
 in data that cannot be swapped out (credit card data, medical records,
 etc).

Such a use case makes some sense to me but I am not sure the way you
implement it is the right one. This is another mlock related flag for
mmap with a different semantic. You do not want to prefault but e.g. is
the readahead or fault around acceptable? I do not see anything in your
patch to handle those...

Wouldn't it be much more reasonable and straightforward to have
MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
explicitly disallow any form of pre-faulting? It would be usable for
other usecases than with MAP_LOCKED combination.

 This patch introduces the ability to request that pages are not
 pre-faulted, but are placed on the unevictable LRU when they are finally
 faulted in.
 
 To keep accounting checks out of the page fault path, users are billed
 for the entire mapping lock as if MAP_LOCKED was used.
 
 Signed-off-by: Eric B Munson emun...@akamai.com
 Cc: Michal Hocko mho...@suse.cz
 Cc: linux-al...@vger.kernel.org
 Cc: linux-ker...@vger.kernel.org
 Cc: linux-m...@linux-mips.org
 Cc: linux-par...@vger.kernel.org
 Cc: linuxppc-dev@lists.ozlabs.org
 Cc: sparcli...@vger.kernel.org
 Cc: linux-xte...@linux-xtensa.org
 Cc: linux...@kvack.org
 Cc: linux-a...@vger.kernel.org
 Cc: linux-...@vger.kernel.org
 ---
  arch/alpha/include/uapi/asm/mman.h   | 1 +
  arch/mips/include/uapi/asm/mman.h    | 1 +
  arch/parisc/include/uapi/asm/mman.h  | 1 +
  arch/powerpc/include/uapi/asm/mman.h | 1 +
  arch/sparc/include/uapi/asm/mman.h   | 1 +
  arch/tile/include/uapi/asm/mman.h    | 1 +
  arch/xtensa/include/uapi/asm/mman.h  | 1 +
  include/linux/mm.h                   | 1 +
  include/linux/mman.h                 | 3 ++-
  include/uapi/asm-generic/mman.h      | 1 +
  mm/mmap.c                            | 4 ++--
  mm/swap.c                            | 3 ++-
  12 files changed, 15 insertions(+), 4 deletions(-)
 
 diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
 index 0086b47..15e96e1 100644
 --- a/arch/alpha/include/uapi/asm/mman.h
 +++ b/arch/alpha/include/uapi/asm/mman.h
 @@ -30,6 +30,7 @@
  #define MAP_NONBLOCK    0x4  /* do not block on IO */
  #define MAP_STACK       0x8  /* give out an address that is best suited for process/thread stacks */
  #define MAP_HUGETLB     0x10 /* create a huge page mapping */
 +#define MAP_LOCKONFAULT 0x20 /* Lock pages after they are faulted in, do not prefault */
  
  #define MS_ASYNC 1   /* sync memory asynchronously */
  #define MS_SYNC  2   /* synchronous memory sync */
 diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
 index cfcb876..47846a5 100644
 --- a/arch/mips/include/uapi/asm/mman.h
 +++ b/arch/mips/include/uapi/asm/mman.h
 @@ -48,6 +48,7 @@
  #define MAP_NONBLOCK    0x2  /* do not block on IO */
  #define MAP_STACK       0x4  /* give out an address that is best suited for process/thread stacks */
  #define MAP_HUGETLB     0x8  /* create a huge page mapping */
 +#define MAP_LOCKONFAULT 0x10 /* Lock pages after they are faulted in, do not prefault */
  
  /*
   * Flags for msync
 diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
 index 294d251..1514cd7 100644
 --- a/arch/parisc/include/uapi/asm/mman.h
 +++ b/arch/parisc/include/uapi/asm/mman.h
 @@ -24,6 +24,7 @@
  #define MAP_NONBLOCK    0x2  /* do not block on IO */
  #define MAP_STACK       0x4  /* give out an address that is best suited for process/thread stacks */
  #define MAP_HUGETLB     0x8  /* create a huge page mapping */
 +#define MAP_LOCKONFAULT 0x10 /* Lock pages after they are faulted in, do not prefault */
  
  #define MS_SYNC  1   /* synchronous memory sync */
  #define MS_ASYNC 2   /* sync memory asynchronously */
 diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
 index 6ea26df..fce74fe 100644
 --- a/arch/powerpc/include/uapi/asm/mman.h
 +++ b/arch/powerpc/include/uapi/asm/mman.h
 @@ -27,5 +27,6 @@
  #define MAP_NONBLOCK    0x1  /* do not block on IO */
  #define MAP_STACK       0x2  /* give out an address that is best suited for process/thread stacks */
  #define 

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-18 Thread Eric B Munson
On Thu, 18 Jun 2015, Michal Hocko wrote:

 [Sorry for the late reply - I meant to answer in the previous threads
  but something always preempted me from that]
 
 On Wed 10-06-15 09:26:48, Eric B Munson wrote:
  The cost of faulting in all memory to be locked can be very high when
  working with large mappings.  If only portions of the mapping will be
  used this can incur a high penalty for locking.
  
  For the example of a large file, this is the usage pattern for a large
  statistical language model (probably applies to other statistical or graphical
  models as well).  For the security example, any application transacting
  in data that cannot be swapped out (credit card data, medical records,
  etc).
 
 Such a use case makes some sense to me but I am not sure the way you
 implement it is the right one. This is another mlock related flag for
 mmap with a different semantic. You do not want to prefault but e.g. is
 the readahead or fault around acceptable? I do not see anything in your
 patch to handle those...

We haven't bumped into readahead or fault around causing performance
problems for us.  If they cause problems for users when LOCKONFAULT is
in use then we can address them.

 
 Wouldn't it be much more reasonable and straightforward to have
 MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
 explicitly disallow any form of pre-faulting? It would be usable for
 other usecases than with MAP_LOCKED combination.

I don't see a clear case for it being more reasonable, it is one
possible way to solve the problem.  But I think it leaves us in an even
more awkward state WRT VMA flags.  As you noted in your fix for the
mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
not present.  Having VM_LOCKONFAULT states that this was intentional, if
we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
longer set VM_LOCKONFAULT (unless we want to start mapping it to the
presence of two MAP_ flags).  This can make detecting the MAP_LOCKED +
populate failure state harder.

If this is the preferred path for mmap(), I am fine with that.  However,
I would like to see the new system calls that Andrew mentioned (and that
I am testing patches for) go in as well.  That way we give users the
ability to request VM_LOCKONFAULT for memory allocated using something
other than mmap.

 
  This patch introduces the ability to request that pages are not
  pre-faulted, but are placed on the unevictable LRU when they are finally
  faulted in.
  
  To keep accounting checks out of the page fault path, users are billed
  for the entire mapping lock as if MAP_LOCKED was used.
  
  Signed-off-by: Eric B Munson emun...@akamai.com
  Cc: Michal Hocko mho...@suse.cz
  Cc: linux-al...@vger.kernel.org
  Cc: linux-ker...@vger.kernel.org
  Cc: linux-m...@linux-mips.org
  Cc: linux-par...@vger.kernel.org
  Cc: linuxppc-dev@lists.ozlabs.org
  Cc: sparcli...@vger.kernel.org
  Cc: linux-xte...@linux-xtensa.org
  Cc: linux...@kvack.org
  Cc: linux-a...@vger.kernel.org
  Cc: linux-...@vger.kernel.org
  ---
   arch/alpha/include/uapi/asm/mman.h   | 1 +
   arch/mips/include/uapi/asm/mman.h    | 1 +
   arch/parisc/include/uapi/asm/mman.h  | 1 +
   arch/powerpc/include/uapi/asm/mman.h | 1 +
   arch/sparc/include/uapi/asm/mman.h   | 1 +
   arch/tile/include/uapi/asm/mman.h    | 1 +
   arch/xtensa/include/uapi/asm/mman.h  | 1 +
   include/linux/mm.h                   | 1 +
   include/linux/mman.h                 | 3 ++-
   include/uapi/asm-generic/mman.h      | 1 +
   mm/mmap.c                            | 4 ++--
   mm/swap.c                            | 3 ++-
   12 files changed, 15 insertions(+), 4 deletions(-)
  
  diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
  index 0086b47..15e96e1 100644
  --- a/arch/alpha/include/uapi/asm/mman.h
  +++ b/arch/alpha/include/uapi/asm/mman.h
  @@ -30,6 +30,7 @@
   #define MAP_NONBLOCK    0x4  /* do not block on IO */
   #define MAP_STACK       0x8  /* give out an address that is best suited for process/thread stacks */
   #define MAP_HUGETLB     0x10 /* create a huge page mapping */
  +#define MAP_LOCKONFAULT 0x20 /* Lock pages after they are faulted in, do not prefault */
   
   #define MS_ASYNC   1   /* sync memory asynchronously */
   #define MS_SYNC2   /* synchronous memory sync */
  diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
  index cfcb876..47846a5 100644
  --- a/arch/mips/include/uapi/asm/mman.h
  +++ b/arch/mips/include/uapi/asm/mman.h
  @@ -48,6 +48,7 @@
   #define MAP_NONBLOCK    0x2  /* do not block on IO */
   #define MAP_STACK       0x4  /* give out an address that is best suited for process/thread stacks */
   #define MAP_HUGETLB     0x8  /* create a huge page mapping */
  +#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
  faulted