On Thu, 18 Jun 2015, Michal Hocko wrote:

> [Sorry for the late reply - I meant to answer in the previous threads
>  but something always preempted me from that]
> 
> On Wed 10-06-15 09:26:48, Eric B Munson wrote:
> > The cost of faulting in all memory to be locked can be very high when
> > working with large mappings.  If only portions of the mapping will be
> > used this can incur a high penalty for locking.
> > 
> > For the example of a large file, this is the usage pattern for a large
> > statistical language model (probably applies to other statistical or graphical
> > models as well).  For the security example, any application transacting
> > in data that cannot be swapped out (credit card data, medical records,
> > etc).
> 
> Such a use case makes some sense to me but I am not sure the way you
> implement it is the right one. This is another mlock related flag for
> mmap with a different semantic. You do not want to prefault but e.g. is
> the readahead or fault around acceptable? I do not see anything in your
> patch to handle those...

We haven't bumped into readahead or fault around causing performance
problems for us.  If they cause problems for users when LOCKONFAULT is
in use then we can address them.

> 
> Wouldn't it be much more reasonable and straightforward to have
> MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
> explicitly disallow any form of pre-faulting? It would be usable for
> other usecases than with MAP_LOCKED combination.

I don't see a clear case for it being more reasonable; it is one
possible way to solve the problem.  But I think it leaves us in an even
more awkward state WRT VMA flags.  As you noted in your fix for the
mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
its pages are not present.  Having VM_LOCKONFAULT states that this was
intentional.  If we go to using MAP_FAULTPOPULATE instead of
MAP_LOCKONFAULT, we no longer set VM_LOCKONFAULT (unless we want to
start mapping it to the presence of two MAP_ flags).  This can make
detecting the MAP_LOCKED + populate failure state harder.
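
As a rough illustration of that point, here is a userspace model of the
flag translation in the calc_vm_flag_bits() hunk.  The model_* names are
made up for this sketch, the MAP_LOCKED value is assumed from
asm-generic, and this is not the kernel code itself.  It only shows that
a dedicated VM_LOCKONFAULT bit lets a "locked but not populated" VMA be
recognised as intentional from vm_flags alone.

```c
#include <stdbool.h>

/* MAP_LOCKED is assumed from asm-generic/mman.h; the rest come from the patch. */
#define MAP_LOCKED      0x2000UL
#define MAP_LOCKONFAULT 0x80000UL      /* asm-generic value proposed in the patch */
#define VM_LOCKONFAULT  0x00001000UL
#define VM_LOCKED       0x00002000UL

/* Hypothetical helper: translate one MAP_ bit into one VM_ bit. */
static unsigned long model_trans(unsigned long flags,
                                 unsigned long map_bit,
                                 unsigned long vm_bit)
{
    return (flags & map_bit) ? vm_bit : 0;
}

/* Simplified model of the lock-related part of calc_vm_flag_bits(). */
static unsigned long model_calc_vm_flag_bits(unsigned long flags)
{
    return model_trans(flags, MAP_LOCKED, VM_LOCKED) |
           model_trans(flags, MAP_LOCKONFAULT, VM_LOCKONFAULT);
}

/* With a dedicated bit, "not prefaulted on purpose" is a single test. */
static bool model_lock_on_fault_requested(unsigned long vm_flags)
{
    return (vm_flags & VM_LOCKONFAULT) != 0;
}
```

With MAP_FAULTPOPULATE + MAP_LOCKED instead, the same question would
need either a second VM_ flag or a mapping from two MAP_ flags onto one
VM_ bit.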

If this is the preferred path for mmap(), I am fine with that.  However,
I would like to see the new system calls that Andrew mentioned (and that
I am testing patches for) go in as well.  That way we give users the
ability to request VM_LOCKONFAULT for memory allocated using something
other than mmap.

> 
> > This patch introduces the ability to request that pages are not
> > pre-faulted, but are placed on the unevictable LRU when they are finally
> > faulted in.
> > 
> > To keep accounting checks out of the page fault path, users are billed
> > for the entire mapping lock as if MAP_LOCKED was used.
> > 
> > Signed-off-by: Eric B Munson <emun...@akamai.com>
> > Cc: Michal Hocko <mho...@suse.cz>
> > Cc: linux-al...@vger.kernel.org
> > Cc: linux-ker...@vger.kernel.org
> > Cc: linux-m...@linux-mips.org
> > Cc: linux-par...@vger.kernel.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: sparcli...@vger.kernel.org
> > Cc: linux-xte...@linux-xtensa.org
> > Cc: linux...@kvack.org
> > Cc: linux-a...@vger.kernel.org
> > Cc: linux-...@vger.kernel.org
> > ---
> >  arch/alpha/include/uapi/asm/mman.h   | 1 +
> >  arch/mips/include/uapi/asm/mman.h    | 1 +
> >  arch/parisc/include/uapi/asm/mman.h  | 1 +
> >  arch/powerpc/include/uapi/asm/mman.h | 1 +
> >  arch/sparc/include/uapi/asm/mman.h   | 1 +
> >  arch/tile/include/uapi/asm/mman.h    | 1 +
> >  arch/xtensa/include/uapi/asm/mman.h  | 1 +
> >  include/linux/mm.h                   | 1 +
> >  include/linux/mman.h                 | 3 ++-
> >  include/uapi/asm-generic/mman.h      | 1 +
> >  mm/mmap.c                            | 4 ++--
> >  mm/swap.c                            | 3 ++-
> >  12 files changed, 15 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> > index 0086b47..15e96e1 100644
> > --- a/arch/alpha/include/uapi/asm/mman.h
> > +++ b/arch/alpha/include/uapi/asm/mman.h
> > @@ -30,6 +30,7 @@
> >  #define MAP_NONBLOCK       0x40000         /* do not block on IO */
> >  #define MAP_STACK  0x80000         /* give out an address that is best suited for process/thread stacks */
> >  #define MAP_HUGETLB        0x100000        /* create a huge page mapping */
> > +#define MAP_LOCKONFAULT    0x200000        /* Lock pages after they are faulted in, do not prefault */
> >  
> >  #define MS_ASYNC   1               /* sync memory asynchronously */
> >  #define MS_SYNC            2               /* synchronous memory sync */
> > diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> > index cfcb876..47846a5 100644
> > --- a/arch/mips/include/uapi/asm/mman.h
> > +++ b/arch/mips/include/uapi/asm/mman.h
> > @@ -48,6 +48,7 @@
> >  #define MAP_NONBLOCK       0x20000         /* do not block on IO */
> >  #define MAP_STACK  0x40000         /* give out an address that is best suited for process/thread stacks */
> >  #define MAP_HUGETLB        0x80000         /* create a huge page mapping */
> > +#define MAP_LOCKONFAULT    0x100000        /* Lock pages after they are faulted in, do not prefault */
> >  
> >  /*
> >   * Flags for msync
> > diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> > index 294d251..1514cd7 100644
> > --- a/arch/parisc/include/uapi/asm/mman.h
> > +++ b/arch/parisc/include/uapi/asm/mman.h
> > @@ -24,6 +24,7 @@
> >  #define MAP_NONBLOCK       0x20000         /* do not block on IO */
> >  #define MAP_STACK  0x40000         /* give out an address that is best suited for process/thread stacks */
> >  #define MAP_HUGETLB        0x80000         /* create a huge page mapping */
> > +#define MAP_LOCKONFAULT    0x100000        /* Lock pages after they are faulted in, do not prefault */
> >  
> >  #define MS_SYNC            1               /* synchronous memory sync */
> >  #define MS_ASYNC   2               /* sync memory asynchronously */
> > diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> > index 6ea26df..fce74fe 100644
> > --- a/arch/powerpc/include/uapi/asm/mman.h
> > +++ b/arch/powerpc/include/uapi/asm/mman.h
> > @@ -27,5 +27,6 @@
> >  #define MAP_NONBLOCK       0x10000         /* do not block on IO */
> >  #define MAP_STACK  0x20000         /* give out an address that is best suited for process/thread stacks */
> >  #define MAP_HUGETLB        0x40000         /* create a huge page mapping */
> > +#define MAP_LOCKONFAULT    0x80000         /* Lock pages after they are faulted in, do not prefault */
> >  
> >  #endif /* _UAPI_ASM_POWERPC_MMAN_H */
> > diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
> > index 0b14df3..12425d8 100644
> > --- a/arch/sparc/include/uapi/asm/mman.h
> > +++ b/arch/sparc/include/uapi/asm/mman.h
> > @@ -22,6 +22,7 @@
> >  #define MAP_NONBLOCK       0x10000         /* do not block on IO */
> >  #define MAP_STACK  0x20000         /* give out an address that is best suited for process/thread stacks */
> >  #define MAP_HUGETLB        0x40000         /* create a huge page mapping */
> > +#define MAP_LOCKONFAULT    0x80000         /* Lock pages after they are faulted in, do not prefault */
> >  
> >  
> >  #endif /* _UAPI__SPARC_MMAN_H__ */
> > diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
> > index 81b8fc3..ec04eaf 100644
> > --- a/arch/tile/include/uapi/asm/mman.h
> > +++ b/arch/tile/include/uapi/asm/mman.h
> > @@ -29,6 +29,7 @@
> >  #define MAP_DENYWRITE      0x0800          /* ETXTBSY */
> >  #define MAP_EXECUTABLE     0x1000          /* mark it as an executable */
> >  #define MAP_HUGETLB        0x4000          /* create a huge page mapping */
> > +#define MAP_LOCKONFAULT    0x8000          /* Lock pages after they are faulted in, do not prefault */
> >  
> >  
> >  /*
> > diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> > index 201aec0..42d43cc 100644
> > --- a/arch/xtensa/include/uapi/asm/mman.h
> > +++ b/arch/xtensa/include/uapi/asm/mman.h
> > @@ -55,6 +55,7 @@
> >  #define MAP_NONBLOCK       0x20000         /* do not block on IO */
> >  #define MAP_STACK  0x40000         /* give out an address that is best suited for process/thread stacks */
> >  #define MAP_HUGETLB        0x80000         /* create a huge page mapping */
> > +#define MAP_LOCKONFAULT    0x100000        /* Lock pages after they are faulted in, do not prefault */
> >  #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
> >  # define MAP_UNINITIALIZED 0x4000000       /* For anonymous mmap, memory could be
> >                                      * uninitialized */
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 0755b9f..3e31457 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -126,6 +126,7 @@ extern unsigned int kobjsize(const void *objp);
> >  #define VM_PFNMAP  0x00000400      /* Page-ranges managed without "struct page", just pure PFN */
> >  #define VM_DENYWRITE       0x00000800      /* ETXTBSY on write attempts.. */
> >  
> > +#define VM_LOCKONFAULT     0x00001000      /* Lock the pages covered when they are faulted in */
> >  #define VM_LOCKED  0x00002000
> >  #define VM_IO           0x00004000 /* Memory mapped I/O or similar */
> >  
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 16373c8..437264b 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -86,7 +86,8 @@ calc_vm_flag_bits(unsigned long flags)
> >  {
> >     return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> >            _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
> > -          _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
> > +          _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> > +          _calc_vm_trans(flags, MAP_LOCKONFAULT,VM_LOCKONFAULT);
> >  }
> >  
> >  unsigned long vm_commit_limit(void);
> > diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
> > index e9fe6fd..fc4e586 100644
> > --- a/include/uapi/asm-generic/mman.h
> > +++ b/include/uapi/asm-generic/mman.h
> > @@ -12,6 +12,7 @@
> >  #define MAP_NONBLOCK       0x10000         /* do not block on IO */
> >  #define MAP_STACK  0x20000         /* give out an address that is best suited for process/thread stacks */
> >  #define MAP_HUGETLB        0x40000         /* create a huge page mapping */
> > +#define MAP_LOCKONFAULT    0x80000         /* Lock pages after they are faulted in, do not prefault */
> >  
> >  /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
> >  
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index bb50cac..ba1a6bf 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1233,7 +1233,7 @@ static inline int mlock_future_check(struct mm_struct *mm,
> >     unsigned long locked, lock_limit;
> >  
> >     /*  mlock MCL_FUTURE? */
> > -   if (flags & VM_LOCKED) {
> > +   if (flags & (VM_LOCKED | VM_LOCKONFAULT)) {
> >             locked = len >> PAGE_SHIFT;
> >             locked += mm->locked_vm;
> >             lock_limit = rlimit(RLIMIT_MEMLOCK);
> > @@ -1301,7 +1301,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
> >     vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
> >                     mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
> >  
> > -   if (flags & MAP_LOCKED)
> > +   if (flags & (MAP_LOCKED | MAP_LOCKONFAULT))
> >             if (!can_do_mlock())
> >                     return -EPERM;
> >  
> > diff --git a/mm/swap.c b/mm/swap.c
> > index a7251a8..07c905e 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -711,7 +711,8 @@ void lru_cache_add_active_or_unevictable(struct page *page,
> >  {
> >     VM_BUG_ON_PAGE(PageLRU(page), page);
> >  
> > -   if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
> > +   if (likely((vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) == 0) ||
> > +              (vma->vm_flags & VM_SPECIAL)) {
> >             SetPageActive(page);
> >             lru_cache_add(page);
> >             return;
> > -- 
> > 1.9.1
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs
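
For anyone reading the mm/swap.c hunk, the change to the LRU test can be
modelled in userspace as below.  This is only a sketch: the goes_active_*
names are invented here, and MODEL_VM_SPECIAL uses just the two special
bits visible in the mm.h context above, whereas the kernel's VM_SPECIAL
covers more flags.  A return value of true means the page takes the
early-return path onto the active LRU; false means it falls through to
the unevictable handling.

```c
#include <stdbool.h>

/* Flag values as they appear in the include/linux/mm.h hunk. */
#define VM_PFNMAP       0x00000400UL
#define VM_LOCKONFAULT  0x00001000UL
#define VM_LOCKED       0x00002000UL
#define VM_IO           0x00004000UL

/* Assumption: a reduced stand-in for the kernel's VM_SPECIAL mask. */
#define MODEL_VM_SPECIAL (VM_IO | VM_PFNMAP)

/* Pre-patch condition: only a non-special VM_LOCKED VMA is unevictable. */
static bool goes_active_old(unsigned long vm_flags)
{
    return (vm_flags & (VM_LOCKED | MODEL_VM_SPECIAL)) != VM_LOCKED;
}

/* Post-patch condition: either lock flag makes a non-special VMA unevictable. */
static bool goes_active_new(unsigned long vm_flags)
{
    return (vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) == 0 ||
           (vm_flags & MODEL_VM_SPECIAL) != 0;
}
```

Under this model the two tests agree for plain, VM_LOCKED and special
VMAs, and differ only for a VM_LOCKONFAULT VMA, which is the case the
patch is meant to change.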
