Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-24 Thread Kirill A. Shutemov
On Fri, Mar 24, 2017 at 02:44:10PM +0530, Aneesh Kumar K.V wrote:
> 
> 
> On Friday 24 March 2017 02:34 PM, Kirill A. Shutemov wrote:
> > On Mon, Mar 20, 2017 at 10:40:20AM +0530, Aneesh Kumar K.V wrote:
> > > "Kirill A. Shutemov"  writes:
> > >  @@ -168,6 +182,10 @@ arch_get_unmapped_area_topdown(struct file *filp, 
> > > const unsigned long addr0,
> > > > unsigned long addr = addr0;
> > > > struct vm_unmapped_area_info info;
> > > > 
> > > > +   addr = mpx_unmapped_area_check(addr, len, flags);
> > > > +   if (IS_ERR_VALUE(addr))
> > > > +   return addr;
> > > > +
> > > > /* requested length too big for entire address space */
> > > > if (len > TASK_SIZE)
> > > > return -ENOMEM;
> > > > @@ -192,6 +210,14 @@ arch_get_unmapped_area_topdown(struct file *filp, 
> > > > const unsigned long addr0,
> > > > info.length = len;
> > > > info.low_limit = PAGE_SIZE;
> > > > info.high_limit = mm->mmap_base;
> > > > +
> > > > +   /*
> > > > +* If hint address is above DEFAULT_MAP_WINDOW, look for 
> > > > unmapped area
> > > > +* in the full address space.
> > > > +*/
> > > > +   if (addr > DEFAULT_MAP_WINDOW)
> > > > +   info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW;
> > > > +
> > > 
> > > Is this ok for 32 bit application ?
> > 
> > DEFAULT_MAP_WINDOW is equal to TASK_SIZE on 32-bit, so it's nop and will
> > be compile out.
> > 
> 
> That is not about CONFIG_X86_32 but about 32 bit application on a 64 bit
> kernel. I guess we will never find addr > DEFAULT_MAP_WINDOW with
> a 32 bit app ?

I have local change to avoid this within 32-bit syscall, but I'll need to
recheck everthing.

-- 
 Kirill A. Shutemov


Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-24 Thread Aneesh Kumar K.V



On Friday 24 March 2017 02:34 PM, Kirill A. Shutemov wrote:

On Mon, Mar 20, 2017 at 10:40:20AM +0530, Aneesh Kumar K.V wrote:

"Kirill A. Shutemov"  writes:
 @@ -168,6 +182,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const 
unsigned long addr0,

unsigned long addr = addr0;
struct vm_unmapped_area_info info;

+   addr = mpx_unmapped_area_check(addr, len, flags);
+   if (IS_ERR_VALUE(addr))
+   return addr;
+
/* requested length too big for entire address space */
if (len > TASK_SIZE)
return -ENOMEM;
@@ -192,6 +210,14 @@ arch_get_unmapped_area_topdown(struct file *filp, const 
unsigned long addr0,
info.length = len;
info.low_limit = PAGE_SIZE;
info.high_limit = mm->mmap_base;
+
+   /*
+* If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
+* in the full address space.
+*/
+   if (addr > DEFAULT_MAP_WINDOW)
+   info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW;
+


Is this ok for 32 bit application ?


DEFAULT_MAP_WINDOW is equal to TASK_SIZE on 32-bit, so it's nop and will
be compile out.



That is not about CONFIG_X86_32 but about 32 bit application on a 64 bit 
kernel. I guess we will never find addr > DEFAULT_MAP_WINDOW with

a 32 bit app ?

-aneesh



Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-24 Thread Kirill A. Shutemov
On Mon, Mar 20, 2017 at 10:40:20AM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov"  writes:
>  @@ -168,6 +182,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const 
> unsigned long addr0,
> > unsigned long addr = addr0;
> > struct vm_unmapped_area_info info;
> >  
> > +   addr = mpx_unmapped_area_check(addr, len, flags);
> > +   if (IS_ERR_VALUE(addr))
> > +   return addr;
> > +
> > /* requested length too big for entire address space */
> > if (len > TASK_SIZE)
> > return -ENOMEM;
> > @@ -192,6 +210,14 @@ arch_get_unmapped_area_topdown(struct file *filp, 
> > const unsigned long addr0,
> > info.length = len;
> > info.low_limit = PAGE_SIZE;
> > info.high_limit = mm->mmap_base;
> > +
> > +   /*
> > +* If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
> > +* in the full address space.
> > +*/
> > +   if (addr > DEFAULT_MAP_WINDOW)
> > +   info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW;
> > +
> 
> Is this ok for 32 bit application ?

DEFAULT_MAP_WINDOW is equal to TASK_SIZE on 32-bit, so it's nop and will
be compile out.

-- 
 Kirill A. Shutemov


Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-24 Thread Kirill A. Shutemov
On Sun, Mar 19, 2017 at 02:25:08PM +0530, Aneesh Kumar K.V wrote:
> >>> So if I have done a successful mmap which returned > 128TB what should a
> >>> following mmap(0,...) return ? Should that now search the *full* address
> >>> space or below 128TB ?
> >>
> >> No, I don't think so. And this implementation doesn't do this.
> >>
> >> It's safer this way: if an library can't handle high addresses, it's
> >> better not to switch it automagically to full address space if other part
> >> of the process requested high address.
> >>
> >
> > What is the epectation when the hint addr is below 128TB but addr + len >
> > 128TB ? Should such mmap request fail ?
> 
> Considering that we have stack at the top (around 128TB) we may not be
> able to get a free area for such a request. But I guess the idea here is
> that if hint address is below 128TB, we behave as though our TASK_SIZE
> is 128TB ? Is that correct ?

Right.

-- 
 Kirill A. Shutemov


Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-24 Thread Kirill A. Shutemov
On Mon, Mar 20, 2017 at 11:08:41AM -0700, h...@zytor.com wrote:
> This *better* be conditional on some kind of settable limit.  Having a
> barrier in the middle of the address space for no apparent reason to
> "clean" software is insane.

I had the same argument (on your side) before, but if you look on numbers
it's far from the middle of address space. The barrier is around 0.2% from
the start 56-bit address space.

And it's we have vdso/vvar/stack just below the barier anyway.

I don't think we would loose much if wouldn't not allow VMA to sit
across it.

-- 
 Kirill A. Shutemov


Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-20 Thread Matthew Wilcox
On Mon, Mar 20, 2017 at 11:08:41AM -0700, h...@zytor.com wrote:
> On March 19, 2017 1:26:58 AM PDT, "Kirill A. Shutemov"  
> wrote:
> >On Mar 19, 2017 09:25, "Aneesh Kumar K.V"  
> >wrote:
> > > What is the epectation when the hint addr is below 128TB but addr + len
> > > 128TB ? Should such mmap request fail ?
> >
> >Yes, I believe so.
> 
> This *better* be conditional on some kind of settable limit.  Having a
> barrier in the middle of the address space for no apparent reason to
> "clean" software is insane.

I disagree with Kirill here.  If addr+len > 128TB, I think we should
assume the application is 57-bit aware.

Specifying hint addresses is such a rare thing to do anyway.


Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-20 Thread hpa
On March 19, 2017 1:26:58 AM PDT, "Kirill A. Shutemov"  
wrote:
>On Mar 19, 2017 09:25, "Aneesh Kumar K.V"
>
>wrote:
>
>"Kirill A. Shutemov"  writes:
>
>> On Fri, Mar 17, 2017 at 11:23:54PM +0530, Aneesh Kumar K.V wrote:
>>> "Kirill A. Shutemov"  writes:
>>>
>>> > On x86, 5-level paging enables 56-bit userspace virtual address
>space.
>>> > Not all user space is ready to handle wide addresses. It's known
>that
>>> > at least some JIT compilers use higher bits in pointers to encode
>their
>>> > information. It collides with valid pointers with 5-level paging
>and
>>> > leads to crashes.
>>> >
>>> > To mitigate this, we are not going to allocate virtual address
>space
>>> > above 47-bit by default.
>>> >
>>> > But userspace can ask for allocation from full address space by
>>> > specifying hint address (with or without MAP_FIXED) above 47-bits.
>>> >
>>> > If hint address set above 47-bit, but MAP_FIXED is not specified,
>we
>try
>>> > to look for unmapped area by specified address. If it's already
>>> > occupied, we look for unmapped area in *full* address space,
>rather
>than
>>> > from 47-bit window.
>>> >
>>> > This approach helps to easily make application's memory allocator
>aware
>>> > about large address space without manually tracking allocated
>virtual
>>> > address space.
>>> >
>>>
>>> So if I have done a successful mmap which returned > 128TB what
>should a
>>> following mmap(0,...) return ? Should that now search the *full*
>address
>>> space or below 128TB ?
>>
>> No, I don't think so. And this implementation doesn't do this.
>>
>> It's safer this way: if an library can't handle high addresses, it's
>> better not to switch it automagically to full address space if other
>part
>> of the process requested high address.
>>
>
>What is the epectation when the hint addr is below 128TB but addr + len
>>
>128TB ? Should such mmap request fail ?
>
>
>Yes, I believe so.

This *better* be conditional on some kind of settable limit.  Having a barrier 
in the middle of the address space for no apparent reason to "clean" software 
is insane.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-20 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> "Kirill A. Shutemov"  writes:
>> On Fri, Mar 17, 2017 at 11:23:54PM +0530, Aneesh Kumar K.V wrote:
>>> So if I have done a successful mmap which returned > 128TB what should a
>>> following mmap(0,...) return ? Should that now search the *full* address
>>> space or below 128TB ?
>>
>> No, I don't think so. And this implementation doesn't do this.
>>
>> It's safer this way: if an library can't handle high addresses, it's
>> better not to switch it automagically to full address space if other part
>> of the process requested high address.
>
> What is the epectation when the hint addr is below 128TB but addr + len >
> 128TB ? Should such mmap request fail ?

Yeah I think that makes sense, it retains the existing behaviour unless
the hint itself is >= 128TB.

cheers


Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-19 Thread Aneesh Kumar K.V
"Kirill A. Shutemov"  writes:
 @@ -168,6 +182,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const 
unsigned long addr0,
>   unsigned long addr = addr0;
>   struct vm_unmapped_area_info info;
>  
> + addr = mpx_unmapped_area_check(addr, len, flags);
> + if (IS_ERR_VALUE(addr))
> + return addr;
> +
>   /* requested length too big for entire address space */
>   if (len > TASK_SIZE)
>   return -ENOMEM;
> @@ -192,6 +210,14 @@ arch_get_unmapped_area_topdown(struct file *filp, const 
> unsigned long addr0,
>   info.length = len;
>   info.low_limit = PAGE_SIZE;
>   info.high_limit = mm->mmap_base;
> +
> + /*
> +  * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
> +  * in the full address space.
> +  */
> + if (addr > DEFAULT_MAP_WINDOW)
> + info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW;
> +

Is this ok for 32 bit application ?


>   info.align_mask = 0;
>   info.align_offset = pgoff << PAGE_SHIFT;
>   if (filp) {


-aneesh



Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-19 Thread Aneesh Kumar K.V
"Aneesh Kumar K.V"  writes:

> "Kirill A. Shutemov"  writes:
>
>> On Fri, Mar 17, 2017 at 11:23:54PM +0530, Aneesh Kumar K.V wrote:
>>> "Kirill A. Shutemov"  writes:
>>> 
>>> > On x86, 5-level paging enables 56-bit userspace virtual address space.
>>> > Not all user space is ready to handle wide addresses. It's known that
>>> > at least some JIT compilers use higher bits in pointers to encode their
>>> > information. It collides with valid pointers with 5-level paging and
>>> > leads to crashes.
>>> >
>>> > To mitigate this, we are not going to allocate virtual address space
>>> > above 47-bit by default.
>>> >
>>> > But userspace can ask for allocation from full address space by
>>> > specifying hint address (with or without MAP_FIXED) above 47-bits.
>>> >
>>> > If hint address set above 47-bit, but MAP_FIXED is not specified, we try
>>> > to look for unmapped area by specified address. If it's already
>>> > occupied, we look for unmapped area in *full* address space, rather than
>>> > from 47-bit window.
>>> >
>>> > This approach helps to easily make application's memory allocator aware
>>> > about large address space without manually tracking allocated virtual
>>> > address space.
>>> >
>>> 
>>> So if I have done a successful mmap which returned > 128TB what should a
>>> following mmap(0,...) return ? Should that now search the *full* address
>>> space or below 128TB ?
>>
>> No, I don't think so. And this implementation doesn't do this.
>>
>> It's safer this way: if an library can't handle high addresses, it's
>> better not to switch it automagically to full address space if other part
>> of the process requested high address.
>>
>
> What is the epectation when the hint addr is below 128TB but addr + len >
> 128TB ? Should such mmap request fail ?

Considering that we have stack at the top (around 128TB) we may not be
able to get a free area for such a request. But I guess the idea here is
that if hint address is below 128TB, we behave as though our TASK_SIZE
is 128TB ? Is that correct ?
 
-aneesh



Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-19 Thread Aneesh Kumar K.V
"Kirill A. Shutemov"  writes:

> On Fri, Mar 17, 2017 at 11:23:54PM +0530, Aneesh Kumar K.V wrote:
>> "Kirill A. Shutemov"  writes:
>> 
>> > On x86, 5-level paging enables 56-bit userspace virtual address space.
>> > Not all user space is ready to handle wide addresses. It's known that
>> > at least some JIT compilers use higher bits in pointers to encode their
>> > information. It collides with valid pointers with 5-level paging and
>> > leads to crashes.
>> >
>> > To mitigate this, we are not going to allocate virtual address space
>> > above 47-bit by default.
>> >
>> > But userspace can ask for allocation from full address space by
>> > specifying hint address (with or without MAP_FIXED) above 47-bits.
>> >
>> > If hint address set above 47-bit, but MAP_FIXED is not specified, we try
>> > to look for unmapped area by specified address. If it's already
>> > occupied, we look for unmapped area in *full* address space, rather than
>> > from 47-bit window.
>> >
>> > This approach helps to easily make application's memory allocator aware
>> > about large address space without manually tracking allocated virtual
>> > address space.
>> >
>> 
>> So if I have done a successful mmap which returned > 128TB what should a
>> following mmap(0,...) return ? Should that now search the *full* address
>> space or below 128TB ?
>
> No, I don't think so. And this implementation doesn't do this.
>
> It's safer this way: if an library can't handle high addresses, it's
> better not to switch it automagically to full address space if other part
> of the process requested high address.
>

What is the epectation when the hint addr is below 128TB but addr + len >
128TB ? Should such mmap request fail ?

-aneesh



Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-18 Thread Kirill A. Shutemov
On Fri, Mar 17, 2017 at 11:23:54PM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov"  writes:
> 
> > On x86, 5-level paging enables 56-bit userspace virtual address space.
> > Not all user space is ready to handle wide addresses. It's known that
> > at least some JIT compilers use higher bits in pointers to encode their
> > information. It collides with valid pointers with 5-level paging and
> > leads to crashes.
> >
> > To mitigate this, we are not going to allocate virtual address space
> > above 47-bit by default.
> >
> > But userspace can ask for allocation from full address space by
> > specifying hint address (with or without MAP_FIXED) above 47-bits.
> >
> > If hint address set above 47-bit, but MAP_FIXED is not specified, we try
> > to look for unmapped area by specified address. If it's already
> > occupied, we look for unmapped area in *full* address space, rather than
> > from 47-bit window.
> >
> > This approach helps to easily make application's memory allocator aware
> > about large address space without manually tracking allocated virtual
> > address space.
> >
> 
> So if I have done a successful mmap which returned > 128TB what should a
> following mmap(0,...) return ? Should that now search the *full* address
> space or below 128TB ?

No, I don't think so. And this implementation doesn't do this.

It's safer this way: if an library can't handle high addresses, it's
better not to switch it automagically to full address space if other part
of the process requested high address.

-- 
 Kirill A. Shutemov


Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-17 Thread Aneesh Kumar K.V
"Kirill A. Shutemov"  writes:

> On x86, 5-level paging enables 56-bit userspace virtual address space.
> Not all user space is ready to handle wide addresses. It's known that
> at least some JIT compilers use higher bits in pointers to encode their
> information. It collides with valid pointers with 5-level paging and
> leads to crashes.
>
> To mitigate this, we are not going to allocate virtual address space
> above 47-bit by default.
>
> But userspace can ask for allocation from full address space by
> specifying hint address (with or without MAP_FIXED) above 47-bits.
>
> If hint address set above 47-bit, but MAP_FIXED is not specified, we try
> to look for unmapped area by specified address. If it's already
> occupied, we look for unmapped area in *full* address space, rather than
> from 47-bit window.
>
> This approach helps to easily make application's memory allocator aware
> about large address space without manually tracking allocated virtual
> address space.
>

So if I have done a successful mmap which returned > 128TB what should a
following mmap(0,...) return ? Should that now search the *full* address
space or below 128TB ?

-aneesh



[PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits

2017-03-12 Thread Kirill A. Shutemov
On x86, 5-level paging enables 56-bit userspace virtual address space.
Not all user space is ready to handle wide addresses. It's known that
at least some JIT compilers use higher bits in pointers to encode their
information. It collides with valid pointers with 5-level paging and
leads to crashes.

To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.

But userspace can ask for allocation from full address space by
specifying hint address (with or without MAP_FIXED) above 47-bits.

If hint address set above 47-bit, but MAP_FIXED is not specified, we try
to look for unmapped area by specified address. If it's already
occupied, we look for unmapped area in *full* address space, rather than
from 47-bit window.

This approach helps to easily make application's memory allocator aware
about large address space without manually tracking allocated virtual
address space.

One important case we need to handle here is interaction with MPX.
MPX (without MAWA( extension cannot handle addresses above 47-bit, so we
need to make sure that MPX cannot be enabled we already have VMA above
the boundary and forbid creating such VMAs once MPX is enabled.

Signed-off-by: Kirill A. Shutemov 
---
 arch/x86/include/asm/elf.h   |  2 +-
 arch/x86/include/asm/mpx.h   |  9 +
 arch/x86/include/asm/processor.h |  9 ++---
 arch/x86/kernel/sys_x86_64.c | 28 +++-
 arch/x86/mm/hugetlbpage.c| 31 +++
 arch/x86/mm/mmap.c   |  4 ++--
 arch/x86/mm/mpx.c| 33 -
 7 files changed, 104 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 9d49c18b5ea9..265625b0d6cb 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -250,7 +250,7 @@ extern int force_personality32;
the loader.  We need to make sure that it is out of the way of the program
that it will "exec", and that there is sufficient room for the brk.  */
 
-#define ELF_ET_DYN_BASE(TASK_SIZE / 3 * 2)
+#define ELF_ET_DYN_BASE(DEFAULT_MAP_WINDOW / 3 * 2)
 
 /* This yields a mask that user programs can use to figure out what
instruction set this CPU supports.  This could be done in user space,
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index a0d662be4c5b..7d7404756bb4 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -73,6 +73,9 @@ static inline void mpx_mm_init(struct mm_struct *mm)
 }
 void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
  unsigned long start, unsigned long end);
+
+unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
+   unsigned long flags);
 #else
 static inline siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
 {
@@ -94,6 +97,12 @@ static inline void mpx_notify_unmap(struct mm_struct *mm,
unsigned long start, unsigned long end)
 {
 }
+
+static inline unsigned long mpx_unmapped_area_check(unsigned long addr,
+   unsigned long len, unsigned long flags)
+{
+   return addr;
+}
 #endif /* CONFIG_X86_INTEL_MPX */
 
 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f385eca5407a..da8ab4f2d0c7 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -799,6 +799,7 @@ static inline void spin_lock_prefetch(const void *x)
  */
 #define TASK_SIZE  PAGE_OFFSET
 #define TASK_SIZE_MAX  TASK_SIZE
+#define DEFAULT_MAP_WINDOW TASK_SIZE
 #define STACK_TOP  TASK_SIZE
 #define STACK_TOP_MAX  STACK_TOP
 
@@ -838,7 +839,9 @@ static inline void spin_lock_prefetch(const void *x)
  * particular problem by preventing anything from being mapped
  * at the maximum canonical address.
  */
-#define TASK_SIZE_MAX  ((1UL << 47) - PAGE_SIZE)
+#define TASK_SIZE_MAX  ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
+
+#define DEFAULT_MAP_WINDOW ((1UL << 47) - PAGE_SIZE)
 
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
@@ -851,7 +854,7 @@ static inline void spin_lock_prefetch(const void *x)
 #define TASK_SIZE_OF(child)((test_tsk_thread_flag(child, TIF_ADDR32)) ? \
IA32_PAGE_OFFSET : TASK_SIZE_MAX)
 
-#define STACK_TOP  TASK_SIZE
+#define STACK_TOP  DEFAULT_MAP_WINDOW
 #define STACK_TOP_MAX  TASK_SIZE_MAX
 
 #define INIT_THREAD  { \
@@ -873,7 +876,7 @@ extern void start_thread(struct pt_regs *regs, unsigned 
long new_ip,
  * This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */
-#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
+#define TASK_UNMAPPED_BASE (PAGE_ALIGN(DEFAULT_MAP_WINDOW /