Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
On Fri, Mar 24, 2017 at 02:44:10PM +0530, Aneesh Kumar K.V wrote: > > > On Friday 24 March 2017 02:34 PM, Kirill A. Shutemov wrote: > > On Mon, Mar 20, 2017 at 10:40:20AM +0530, Aneesh Kumar K.V wrote: > > > "Kirill A. Shutemov" writes: > > > @@ -168,6 +182,10 @@ arch_get_unmapped_area_topdown(struct file *filp, > > > const unsigned long addr0, > > > > unsigned long addr = addr0; > > > > struct vm_unmapped_area_info info; > > > > > > > > + addr = mpx_unmapped_area_check(addr, len, flags); > > > > + if (IS_ERR_VALUE(addr)) > > > > + return addr; > > > > + > > > > /* requested length too big for entire address space */ > > > > if (len > TASK_SIZE) > > > > return -ENOMEM; > > > > @@ -192,6 +210,14 @@ arch_get_unmapped_area_topdown(struct file *filp, > > > > const unsigned long addr0, > > > > info.length = len; > > > > info.low_limit = PAGE_SIZE; > > > > info.high_limit = mm->mmap_base; > > > > + > > > > + /* > > > > +* If hint address is above DEFAULT_MAP_WINDOW, look for > > > > unmapped area > > > > +* in the full address space. > > > > +*/ > > > > + if (addr > DEFAULT_MAP_WINDOW) > > > > + info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW; > > > > + > > > > > > Is this ok for 32 bit application ? > > > > DEFAULT_MAP_WINDOW is equal to TASK_SIZE on 32-bit, so it's nop and will > > be compile out. > > > > That is not about CONFIG_X86_32 but about 32 bit application on a 64 bit > kernel. I guess we will never find addr > DEFAULT_MAP_WINDOW with > a 32 bit app ? I have local change to avoid this within 32-bit syscall, but I'll need to recheck everthing. -- Kirill A. Shutemov
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
On Friday 24 March 2017 02:34 PM, Kirill A. Shutemov wrote: On Mon, Mar 20, 2017 at 10:40:20AM +0530, Aneesh Kumar K.V wrote: "Kirill A. Shutemov" writes: @@ -168,6 +182,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, unsigned long addr = addr0; struct vm_unmapped_area_info info; + addr = mpx_unmapped_area_check(addr, len, flags); + if (IS_ERR_VALUE(addr)) + return addr; + /* requested length too big for entire address space */ if (len > TASK_SIZE) return -ENOMEM; @@ -192,6 +210,14 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, info.length = len; info.low_limit = PAGE_SIZE; info.high_limit = mm->mmap_base; + + /* +* If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area +* in the full address space. +*/ + if (addr > DEFAULT_MAP_WINDOW) + info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW; + Is this ok for 32 bit application ? DEFAULT_MAP_WINDOW is equal to TASK_SIZE on 32-bit, so it's nop and will be compile out. That is not about CONFIG_X86_32 but about 32 bit application on a 64 bit kernel. I guess we will never find addr > DEFAULT_MAP_WINDOW with a 32 bit app ? -aneesh
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
On Mon, Mar 20, 2017 at 10:40:20AM +0530, Aneesh Kumar K.V wrote: > "Kirill A. Shutemov" writes: > @@ -168,6 +182,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const > unsigned long addr0, > > unsigned long addr = addr0; > > struct vm_unmapped_area_info info; > > > > + addr = mpx_unmapped_area_check(addr, len, flags); > > + if (IS_ERR_VALUE(addr)) > > + return addr; > > + > > /* requested length too big for entire address space */ > > if (len > TASK_SIZE) > > return -ENOMEM; > > @@ -192,6 +210,14 @@ arch_get_unmapped_area_topdown(struct file *filp, > > const unsigned long addr0, > > info.length = len; > > info.low_limit = PAGE_SIZE; > > info.high_limit = mm->mmap_base; > > + > > + /* > > +* If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area > > +* in the full address space. > > +*/ > > + if (addr > DEFAULT_MAP_WINDOW) > > + info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW; > > + > > Is this ok for 32 bit application ? DEFAULT_MAP_WINDOW is equal to TASK_SIZE on 32-bit, so it's nop and will be compile out. -- Kirill A. Shutemov
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
On Sun, Mar 19, 2017 at 02:25:08PM +0530, Aneesh Kumar K.V wrote: > >>> So if I have done a successful mmap which returned > 128TB what should a > >>> following mmap(0,...) return ? Should that now search the *full* address > >>> space or below 128TB ? > >> > >> No, I don't think so. And this implementation doesn't do this. > >> > >> It's safer this way: if an library can't handle high addresses, it's > >> better not to switch it automagically to full address space if other part > >> of the process requested high address. > >> > > > > What is the epectation when the hint addr is below 128TB but addr + len > > > 128TB ? Should such mmap request fail ? > > Considering that we have stack at the top (around 128TB) we may not be > able to get a free area for such a request. But I guess the idea here is > that if hint address is below 128TB, we behave as though our TASK_SIZE > is 128TB ? Is that correct ? Right. -- Kirill A. Shutemov
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
On Mon, Mar 20, 2017 at 11:08:41AM -0700, h...@zytor.com wrote: > This *better* be conditional on some kind of settable limit. Having a > barrier in the middle of the address space for no apparent reason to > "clean" software is insane. I had the same argument (on your side) before, but if you look on numbers it's far from the middle of address space. The barrier is around 0.2% from the start 56-bit address space. And it's we have vdso/vvar/stack just below the barier anyway. I don't think we would loose much if wouldn't not allow VMA to sit across it. -- Kirill A. Shutemov
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
On Mon, Mar 20, 2017 at 11:08:41AM -0700, h...@zytor.com wrote: > On March 19, 2017 1:26:58 AM PDT, "Kirill A. Shutemov" > wrote: > >On Mar 19, 2017 09:25, "Aneesh Kumar K.V" > >wrote: > > > What is the epectation when the hint addr is below 128TB but addr + len > > > 128TB ? Should such mmap request fail ? > > > >Yes, I believe so. > > This *better* be conditional on some kind of settable limit. Having a > barrier in the middle of the address space for no apparent reason to > "clean" software is insane. I disagree with Kirill here. If addr+len > 128TB, I think we should assume the application is 57-bit aware. Specifying hint addresses is such a rare thing to do anyway.
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
On March 19, 2017 1:26:58 AM PDT, "Kirill A. Shutemov" wrote: >On Mar 19, 2017 09:25, "Aneesh Kumar K.V" > >wrote: > >"Kirill A. Shutemov" writes: > >> On Fri, Mar 17, 2017 at 11:23:54PM +0530, Aneesh Kumar K.V wrote: >>> "Kirill A. Shutemov" writes: >>> >>> > On x86, 5-level paging enables 56-bit userspace virtual address >space. >>> > Not all user space is ready to handle wide addresses. It's known >that >>> > at least some JIT compilers use higher bits in pointers to encode >their >>> > information. It collides with valid pointers with 5-level paging >and >>> > leads to crashes. >>> > >>> > To mitigate this, we are not going to allocate virtual address >space >>> > above 47-bit by default. >>> > >>> > But userspace can ask for allocation from full address space by >>> > specifying hint address (with or without MAP_FIXED) above 47-bits. >>> > >>> > If hint address set above 47-bit, but MAP_FIXED is not specified, >we >try >>> > to look for unmapped area by specified address. If it's already >>> > occupied, we look for unmapped area in *full* address space, >rather >than >>> > from 47-bit window. >>> > >>> > This approach helps to easily make application's memory allocator >aware >>> > about large address space without manually tracking allocated >virtual >>> > address space. >>> > >>> >>> So if I have done a successful mmap which returned > 128TB what >should a >>> following mmap(0,...) return ? Should that now search the *full* >address >>> space or below 128TB ? >> >> No, I don't think so. And this implementation doesn't do this. >> >> It's safer this way: if an library can't handle high addresses, it's >> better not to switch it automagically to full address space if other >part >> of the process requested high address. >> > >What is the epectation when the hint addr is below 128TB but addr + len >> >128TB ? Should such mmap request fail ? > > >Yes, I believe so. This *better* be conditional on some kind of settable limit. Having a barrier in the middle of the address space for no apparent reason to "clean" software is insane. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
"Aneesh Kumar K.V" writes: > "Kirill A. Shutemov" writes: >> On Fri, Mar 17, 2017 at 11:23:54PM +0530, Aneesh Kumar K.V wrote: >>> So if I have done a successful mmap which returned > 128TB what should a >>> following mmap(0,...) return ? Should that now search the *full* address >>> space or below 128TB ? >> >> No, I don't think so. And this implementation doesn't do this. >> >> It's safer this way: if an library can't handle high addresses, it's >> better not to switch it automagically to full address space if other part >> of the process requested high address. > > What is the epectation when the hint addr is below 128TB but addr + len > > 128TB ? Should such mmap request fail ? Yeah I think that makes sense, it retains the existing behaviour unless the hint itself is >= 128TB. cheers
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
"Kirill A. Shutemov" writes: @@ -168,6 +182,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, > unsigned long addr = addr0; > struct vm_unmapped_area_info info; > > + addr = mpx_unmapped_area_check(addr, len, flags); > + if (IS_ERR_VALUE(addr)) > + return addr; > + > /* requested length too big for entire address space */ > if (len > TASK_SIZE) > return -ENOMEM; > @@ -192,6 +210,14 @@ arch_get_unmapped_area_topdown(struct file *filp, const > unsigned long addr0, > info.length = len; > info.low_limit = PAGE_SIZE; > info.high_limit = mm->mmap_base; > + > + /* > + * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area > + * in the full address space. > + */ > + if (addr > DEFAULT_MAP_WINDOW) > + info.high_limit += TASK_SIZE - DEFAULT_MAP_WINDOW; > + Is this ok for 32 bit application ? > info.align_mask = 0; > info.align_offset = pgoff << PAGE_SHIFT; > if (filp) { -aneesh
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
"Aneesh Kumar K.V" writes: > "Kirill A. Shutemov" writes: > >> On Fri, Mar 17, 2017 at 11:23:54PM +0530, Aneesh Kumar K.V wrote: >>> "Kirill A. Shutemov" writes: >>> >>> > On x86, 5-level paging enables 56-bit userspace virtual address space. >>> > Not all user space is ready to handle wide addresses. It's known that >>> > at least some JIT compilers use higher bits in pointers to encode their >>> > information. It collides with valid pointers with 5-level paging and >>> > leads to crashes. >>> > >>> > To mitigate this, we are not going to allocate virtual address space >>> > above 47-bit by default. >>> > >>> > But userspace can ask for allocation from full address space by >>> > specifying hint address (with or without MAP_FIXED) above 47-bits. >>> > >>> > If hint address set above 47-bit, but MAP_FIXED is not specified, we try >>> > to look for unmapped area by specified address. If it's already >>> > occupied, we look for unmapped area in *full* address space, rather than >>> > from 47-bit window. >>> > >>> > This approach helps to easily make application's memory allocator aware >>> > about large address space without manually tracking allocated virtual >>> > address space. >>> > >>> >>> So if I have done a successful mmap which returned > 128TB what should a >>> following mmap(0,...) return ? Should that now search the *full* address >>> space or below 128TB ? >> >> No, I don't think so. And this implementation doesn't do this. >> >> It's safer this way: if an library can't handle high addresses, it's >> better not to switch it automagically to full address space if other part >> of the process requested high address. >> > > What is the epectation when the hint addr is below 128TB but addr + len > > 128TB ? Should such mmap request fail ? Considering that we have stack at the top (around 128TB) we may not be able to get a free area for such a request. But I guess the idea here is that if hint address is below 128TB, we behave as though our TASK_SIZE is 128TB ? Is that correct ? -aneesh
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
"Kirill A. Shutemov" writes: > On Fri, Mar 17, 2017 at 11:23:54PM +0530, Aneesh Kumar K.V wrote: >> "Kirill A. Shutemov" writes: >> >> > On x86, 5-level paging enables 56-bit userspace virtual address space. >> > Not all user space is ready to handle wide addresses. It's known that >> > at least some JIT compilers use higher bits in pointers to encode their >> > information. It collides with valid pointers with 5-level paging and >> > leads to crashes. >> > >> > To mitigate this, we are not going to allocate virtual address space >> > above 47-bit by default. >> > >> > But userspace can ask for allocation from full address space by >> > specifying hint address (with or without MAP_FIXED) above 47-bits. >> > >> > If hint address set above 47-bit, but MAP_FIXED is not specified, we try >> > to look for unmapped area by specified address. If it's already >> > occupied, we look for unmapped area in *full* address space, rather than >> > from 47-bit window. >> > >> > This approach helps to easily make application's memory allocator aware >> > about large address space without manually tracking allocated virtual >> > address space. >> > >> >> So if I have done a successful mmap which returned > 128TB what should a >> following mmap(0,...) return ? Should that now search the *full* address >> space or below 128TB ? > > No, I don't think so. And this implementation doesn't do this. > > It's safer this way: if an library can't handle high addresses, it's > better not to switch it automagically to full address space if other part > of the process requested high address. > What is the epectation when the hint addr is below 128TB but addr + len > 128TB ? Should such mmap request fail ? -aneesh
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
On Fri, Mar 17, 2017 at 11:23:54PM +0530, Aneesh Kumar K.V wrote: > "Kirill A. Shutemov" writes: > > > On x86, 5-level paging enables 56-bit userspace virtual address space. > > Not all user space is ready to handle wide addresses. It's known that > > at least some JIT compilers use higher bits in pointers to encode their > > information. It collides with valid pointers with 5-level paging and > > leads to crashes. > > > > To mitigate this, we are not going to allocate virtual address space > > above 47-bit by default. > > > > But userspace can ask for allocation from full address space by > > specifying hint address (with or without MAP_FIXED) above 47-bits. > > > > If hint address set above 47-bit, but MAP_FIXED is not specified, we try > > to look for unmapped area by specified address. If it's already > > occupied, we look for unmapped area in *full* address space, rather than > > from 47-bit window. > > > > This approach helps to easily make application's memory allocator aware > > about large address space without manually tracking allocated virtual > > address space. > > > > So if I have done a successful mmap which returned > 128TB what should a > following mmap(0,...) return ? Should that now search the *full* address > space or below 128TB ? No, I don't think so. And this implementation doesn't do this. It's safer this way: if an library can't handle high addresses, it's better not to switch it automagically to full address space if other part of the process requested high address. -- Kirill A. Shutemov
Re: [PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
"Kirill A. Shutemov" writes: > On x86, 5-level paging enables 56-bit userspace virtual address space. > Not all user space is ready to handle wide addresses. It's known that > at least some JIT compilers use higher bits in pointers to encode their > information. It collides with valid pointers with 5-level paging and > leads to crashes. > > To mitigate this, we are not going to allocate virtual address space > above 47-bit by default. > > But userspace can ask for allocation from full address space by > specifying hint address (with or without MAP_FIXED) above 47-bits. > > If hint address set above 47-bit, but MAP_FIXED is not specified, we try > to look for unmapped area by specified address. If it's already > occupied, we look for unmapped area in *full* address space, rather than > from 47-bit window. > > This approach helps to easily make application's memory allocator aware > about large address space without manually tracking allocated virtual > address space. > So if I have done a successful mmap which returned > 128TB what should a following mmap(0,...) return ? Should that now search the *full* address space or below 128TB ? -aneesh
[PATCH 26/26] x86/mm: allow to have userspace mappings above 47-bits
On x86, 5-level paging enables 56-bit userspace virtual address space. Not all user space is ready to handle wide addresses. It's known that at least some JIT compilers use higher bits in pointers to encode their information. It collides with valid pointers with 5-level paging and leads to crashes. To mitigate this, we are not going to allocate virtual address space above 47-bit by default. But userspace can ask for allocation from full address space by specifying hint address (with or without MAP_FIXED) above 47-bits. If hint address set above 47-bit, but MAP_FIXED is not specified, we try to look for unmapped area by specified address. If it's already occupied, we look for unmapped area in *full* address space, rather than from 47-bit window. This approach helps to easily make application's memory allocator aware about large address space without manually tracking allocated virtual address space. One important case we need to handle here is interaction with MPX. MPX (without MAWA( extension cannot handle addresses above 47-bit, so we need to make sure that MPX cannot be enabled we already have VMA above the boundary and forbid creating such VMAs once MPX is enabled. Signed-off-by: Kirill A. Shutemov --- arch/x86/include/asm/elf.h | 2 +- arch/x86/include/asm/mpx.h | 9 + arch/x86/include/asm/processor.h | 9 ++--- arch/x86/kernel/sys_x86_64.c | 28 +++- arch/x86/mm/hugetlbpage.c| 31 +++ arch/x86/mm/mmap.c | 4 ++-- arch/x86/mm/mpx.c| 33 - 7 files changed, 104 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index 9d49c18b5ea9..265625b0d6cb 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -250,7 +250,7 @@ extern int force_personality32; the loader. We need to make sure that it is out of the way of the program that it will "exec", and that there is sufficient room for the brk. */ -#define ELF_ET_DYN_BASE(TASK_SIZE / 3 * 2) +#define ELF_ET_DYN_BASE(DEFAULT_MAP_WINDOW / 3 * 2) /* This yields a mask that user programs can use to figure out what instruction set this CPU supports. This could be done in user space, diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index a0d662be4c5b..7d7404756bb4 100644 --- a/arch/x86/include/asm/mpx.h +++ b/arch/x86/include/asm/mpx.h @@ -73,6 +73,9 @@ static inline void mpx_mm_init(struct mm_struct *mm) } void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long start, unsigned long end); + +unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len, + unsigned long flags); #else static inline siginfo_t *mpx_generate_siginfo(struct pt_regs *regs) { @@ -94,6 +97,12 @@ static inline void mpx_notify_unmap(struct mm_struct *mm, unsigned long start, unsigned long end) { } + +static inline unsigned long mpx_unmapped_area_check(unsigned long addr, + unsigned long len, unsigned long flags) +{ + return addr; +} #endif /* CONFIG_X86_INTEL_MPX */ #endif /* _ASM_X86_MPX_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index f385eca5407a..da8ab4f2d0c7 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -799,6 +799,7 @@ static inline void spin_lock_prefetch(const void *x) */ #define TASK_SIZE PAGE_OFFSET #define TASK_SIZE_MAX TASK_SIZE +#define DEFAULT_MAP_WINDOW TASK_SIZE #define STACK_TOP TASK_SIZE #define STACK_TOP_MAX STACK_TOP @@ -838,7 +839,9 @@ static inline void spin_lock_prefetch(const void *x) * particular problem by preventing anything from being mapped * at the maximum canonical address. */ -#define TASK_SIZE_MAX ((1UL << 47) - PAGE_SIZE) +#define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE) + +#define DEFAULT_MAP_WINDOW ((1UL << 47) - PAGE_SIZE) /* This decides where the kernel will search for a free chunk of vm * space during mmap's. @@ -851,7 +854,7 @@ static inline void spin_lock_prefetch(const void *x) #define TASK_SIZE_OF(child)((test_tsk_thread_flag(child, TIF_ADDR32)) ? \ IA32_PAGE_OFFSET : TASK_SIZE_MAX) -#define STACK_TOP TASK_SIZE +#define STACK_TOP DEFAULT_MAP_WINDOW #define STACK_TOP_MAX TASK_SIZE_MAX #define INIT_THREAD { \ @@ -873,7 +876,7 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip, * This decides where the kernel will search for a free chunk of vm * space during mmap's. */ -#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3)) +#define TASK_UNMAPPED_BASE (PAGE_ALIGN(DEFAULT_MAP_WINDOW /