Re: Hugepages mixed with stacks in process address space

2018-09-04 Thread Andrew Morton
(cc linux-mm).

And thanks.

On Tue, 04 Sep 2018 17:08:34 +0200 Jacek Tomaka  wrote:

> Hello, 
> 
> I was trying to track down a performance difference in one of my
> applications between the kernel used in CentOS 7.4 and the latest 4.x
> version. On 4.x kernels its performance varied from run to run by more
> than 30%.
> 
> Bisecting showed that my issue was introduced by:
> fd8526ad14c182605e42b64646344b95befd9f94 ("x86/mm: Implement ASLR for
> hugetlb mappings")
> 
> However, it was not the ASLR aspect of that commit that caused the
> issue, but the change from bottom-up to top-down unmapped area lookup
> when allocating huge pages.
>
> After that change, the huge page allocations can become interleaved
> with stacks. Before, the stacks and huge pages were on opposite sides
> of the process address space.
> 
> The machine I am seeing this on is a Knights Landing 7250, with 68
> cores x 4 hyper-threads.
>
> My application spawns 272 threads, and each thread allocates its own
> memory (a couple of 2MB huge pages) and does some computation dominated
> by memory accesses.
> 
> My theory is that, because KNL has an 8-way 2MB TLB, huge pages that
> are exactly 8 pages apart collide. This is where the variability comes
> from: if stacks land between the huge pages, they increase the chance
> of such collisions.
>
> I do realise that the application is (I am) doing a few dubious things:
> it allocates memory on each thread, and each huge page separately. But
> I thought you might want to know about this behaviour change.
>
> When I allocate all my memory before I start the threads, the problem
> goes away.
> 
> /proc/PID/maps: 
> After change: 
> 7f5e06a0-7f5e06c0 rw-p  00:0f 31809  /anon_hugepage (deleted)
> 7f5e06c0-7f5e06e0 rw-p  00:0f 29767  /anon_hugepage (deleted)
> 7f5e06e0-7f5e0700 rw-p  00:0f 30787  /anon_hugepage (deleted)
> 7f5e0700-7f5e0720 rw-p  00:0f 30786  /anon_hugepage (deleted)
> 7f5e0720-7f5e0740 rw-p  00:0f 28744  /anon_hugepage (deleted)
> 7f5e075ff000-7f5e0760 ---p  00:00 0
> 7f5e0760-7f5e07e0 rw-p  00:00 0
> 7f5e07e0-7f5e0800 rw-p  00:0f 30785  /anon_hugepage (deleted)
> 7f5e0800-7f5e08021000 rw-p  00:00 0
> 7f5e08021000-7f5e0c00 ---p  00:00 0
> 7f5e0c00-7f5e0c021000 rw-p  00:00 0
> 7f5e0c021000-7f5e1000 ---p  00:00 0
> 7f5e1000-7f5e10021000 rw-p  00:00 0
> 7f5e10021000-7f5e1400 ---p  00:00 0
> 7f5e1420-7f5e1440 rw-p  00:0f 29765  /anon_hugepage (deleted)
> 7f5e1440-7f5e1460 rw-p  00:0f 28743  /anon_hugepage (deleted)
> 7f5e1460-7f5e1480 rw-p  00:0f 29764  /anon_hugepage (deleted)
> (...)
> 
> Before change: 
> 2ac0-2ae0 rw-p  00:0f 25582  /anon_hugepage (deleted)
> 2ae0-2b00 rw-p  00:0f 25583  /anon_hugepage (deleted)
> 2b00-2b20 rw-p  00:0f 25584  /anon_hugepage (deleted)
> 2b20-2b40 rw-p  00:0f 25585  /anon_hugepage (deleted)
> 2b40-2b60 rw-p  00:0f 25601  /anon_hugepage (deleted)
> 2b60-2b80 rw-p  00:0f 25599  /anon_hugepage (deleted)
> 2b80-2ba0 rw-p  00:0f 25602  /anon_hugepage (deleted)
> 2ba0-2bc0 rw-p  00:0f 26652  /anon_hugepage (deleted)
> (...)
> 7fc4f0021000-7fc4f400 ---p  00:00 0 
> 7fc4f400-7fc4f4021000 rw-p  00:00 0 
> 7fc4f4021000-7fc4f800 ---p  00:00 0 
> 7fc4f800-7fc4f8021000 rw-p  00:00 0 
> 7fc4f8021000-7fc4fc00 ---p  00:00 0 
> 7fc4fc00-7fc4fc021000 rw-p  00:00 0 
> 7fc4fc021000-7fc5 ---p  00:00 0 
> 7fc5-7fc500021000 rw-p  00:00 0 
> 7fc500021000-7fc50400 ---p  00:00 0 
> 7fc50400-7fc504021000 rw-p  00:00 0 
> 7fc504021000-7fc50800 ---p  00:00 0 
> 7fc50800-7fc508021000 rw-p  00:00 0 
> 7fc508021000-7fc50c00 ---p  00:00 0 
> (...)
> 
> I was wondering whether this interleaving of stacks and huge pages is
> an expected feature of ASLR. If not, maybe mmap's MAP_STACK flag could
> finally start to be used by the kernel to keep all the stacks together
> in the process address space?
> 
> Or should users just not allocate huge pages on separate threads?
> 
> MAP_STACK could also be used to mark a VMA as a mapping for stack, 
> (if there are flags left) 

Hugepages mixed with stacks in process address space

2018-09-04 Thread Jacek Tomaka
Hello, 

I was trying to track down a performance difference in one of my
applications between the kernel used in CentOS 7.4 and the latest 4.x
version. On 4.x kernels its performance varied from run to run by more
than 30%.

Bisecting showed that my issue was introduced by:
fd8526ad14c182605e42b64646344b95befd9f94 ("x86/mm: Implement ASLR for
hugetlb mappings")

However, it was not the ASLR aspect of that commit that caused the
issue, but the change from bottom-up to top-down unmapped area lookup
when allocating huge pages.

After that change, the huge page allocations can become interleaved
with stacks. Before, the stacks and huge pages were on opposite sides
of the process address space.

The machine I am seeing this on is a Knights Landing 7250, with 68
cores x 4 hyper-threads.

My application spawns 272 threads, and each thread allocates its own
memory (a couple of 2MB huge pages) and does some computation dominated
by memory accesses.

My theory is that, because KNL has an 8-way 2MB TLB, huge pages that
are exactly 8 pages apart collide. This is where the variability comes
from: if stacks land between the huge pages, they increase the chance
of such collisions.
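
To make the collision theory concrete, here is a rough sketch of the set
arithmetic it implies. The number of sets (8) and the plain modulo
indexing are assumptions taken from the "8 pages apart" observation, not
documented KNL behaviour; real hardware may hash the index differently.
The names tlb_set_index and tlb_same_set are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define HUGE_PAGE_SHIFT  21u  /* 2MB huge pages */
#define ASSUMED_TLB_SETS 8u   /* assumption: pages 8 apart share a set */

/* Set that a virtual address would map to under simple modulo indexing. */
static unsigned tlb_set_index(uint64_t vaddr)
{
    return (unsigned)((vaddr >> HUGE_PAGE_SHIFT) % ASSUMED_TLB_SETS);
}

/* True if two addresses would compete for the same (assumed) TLB set. */
static bool tlb_same_set(uint64_t a, uint64_t b)
{
    return tlb_set_index(a) == tlb_set_index(b);
}
```

Under these assumptions, back-to-back huge pages spread evenly over the
sets, while any gap that shifts a thread's pages by a multiple of 8 huge
pages puts them in the same set as another thread's pages.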

I do realise that the application is (I am) doing a few dubious things:
it allocates memory on each thread, and each huge page separately. But
I thought you might want to know about this behaviour change.

When I allocate all my memory before I start the threads, the problem
goes away.
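
For reference, a minimal sketch of that workaround, assuming the huge
pages come from anonymous mmap() with MAP_HUGETLB: one mapping is made
before any thread starts, and each thread takes a fixed slice of it. The
struct and function names are hypothetical, and the fallback to normal
pages is only there so the sketch also runs on a machine with no huge
pages reserved.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)

struct arena {
    void  *base;  /* start of the single mapping */
    size_t len;   /* total length in bytes */
    int    huge;  /* 1 if backed by huge pages, 0 after fallback */
};

/* Map all per-thread memory in one call, before any thread is created. */
static int arena_init(struct arena *a, size_t nthreads, size_t pages_per_thread)
{
    a->len = nthreads * pages_per_thread * HUGE_PAGE_SIZE;
    a->huge = 1;
    a->base = mmap(NULL, a->len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (a->base == MAP_FAILED) {
        /* No huge pages available: fall back to normal pages. */
        a->huge = 0;
        a->base = mmap(NULL, a->len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }
    return a->base == MAP_FAILED ? -1 : 0;
}

/* Slice owned by thread number tid; slices are contiguous and disjoint. */
static void *arena_slice(const struct arena *a, size_t tid, size_t pages_per_thread)
{
    return (char *)a->base + tid * pages_per_thread * HUGE_PAGE_SIZE;
}

static void arena_destroy(struct arena *a)
{
    munmap(a->base, a->len);
}
```

Each thread would call arena_slice() with its own index instead of
calling mmap() itself, so no stack can end up between the huge pages.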

/proc/PID/maps: 
After change: 
7f5e06a0-7f5e06c0 rw-p  00:0f 31809  /anon_hugepage (deleted)
7f5e06c0-7f5e06e0 rw-p  00:0f 29767  /anon_hugepage (deleted)
7f5e06e0-7f5e0700 rw-p  00:0f 30787  /anon_hugepage (deleted)
7f5e0700-7f5e0720 rw-p  00:0f 30786  /anon_hugepage (deleted)
7f5e0720-7f5e0740 rw-p  00:0f 28744  /anon_hugepage (deleted)
7f5e075ff000-7f5e0760 ---p  00:00 0
7f5e0760-7f5e07e0 rw-p  00:00 0
7f5e07e0-7f5e0800 rw-p  00:0f 30785  /anon_hugepage (deleted)
7f5e0800-7f5e08021000 rw-p  00:00 0
7f5e08021000-7f5e0c00 ---p  00:00 0
7f5e0c00-7f5e0c021000 rw-p  00:00 0
7f5e0c021000-7f5e1000 ---p  00:00 0
7f5e1000-7f5e10021000 rw-p  00:00 0
7f5e10021000-7f5e1400 ---p  00:00 0
7f5e1420-7f5e1440 rw-p  00:0f 29765  /anon_hugepage (deleted)
7f5e1440-7f5e1460 rw-p  00:0f 28743  /anon_hugepage (deleted)
7f5e1460-7f5e1480 rw-p  00:0f 29764  /anon_hugepage (deleted)
(...)

Before change: 
2ac0-2ae0 rw-p  00:0f 25582  /anon_hugepage (deleted)
2ae0-2b00 rw-p  00:0f 25583  /anon_hugepage (deleted)
2b00-2b20 rw-p  00:0f 25584  /anon_hugepage (deleted)
2b20-2b40 rw-p  00:0f 25585  /anon_hugepage (deleted)
2b40-2b60 rw-p  00:0f 25601  /anon_hugepage (deleted)
2b60-2b80 rw-p  00:0f 25599  /anon_hugepage (deleted)
2b80-2ba0 rw-p  00:0f 25602  /anon_hugepage (deleted)
2ba0-2bc0 rw-p  00:0f 26652  /anon_hugepage (deleted)
(...)
7fc4f0021000-7fc4f400 ---p  00:00 0 
7fc4f400-7fc4f4021000 rw-p  00:00 0 
7fc4f4021000-7fc4f800 ---p  00:00 0 
7fc4f800-7fc4f8021000 rw-p  00:00 0 
7fc4f8021000-7fc4fc00 ---p  00:00 0 
7fc4fc00-7fc4fc021000 rw-p  00:00 0 
7fc4fc021000-7fc5 ---p  00:00 0 
7fc5-7fc500021000 rw-p  00:00 0 
7fc500021000-7fc50400 ---p  00:00 0 
7fc50400-7fc504021000 rw-p  00:00 0 
7fc504021000-7fc50800 ---p  00:00 0 
7fc50800-7fc508021000 rw-p  00:00 0 
7fc508021000-7fc50c00 ---p  00:00 0 
(...)

I was wondering whether this interleaving of stacks and huge pages is
an expected feature of ASLR. If not, maybe mmap's MAP_STACK flag could
finally start to be used by the kernel to keep all the stacks together
in the process address space?
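
For context, this is roughly how a stack mapping is requested with
MAP_STACK from user space today. It is only a sketch: the struct and
helper names are hypothetical, and as far as I know the flag is
currently accepted but ignored by Linux when choosing the address, which
is exactly what the question above is about.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

struct thread_stack {
    void  *map;         /* start of the mapping (the guard page) */
    size_t map_len;     /* guard page + usable stack */
    void  *usable;      /* lowest usable stack address */
    size_t usable_len;
};

/* Map a stack with MAP_STACK and one inaccessible guard page below it. */
static int thread_stack_alloc(struct thread_stack *s, size_t usable_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    s->map_len = usable_len + (size_t)page;
    s->map = mmap(NULL, s->map_len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (s->map == MAP_FAILED)
        return -1;

    /* Stacks grow down on x86, so the guard sits at the low end. */
    if (mprotect(s->map, (size_t)page, PROT_NONE) != 0) {
        munmap(s->map, s->map_len);
        return -1;
    }
    s->usable = (char *)s->map + page;
    s->usable_len = usable_len;
    return 0;
}

static void thread_stack_free(struct thread_stack *s)
{
    munmap(s->map, s->map_len);
}
```

The usable region could then be handed to pthread_attr_setstack(). If
the kernel honoured MAP_STACK for placement, mappings made this way are
the ones it could group together, away from the huge pages.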

Or should users just not allocate huge pages on separate threads?

MAP_STACK could also be used to mark a VMA as a stack mapping (if there
are flags left), in order to correctly re-implement the annotation that
was removed by:
65376df582174ffcec9e6471bf5b0dd79ba05e4a ("proc: revert /proc/<pid>/maps
[stack:TID] annotation")
Having this information in place would greatly simplify my
investigation.

Regards.
Jacek Tomaka