Re: [RFC REBASED 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-03-01 Thread Nicholas Piggin
On Thu, 1 Mar 2018 08:09:55 +0100
Christophe LEROY  wrote:

> On 28/02/2018 at 07:53, Nicholas Piggin wrote:
> > On Tue, 27 Feb 2018 18:11:07 +0530
> > "Aneesh Kumar K.V"  wrote:
> >
> >> Nicholas Piggin  writes:
> >>
> >>> On Tue, 27 Feb 2018 14:31:07 +0530
> >>> "Aneesh Kumar K.V"  wrote:
> >>>
> >>>> Christophe Leroy  writes:
> >>>>
> >>>>> The number of high slices a process might use now depends on its
> >>>>> address space size, and what allocation address it has requested.
> >>>>>
> >>>>> This patch uses that limit throughout call chains where possible,
> >>>>> rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
> >>>>> This saves some cost for processes that don't use very large address
> >>>>> spaces.
> >>>>
> >>>> I haven't really looked at the final code. One of the issues we had was
> >>>> with the below scenario.
> >>>>
> >>>> mmap(addr, len) where addr < 128TB and addr+len > 128TB. We want to make
> >>>> sure we build the mask such that we don't find the addr available.
> >>>
> >>> We should run it through the mmap regression tests. I *think* we moved
> >>> all of that logic from the slice code to get_unmapped_area before going
> >>> in to slices. I may have missed something though, it would be good to
> >>> have more eyes on it.
> >>
> >> mmap(-1,...) failed with the test. Something like the below fixes it:
> >>
> >> @@ -756,7 +770,7 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
> >>  mm->context.low_slices_psize = lpsizes;
> >>
> >>  hpsizes = mm->context.high_slices_psize;
> >> -   high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
> >> +   high_slices = SLICE_NUM_HIGH;
> >>  for (i = 0; i < high_slices; i++) {
> >>  mask_index = i & 0x1;
> >>  index = i >> 1;
> >>
> >> I guess for everything in the mm_context_t, we should compute it till
> >> SLICE_NUM_HIGH. The reason for failure was, even though we recompute the
> >> slice mask cached in mm_context on slb_addr_limit, it was still derived
> >> from the high_slices_psizes which was computed with the lower value.
> >
> > Okay thanks for catching that Aneesh. I guess that's a slow path so it
> > should be okay. Christophe if you're taking care of the series can you
> > fold it in? Otherwise I'll do that after yours gets merged.
>
> I'm not really taking care of your series, I just took it once to see how
> it fits on the 8xx. I prefer if you can handle them. If you need my help
> for any test on PPC32, don't hesitate to ask.

No problem I can do that. Thanks for rebasing them.

Thanks,
Nick


Re: [RFC REBASED 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-02-28 Thread Christophe LEROY



On 28/02/2018 at 07:53, Nicholas Piggin wrote:
> On Tue, 27 Feb 2018 18:11:07 +0530
> "Aneesh Kumar K.V"  wrote:
>
>> Nicholas Piggin  writes:
>>
>>> On Tue, 27 Feb 2018 14:31:07 +0530
>>> "Aneesh Kumar K.V"  wrote:
>>>
>>>> Christophe Leroy  writes:
>>>>
>>>>> The number of high slices a process might use now depends on its
>>>>> address space size, and what allocation address it has requested.
>>>>>
>>>>> This patch uses that limit throughout call chains where possible,
>>>>> rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
>>>>> This saves some cost for processes that don't use very large address
>>>>> spaces.
>>>>
>>>> I haven't really looked at the final code. One of the issues we had was
>>>> with the below scenario.
>>>>
>>>> mmap(addr, len) where addr < 128TB and addr+len > 128TB. We want to make
>>>> sure we build the mask such that we don't find the addr available.
>>>
>>> We should run it through the mmap regression tests. I *think* we moved
>>> all of that logic from the slice code to get_unmapped_area before going
>>> in to slices. I may have missed something though, it would be good to
>>> have more eyes on it.
>>
>> mmap(-1,...) failed with the test. Something like the below fixes it:
>>
>> @@ -756,7 +770,7 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
>>  mm->context.low_slices_psize = lpsizes;
>>
>>  hpsizes = mm->context.high_slices_psize;
>> -   high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
>> +   high_slices = SLICE_NUM_HIGH;
>>  for (i = 0; i < high_slices; i++) {
>>  mask_index = i & 0x1;
>>  index = i >> 1;
>>
>> I guess for everything in the mm_context_t, we should compute it till
>> SLICE_NUM_HIGH. The reason for failure was, even though we recompute the
>> slice mask cached in mm_context on slb_addr_limit, it was still derived
>> from the high_slices_psizes which was computed with the lower value.
>
> Okay thanks for catching that Aneesh. I guess that's a slow path so it
> should be okay. Christophe if you're taking care of the series can you
> fold it in? Otherwise I'll do that after yours gets merged.

I'm not really taking care of your series, I just took it once to see how
it fits on the 8xx. I prefer if you can handle them. If you need my help
for any test on PPC32, don't hesitate to ask.

Christophe


Re: [RFC REBASED 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-02-27 Thread Aneesh Kumar K.V

I also noticed that the slice mask printing uses the wrong variables now. I
guess this should take care of it:

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index fef3f36b0b73..6b3575c39668 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -535,8 +535,6 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 * already
 */
maskp = slice_mask_for_size(mm, psize);
-   slice_print_mask(" good_mask", &good_mask);
-
/*
 * Here "good" means slices that are already the right page size,
 * "compat" means slices that have a compatible page size (i.e.
@@ -569,6 +567,8 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
{
slice_copy_mask(&good_mask, maskp, high_slices);
}
+   slice_print_mask(" good_mask", &good_mask);
+   slice_print_mask(" compat_mask", compat_maskp);
 
/* First check hint if it's valid or if we have MAP_FIXED */
if (addr || fixed) {
@@ -646,7 +646,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 
slice_range_to_mask(addr, len, &potential_mask, high_slices);
slice_dbg(" found potential area at 0x%lx\n", addr);
-   slice_print_mask(" mask", maskp);
+   slice_print_mask(" mask", &potential_mask);
 
  convert:
 slice_andnot_mask(&potential_mask, &potential_mask, &good_mask, high_slices);
@@ -836,13 +836,6 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
return !slice_check_range_fits(mm, &available, addr, len);
}
 #endif
-
-#if 0 /* too verbose */
-   slice_dbg("is_hugepage_only_range(mm=%p, addr=%lx, len=%lx)\n",
-mm, addr, len);
-   slice_print_mask(" mask", &mask);
-   slice_print_mask(" available", &available);
-#endif
return !slice_check_range_fits(mm, maskp, addr, len);
 }
 #endif



Re: [RFC REBASED 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-02-27 Thread Aneesh Kumar K.V



On 02/28/2018 12:23 PM, Nicholas Piggin wrote:
> On Tue, 27 Feb 2018 18:11:07 +0530
> "Aneesh Kumar K.V"  wrote:
>
>> Nicholas Piggin  writes:
>>
>>> On Tue, 27 Feb 2018 14:31:07 +0530
>>> "Aneesh Kumar K.V"  wrote:
>>>
>>>> Christophe Leroy  writes:
>>>>
>>>>> The number of high slices a process might use now depends on its
>>>>> address space size, and what allocation address it has requested.
>>>>>
>>>>> This patch uses that limit throughout call chains where possible,
>>>>> rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
>>>>> This saves some cost for processes that don't use very large address
>>>>> spaces.
>>>>
>>>> I haven't really looked at the final code. One of the issues we had was
>>>> with the below scenario.
>>>>
>>>> mmap(addr, len) where addr < 128TB and addr+len > 128TB. We want to make
>>>> sure we build the mask such that we don't find the addr available.
>>>
>>> We should run it through the mmap regression tests. I *think* we moved
>>> all of that logic from the slice code to get_unmapped_area before going
>>> in to slices. I may have missed something though, it would be good to
>>> have more eyes on it.
>>
>> mmap(-1,...) failed with the test. Something like the below fixes it:
>>
>> @@ -756,7 +770,7 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
>>  mm->context.low_slices_psize = lpsizes;
>>
>>  hpsizes = mm->context.high_slices_psize;
>> -   high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
>> +   high_slices = SLICE_NUM_HIGH;
>>  for (i = 0; i < high_slices; i++) {
>>  mask_index = i & 0x1;
>>  index = i >> 1;
>>
>> I guess for everything in the mm_context_t, we should compute it till
>> SLICE_NUM_HIGH. The reason for failure was, even though we recompute the
>> slice mask cached in mm_context on slb_addr_limit, it was still derived
>> from the high_slices_psizes which was computed with the lower value.
>
> Okay thanks for catching that Aneesh. I guess that's a slow path so it
> should be okay. Christophe if you're taking care of the series can you
> fold it in? Otherwise I'll do that after yours gets merged.

Should we also compute the mm_context_t.slice_mask using SLICE_NUM_HIGH
and skip the recalc_slice_mask_cache when we change the addr limit?

-aneesh



Re: [RFC REBASED 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-02-27 Thread Nicholas Piggin
On Tue, 27 Feb 2018 18:11:07 +0530
"Aneesh Kumar K.V"  wrote:

> Nicholas Piggin  writes:
> 
> > On Tue, 27 Feb 2018 14:31:07 +0530
> > "Aneesh Kumar K.V"  wrote:
> >  
> >> Christophe Leroy  writes:
> >>   
> >> > The number of high slices a process might use now depends on its
> >> > address space size, and what allocation address it has requested.
> >> >
> >> > This patch uses that limit throughout call chains where possible,
> >> > rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
> >> > This saves some cost for processes that don't use very large address
> >> > spaces.
> >> 
> >> I haven't really looked at the final code. One of the issues we had was
> >> with the below scenario.
> >> 
> >> mmap(addr, len) where addr < 128TB and addr+len > 128TB. We want to make
> >> sure we build the mask such that we don't find the addr available.
> >
> > We should run it through the mmap regression tests. I *think* we moved
> > all of that logic from the slice code to get_unmapped_area before going
> > in to slices. I may have missed something though, it would be good to
> > have more eyes on it.
> >  
> 
> mmap(-1,...) failed with the test. Something like the below fixes it:
> 
> @@ -756,7 +770,7 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
> mm->context.low_slices_psize = lpsizes;
>  
> hpsizes = mm->context.high_slices_psize;
> -   high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
> +   high_slices = SLICE_NUM_HIGH;
> for (i = 0; i < high_slices; i++) {
> mask_index = i & 0x1;
> index = i >> 1;
> 
> I guess for everything in the mm_context_t, we should compute it till
> SLICE_NUM_HIGH. The reason for failure was, even though we recompute the
> slice mask cached in mm_context on slb_addr_limit, it was still derived
> from the high_slices_psizes which was computed with the lower value.

Okay thanks for catching that Aneesh. I guess that's a slow path so it
should be okay. Christophe if you're taking care of the series can you
fold it in? Otherwise I'll do that after yours gets merged.

Thanks,
Nick



Re: [RFC REBASED 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-02-27 Thread Aneesh Kumar K.V
Nicholas Piggin  writes:

> On Tue, 27 Feb 2018 14:31:07 +0530
> "Aneesh Kumar K.V"  wrote:
>
>> Christophe Leroy  writes:
>> 
>> > The number of high slices a process might use now depends on its
>> > address space size, and what allocation address it has requested.
>> >
>> > This patch uses that limit throughout call chains where possible,
>> > rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
>> > This saves some cost for processes that don't use very large address
>> > spaces.  
>> 
>> I haven't really looked at the final code. One of the issues we had was
>> with the below scenario.
>> 
>> mmap(addr, len) where addr < 128TB and addr+len > 128TB. We want to make
>> sure we build the mask such that we don't find the addr available.
>
> We should run it through the mmap regression tests. I *think* we moved
> all of that logic from the slice code to get_unmapped_area before going
> in to slices. I may have missed something though, it would be good to
> have more eyes on it.
>

mmap(-1,...) failed with the test. Something like the below fixes it:

@@ -756,7 +770,7 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
mm->context.low_slices_psize = lpsizes;
 
hpsizes = mm->context.high_slices_psize;
-   high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
+   high_slices = SLICE_NUM_HIGH;
for (i = 0; i < high_slices; i++) {
mask_index = i & 0x1;
index = i >> 1;

I guess for everything in the mm_context_t, we should compute it till
SLICE_NUM_HIGH. The reason for failure was, even though we recompute the
slice mask cached in mm_context on slb_addr_limit, it was still derived
from the high_slices_psizes which was computed with the lower value.

-aneesh



Re: [RFC REBASED 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-02-27 Thread Nicholas Piggin
On Tue, 27 Feb 2018 14:31:07 +0530
"Aneesh Kumar K.V"  wrote:

> Christophe Leroy  writes:
> 
> > The number of high slices a process might use now depends on its
> > address space size, and what allocation address it has requested.
> >
> > This patch uses that limit throughout call chains where possible,
> > rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
> > This saves some cost for processes that don't use very large address
> > spaces.  
> 
> I haven't really looked at the final code. One of the issues we had was
> with the below scenario.
> 
> mmap(addr, len) where addr < 128TB and addr+len > 128TB. We want to make
> sure we build the mask such that we don't find the addr available.

We should run it through the mmap regression tests. I *think* we moved
all of that logic from the slice code to get_unmapped_area before going
in to slices. I may have missed something though, it would be good to
have more eyes on it.

Thanks,
Nick


Re: [RFC REBASED 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-02-27 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> The number of high slices a process might use now depends on its
> address space size, and what allocation address it has requested.
>
> This patch uses that limit throughout call chains where possible,
> rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
> This saves some cost for processes that don't use very large address
> spaces.

I haven't really looked at the final code. One of the issues we had was
with the below scenario.

mmap(addr, len) where addr < 128TB and addr+len > 128TB. We want to make
sure we build the mask such that we don't find the addr available.

-aneesh



[RFC REBASED 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-02-12 Thread Christophe Leroy
The number of high slices a process might use now depends on its
address space size, and what allocation address it has requested.

This patch uses that limit throughout call chains where possible,
rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
This saves some cost for processes that don't use very large address
spaces.

Signed-off-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/slice.c | 111 ++--
 1 file changed, 60 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b8b691369c29..683ff4604ab4 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -61,13 +61,12 @@ static void slice_print_mask(const char *label, const struct slice_mask *mask) {
 #endif
 
 static void slice_range_to_mask(unsigned long start, unsigned long len,
-   struct slice_mask *ret)
+   struct slice_mask *ret,
+   unsigned long high_slices)
 {
unsigned long end = start + len - 1;
 
ret->low_slices = 0;
-   slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
if (start < SLICE_LOW_TOP) {
unsigned long mend = min(end,
 (unsigned long)(SLICE_LOW_TOP - 1));
@@ -76,6 +75,7 @@ static void slice_range_to_mask(unsigned long start, unsigned long len,
- (1u << GET_LOW_SLICE_INDEX(start));
}
 
+   slice_bitmap_zero(ret->high_slices, high_slices);
if ((start + len) > SLICE_LOW_TOP) {
unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
@@ -119,28 +119,27 @@ static int slice_high_has_vma(struct mm_struct *mm, unsigned long slice)
 }
 
 static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
-   unsigned long high_limit)
+   unsigned long high_slices)
 {
unsigned long i;
 
ret->low_slices = 0;
-   slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
for (i = 0; i < SLICE_NUM_LOW; i++)
if (!slice_low_has_vma(mm, i))
ret->low_slices |= 1u << i;
 
-   if (high_limit <= SLICE_LOW_TOP)
+   if (!high_slices)
return;
 
-   for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++)
+   slice_bitmap_zero(ret->high_slices, high_slices);
+   for (i = 0; i < high_slices; i++)
if (!slice_high_has_vma(mm, i))
__set_bit(i, ret->high_slices);
 }
 
 static void calc_slice_mask_for_size(struct mm_struct *mm, int psize,
struct slice_mask *ret,
-   unsigned long high_limit)
+   unsigned long high_slices)
 {
unsigned char *hpsizes;
int index, mask_index;
@@ -148,18 +147,17 @@ static void calc_slice_mask_for_size(struct mm_struct *mm, int psize,
u64 lpsizes;
 
ret->low_slices = 0;
-   slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
lpsizes = mm->context.low_slices_psize;
for (i = 0; i < SLICE_NUM_LOW; i++)
if (((lpsizes >> (i * 4)) & 0xf) == psize)
ret->low_slices |= 1u << i;
 
-   if (high_limit <= SLICE_LOW_TOP)
+   if (!high_slices)
return;
 
+   slice_bitmap_zero(ret->high_slices, high_slices);
hpsizes = mm->context.high_slices_psize;
-   for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) {
+   for (i = 0; i < high_slices; i++) {
mask_index = i & 0x1;
index = i >> 1;
if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
@@ -168,16 +166,15 @@ static void calc_slice_mask_for_size(struct mm_struct *mm, int psize,
 }
 
 #ifdef CONFIG_PPC_BOOK3S_64
-static void recalc_slice_mask_cache(struct mm_struct *mm)
+static void recalc_slice_mask_cache(struct mm_struct *mm, unsigned long high_slices)
 {
-   unsigned long l = mm->context.slb_addr_limit;
-   calc_slice_mask_for_size(mm, MMU_PAGE_4K, &mm->context.mask_4k, l);
+   calc_slice_mask_for_size(mm, MMU_PAGE_4K, &mm->context.mask_4k, high_slices);
 #ifdef CONFIG_PPC_64K_PAGES
-   calc_slice_mask_for_size(mm, MMU_PAGE_64K, &mm->context.mask_64k, l);
+   calc_slice_mask_for_size(mm, MMU_PAGE_64K, &mm->context.mask_64k, high_slices);
 #endif
 #ifdef CONFIG_HUGETLB_PAGE
-   calc_slice_mask_for_size(mm, MMU_PAGE_16M, &mm->context.mask_16m, l);
-   calc_slice_mask_for_size(mm, MMU_PAGE_16G, &mm->context.mask_16g, l);
+   calc_slice_mask_for_size(mm, MMU_PAGE_16M, &mm->context.mask_16m, high_slices);
+   calc_slice_mask_for_size(mm, MMU_PAGE_16G, &mm->context.mask_16g, high_slices);
 #endif
 }
 
@@ -198,17 +195,16 @@ static const s