Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-23 Thread Segher Boessenkool
On Mon, Jan 22, 2018 at 08:52:53AM +0100, Christophe LEROY wrote:
> >Just make sure to declare all functions, or define it to some empty
> >thing, or #ifdeffery if you have to.  There are many options, it is
> >not hard, and if it means you have to pull code further apart that is
> >not so bad: you get cleaner, clearer code.
> 
> Ok, if I understand well, your comment applies to the following indeed, 
> so you confirm the #ifdef is necessary.

As I said, not necessary, but it might be the easiest or even the
cleanest here.  Something for you and the maintainers to fight about,
I'll stay out of it :-)

> However, my question was related to another part of the current 
> patchset, where the functions are always refined:
> 
> 
> On PPC32 we set:
> 
> +#define SLICE_LOW_SHIFT  28
> +#define SLICE_HIGH_SHIFT 0
> 
> On PPC64 we set:
> 
>  #define SLICE_LOW_SHIFT  28
>  #define SLICE_HIGH_SHIFT 40
> 
> We define:
> 
> +#define slice_bitmap_zero(dst, nbits) \
> + do { if (nbits) bitmap_zero(dst, nbits); } while (0)
> 
> 
> We have a function with:
> {
>   slice_bitmap_zero(ret->low_slices, SLICE_NUM_LOW);
>   slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
> }

SLICE_NUM_xx is not the same as SLICE_xx_SHIFT; I don't see how any of
those shift values give nbits == 0.

> So the question is to find the better approach. Is the above approach 
> correct, including performance wise ?

If slice_bitmap_zero is inlined (or partially inlined) it is fine.  Is it?


Segher


Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-21 Thread Christophe LEROY



Le 20/01/2018 à 18:56, Segher Boessenkool a écrit :

Hi!

On Sat, Jan 20, 2018 at 09:22:50AM +0100, christophe leroy wrote:

On PPC32, the address space is limited to 4Gbytes, hence only the
low
slices will be used. As of today, the code uses
SLICE_LOW_TOP (0x1ul) and compares it with addr to determine
if addr refers to low or high space.
On PPC32, such a (addr < SLICE_LOW_TOP) test is always false because
0x1ul degrades to 0. Therefore, the patch modifies
SLICE_LOW_TOP to (0xul) and modifies the tests to
(addr <= SLICE_LOW_TOP) which will then always be true on PPC32
as addr has type 'unsigned long' while not modifying the PPC64
behaviour.


It should work to define SLICE_LOW_TOP as 0x1ull and keep
everything else the same, no?


great, yes it works indeed.




I don't think so. When I had the missing prototype, the compilation goes
ok, including the final link. Which means at the end the code is not
included since radix_enabled() evaluates to 0.

Many many parts of the kernel are based on this assumption.


Segher, what is your opinion on the above ? Can we consider that a ' if
(nbits)' will always be compiled out when nbits is a #define constant,
or should we duplicate the macros as suggested in order to avoid
unneccessary 'if' test on platforms where 'nbits' is always not null by
definition ?


Doing things like

if (nbits)
some_undeclared_function();

will likely work in practice if the condition evaluates to false at
compile time, but a) it will warn; b) it is just yuck; and c) it will
not always work (for example, you get the wrong prototype in this case,
not lethal here with most ABIs, but ugh).

Just make sure to declare all functions, or define it to some empty
thing, or #ifdeffery if you have to.  There are many options, it is
not hard, and if it means you have to pull code further apart that is
not so bad: you get cleaner, clearer code.


Ok, if I understand well, your comment applies to the following indeed, 
so you confirm the #ifdef is necessary.


--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -553,9 +553,11 @@ unsigned long hugetlb_get_unmapped_area(struct file 
*file, unsigned long addr,

struct hstate *hstate = hstate_file(file);
int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate));

+#ifdef CONFIG_PPC_RADIX_MMU
if (radix_enabled())
return radix__hugetlb_get_unmapped_area(file, addr, len,
   pgoff, flags);
+#endif
return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
 }
 #endif


However, my question was related to another part of the current 
patchset, where the functions are always refined:



On PPC32 we set:

+#define SLICE_LOW_SHIFT28
+#define SLICE_HIGH_SHIFT   0

On PPC64 we set:

 #define SLICE_LOW_SHIFT28
 #define SLICE_HIGH_SHIFT   40

We define:

+#define slice_bitmap_zero(dst, nbits) \
+   do { if (nbits) bitmap_zero(dst, nbits); } while (0)


We have a function with:
{
slice_bitmap_zero(ret->low_slices, SLICE_NUM_LOW);
slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
}

So the question is to find the better approach. Is the above approach 
correct, including performance wise ?
Or should we define two sets of the macro slice_bitmap_zero(), one for 
CONFIG_PPC32 with the 'if (nbits)' test and one for CONFIG_PPC64 without 
the unnecessary test ?


Or should we avoid this macro entirely and instead do something like:

{
bitmap_zero(ret->low_slices, SLICE_NUM_LOW);
#if SLICE_NUM_HIGH != 0
bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
#endif
}

And if we say the 'macro' approach is OK, should it be better the use a 
static inline function instead ?


Thanks,
Christophe


Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-20 Thread Segher Boessenkool
Hi!

On Sat, Jan 20, 2018 at 09:22:50AM +0100, christophe leroy wrote:
> >>>On PPC32, the address space is limited to 4Gbytes, hence only the 
> >>>low
> >>>slices will be used. As of today, the code uses
> >>>SLICE_LOW_TOP (0x1ul) and compares it with addr to determine
> >>>if addr refers to low or high space.
> >>>On PPC32, such a (addr < SLICE_LOW_TOP) test is always false because
> >>>0x1ul degrades to 0. Therefore, the patch modifies
> >>>SLICE_LOW_TOP to (0xul) and modifies the tests to
> >>>(addr <= SLICE_LOW_TOP) which will then always be true on PPC32
> >>>as addr has type 'unsigned long' while not modifying the PPC64
> >>>behaviour.

It should work to define SLICE_LOW_TOP as 0x1ull and keep
everything else the same, no?

> >I don't think so. When I had the missing prototype, the compilation goes 
> >ok, including the final link. Which means at the end the code is not 
> >included since radix_enabled() evaluates to 0.
> >
> >Many many parts of the kernel are based on this assumption.
> 
> Segher, what is your opinion on the above ? Can we consider that a ' if 
> (nbits)' will always be compiled out when nbits is a #define constant, 
> or should we duplicate the macros as suggested in order to avoid 
> unneccessary 'if' test on platforms where 'nbits' is always not null by 
> definition ?

Doing things like

if (nbits)
some_undeclared_function();

will likely work in practice if the condition evaluates to false at
compile time, but a) it will warn; b) it is just yuck; and c) it will
not always work (for example, you get the wrong prototype in this case,
not lethal here with most ABIs, but ugh).

Just make sure to declare all functions, or define it to some empty
thing, or #ifdeffery if you have to.  There are many options, it is
not hard, and if it means you have to pull code further apart that is
not so bad: you get cleaner, clearer code.


Segher


Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-20 Thread christophe leroy

Hi Segher,

Le 19/01/2018 à 10:45, Christophe LEROY a écrit :



Le 19/01/2018 à 10:13, Aneesh Kumar K.V a écrit :



On 01/19/2018 02:37 PM, Christophe LEROY wrote:



Le 19/01/2018 à 10:02, Aneesh Kumar K.V a écrit :



On 01/19/2018 02:14 PM, Christophe LEROY wrote:



Le 19/01/2018 à 09:24, Aneesh Kumar K.V a écrit :

Christophe Leroy  writes:


In preparation for the following patch which will fix an issue on
the 8xx by re-using the 'slices', this patch enhances the
'slices' implementation to support 32 bits CPUs.

On PPC32, the address space is limited to 4Gbytes, hence only the 
low

slices will be used. As of today, the code uses
SLICE_LOW_TOP (0x1ul) and compares it with addr to determine
if addr refers to low or high space.
On PPC32, such a (addr < SLICE_LOW_TOP) test is always false because
0x1ul degrades to 0. Therefore, the patch modifies
SLICE_LOW_TOP to (0xul) and modifies the tests to
(addr <= SLICE_LOW_TOP) which will then always be true on PPC32
as addr has type 'unsigned long' while not modifying the PPC64
behaviour.

This patch moves "slices" functions prototypes from page64.h to 
page.h


The high slices use bitmaps. As bitmap functions are not prepared to
handling bitmaps of size 0, the bitmap_xxx() calls are wrapped into
slice_bitmap_xxx() macros which will take care of the 0 nbits case.

Signed-off-by: Christophe Leroy 
---
  v2: First patch of v1 serie split in two parts ; added 
slice_bitmap_xxx() macros.


  arch/powerpc/include/asm/page.h  | 14 +
  arch/powerpc/include/asm/page_32.h   | 19 
  arch/powerpc/include/asm/page_64.h   | 21 ++---
  arch/powerpc/mm/hash_utils_64.c  |  2 +-
  arch/powerpc/mm/mmu_context_nohash.c |  7 +
  arch/powerpc/mm/slice.c  | 60 


  6 files changed, 83 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/page.h 
b/arch/powerpc/include/asm/page.h

index 8da5d4c1cab2..d0384f9db9eb 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -342,6 +342,20 @@ typedef struct page *pgtable_t;
  #endif
  #endif
+#ifdef CONFIG_PPC_MM_SLICES
+struct mm_struct;
+
+unsigned long slice_get_unmapped_area(unsigned long addr, 
unsigned long len,

+  unsigned long flags, unsigned int psize,
+  int topdown);
+
+unsigned int get_slice_psize(struct mm_struct *mm, unsigned long 
addr);

+
+void slice_set_user_psize(struct mm_struct *mm, unsigned int 
psize);
+void slice_set_range_psize(struct mm_struct *mm, unsigned long 
start,

+   unsigned long len, unsigned int psize);
+#endif
+


Should we do a slice.h ? the way we have other files? and then do


Yes we could add a slice.h instead of using page.h for that, good 
idea.




arch/powerpc/include/asm/book3s/64/slice.h that will carry
#define slice_bitmap_zero(dst, nbits) \
do { if (nbits) bitmap_zero(dst, nbits); } while (0)
#define slice_bitmap_set(dst, pos, nbits) \
do { if (nbits) bitmap_set(dst, pos, nbits); } while (0)
#define slice_bitmap_copy(dst, src, nbits) \
do { if (nbits) bitmap_copy(dst, src, nbits); } while (0)
#define slice_bitmap_and(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_and(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_or(dst, src1, src2, nbits) \
do { if (nbits) bitmap_or(dst, src1, src2, nbits); } while (0)
#define slice_bitmap_andnot(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_andnot(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_equal(src1, src2, nbits) \
({ (nbits) ? bitmap_equal(src1, src2, nbits) : 1; })
#define slice_bitmap_empty(src, nbits) \
({ (nbits) ? bitmap_empty(src, nbits) : 1; })

This without that if(nbits) check and a proper static inline so 
that we

can do type checking.


Is it really worth duplicating that just for eliminating the 'if 
(nbits)' in one case ?


Only in book3s/64 we will be able to eliminate that, for nohash/32 
we need to keep the test due to the difference between low and high 
slices.


the other advantage is we move the SLICE_LOW_SHIFT to the right 
location. IMHO mm subystem is really complex with these really 
overloaded headers. If we can keep it  seperate we should with 
minimal code duplication?


For the constants I fully agree with your proposal and I will do it. 
I was only questionning the benefit of moving the slice_bitmap_() 
stuff, taking into account that the 'if (nbits)' test is already 
eliminated by the compiler.




That is compiler dependent as you are finding with the other patch 
where if (0) didn't get compiled out


I don't think so. When I had the missing prototype, the compilation goes 
ok, including the final link. Which means at the end the code is not 
included since radix_enabled() evaluates to 0.


Many many parts of the kernel are based on this assumption.




Segher, what is your opinion on the above ? Can we consider that a ' if 
(nbits)' will always be compiled out when nb

Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-19 Thread Christophe LEROY



Le 19/01/2018 à 10:13, Aneesh Kumar K.V a écrit :



On 01/19/2018 02:37 PM, Christophe LEROY wrote:



Le 19/01/2018 à 10:02, Aneesh Kumar K.V a écrit :



On 01/19/2018 02:14 PM, Christophe LEROY wrote:



Le 19/01/2018 à 09:24, Aneesh Kumar K.V a écrit :

Christophe Leroy  writes:


In preparation for the following patch which will fix an issue on
the 8xx by re-using the 'slices', this patch enhances the
'slices' implementation to support 32 bits CPUs.

On PPC32, the address space is limited to 4Gbytes, hence only the low
slices will be used. As of today, the code uses
SLICE_LOW_TOP (0x1ul) and compares it with addr to determine
if addr refers to low or high space.
On PPC32, such a (addr < SLICE_LOW_TOP) test is always false because
0x1ul degrades to 0. Therefore, the patch modifies
SLICE_LOW_TOP to (0xul) and modifies the tests to
(addr <= SLICE_LOW_TOP) which will then always be true on PPC32
as addr has type 'unsigned long' while not modifying the PPC64
behaviour.

This patch moves "slices" functions prototypes from page64.h to 
page.h


The high slices use bitmaps. As bitmap functions are not prepared to
handling bitmaps of size 0, the bitmap_xxx() calls are wrapped into
slice_bitmap_xxx() macros which will take care of the 0 nbits case.

Signed-off-by: Christophe Leroy 
---
  v2: First patch of v1 serie split in two parts ; added 
slice_bitmap_xxx() macros.


  arch/powerpc/include/asm/page.h  | 14 +
  arch/powerpc/include/asm/page_32.h   | 19 
  arch/powerpc/include/asm/page_64.h   | 21 ++---
  arch/powerpc/mm/hash_utils_64.c  |  2 +-
  arch/powerpc/mm/mmu_context_nohash.c |  7 +
  arch/powerpc/mm/slice.c  | 60 


  6 files changed, 83 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/page.h 
b/arch/powerpc/include/asm/page.h

index 8da5d4c1cab2..d0384f9db9eb 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -342,6 +342,20 @@ typedef struct page *pgtable_t;
  #endif
  #endif
+#ifdef CONFIG_PPC_MM_SLICES
+struct mm_struct;
+
+unsigned long slice_get_unmapped_area(unsigned long addr, 
unsigned long len,

+  unsigned long flags, unsigned int psize,
+  int topdown);
+
+unsigned int get_slice_psize(struct mm_struct *mm, unsigned long 
addr);

+
+void slice_set_user_psize(struct mm_struct *mm, unsigned int psize);
+void slice_set_range_psize(struct mm_struct *mm, unsigned long 
start,

+   unsigned long len, unsigned int psize);
+#endif
+


Should we do a slice.h ? the way we have other files? and then do


Yes we could add a slice.h instead of using page.h for that, good idea.



arch/powerpc/include/asm/book3s/64/slice.h that will carry
#define slice_bitmap_zero(dst, nbits) \
do { if (nbits) bitmap_zero(dst, nbits); } while (0)
#define slice_bitmap_set(dst, pos, nbits) \
do { if (nbits) bitmap_set(dst, pos, nbits); } while (0)
#define slice_bitmap_copy(dst, src, nbits) \
do { if (nbits) bitmap_copy(dst, src, nbits); } while (0)
#define slice_bitmap_and(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_and(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_or(dst, src1, src2, nbits) \
do { if (nbits) bitmap_or(dst, src1, src2, nbits); } while (0)
#define slice_bitmap_andnot(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_andnot(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_equal(src1, src2, nbits) \
({ (nbits) ? bitmap_equal(src1, src2, nbits) : 1; })
#define slice_bitmap_empty(src, nbits) \
({ (nbits) ? bitmap_empty(src, nbits) : 1; })

This without that if(nbits) check and a proper static inline so 
that we

can do type checking.


Is it really worth duplicating that just for eliminating the 'if 
(nbits)' in one case ?


Only in book3s/64 we will be able to eliminate that, for nohash/32 
we need to keep the test due to the difference between low and high 
slices.


the other advantage is we move the SLICE_LOW_SHIFT to the right 
location. IMHO mm subystem is really complex with these really 
overloaded headers. If we can keep it  seperate we should with 
minimal code duplication?


For the constants I fully agree with your proposal and I will do it. I 
was only questionning the benefit of moving the slice_bitmap_() 
stuff, taking into account that the 'if (nbits)' test is already 
eliminated by the compiler.




That is compiler dependent as you are finding with the other patch where 
if (0) didn't get compiled out


I don't think so. When I had the missing prototype, the compilation goes 
ok, including the final link. Which means at the end the code is not 
included since radix_enabled() evaluates to 0.


Many many parts of the kernel are based on this assumption.

Christophe



-aneesh


Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-19 Thread Aneesh Kumar K.V



On 01/19/2018 02:37 PM, Christophe LEROY wrote:



Le 19/01/2018 à 10:02, Aneesh Kumar K.V a écrit :



On 01/19/2018 02:14 PM, Christophe LEROY wrote:



Le 19/01/2018 à 09:24, Aneesh Kumar K.V a écrit :

Christophe Leroy  writes:


In preparation for the following patch which will fix an issue on
the 8xx by re-using the 'slices', this patch enhances the
'slices' implementation to support 32 bits CPUs.

On PPC32, the address space is limited to 4Gbytes, hence only the low
slices will be used. As of today, the code uses
SLICE_LOW_TOP (0x1ul) and compares it with addr to determine
if addr refers to low or high space.
On PPC32, such a (addr < SLICE_LOW_TOP) test is always false because
0x1ul degrades to 0. Therefore, the patch modifies
SLICE_LOW_TOP to (0xul) and modifies the tests to
(addr <= SLICE_LOW_TOP) which will then always be true on PPC32
as addr has type 'unsigned long' while not modifying the PPC64
behaviour.

This patch moves "slices" functions prototypes from page64.h to page.h

The high slices use bitmaps. As bitmap functions are not prepared to
handling bitmaps of size 0, the bitmap_xxx() calls are wrapped into
slice_bitmap_xxx() macros which will take care of the 0 nbits case.

Signed-off-by: Christophe Leroy 
---
  v2: First patch of v1 serie split in two parts ; added 
slice_bitmap_xxx() macros.


  arch/powerpc/include/asm/page.h  | 14 +
  arch/powerpc/include/asm/page_32.h   | 19 
  arch/powerpc/include/asm/page_64.h   | 21 ++---
  arch/powerpc/mm/hash_utils_64.c  |  2 +-
  arch/powerpc/mm/mmu_context_nohash.c |  7 +
  arch/powerpc/mm/slice.c  | 60 


  6 files changed, 83 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/page.h 
b/arch/powerpc/include/asm/page.h

index 8da5d4c1cab2..d0384f9db9eb 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -342,6 +342,20 @@ typedef struct page *pgtable_t;
  #endif
  #endif
+#ifdef CONFIG_PPC_MM_SLICES
+struct mm_struct;
+
+unsigned long slice_get_unmapped_area(unsigned long addr, unsigned 
long len,

+  unsigned long flags, unsigned int psize,
+  int topdown);
+
+unsigned int get_slice_psize(struct mm_struct *mm, unsigned long 
addr);

+
+void slice_set_user_psize(struct mm_struct *mm, unsigned int psize);
+void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
+   unsigned long len, unsigned int psize);
+#endif
+


Should we do a slice.h ? the way we have other files? and then do


Yes we could add a slice.h instead of using page.h for that, good idea.



arch/powerpc/include/asm/book3s/64/slice.h that will carry
#define slice_bitmap_zero(dst, nbits) \
do { if (nbits) bitmap_zero(dst, nbits); } while (0)
#define slice_bitmap_set(dst, pos, nbits) \
do { if (nbits) bitmap_set(dst, pos, nbits); } while (0)
#define slice_bitmap_copy(dst, src, nbits) \
do { if (nbits) bitmap_copy(dst, src, nbits); } while (0)
#define slice_bitmap_and(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_and(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_or(dst, src1, src2, nbits) \
do { if (nbits) bitmap_or(dst, src1, src2, nbits); } while (0)
#define slice_bitmap_andnot(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_andnot(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_equal(src1, src2, nbits) \
({ (nbits) ? bitmap_equal(src1, src2, nbits) : 1; })
#define slice_bitmap_empty(src, nbits) \
({ (nbits) ? bitmap_empty(src, nbits) : 1; })

This without that if(nbits) check and a proper static inline so that we
can do type checking.


Is it really worth duplicating that just for eliminating the 'if 
(nbits)' in one case ?


Only in book3s/64 we will be able to eliminate that, for nohash/32 we 
need to keep the test due to the difference between low and high slices.


the other advantage is we move the SLICE_LOW_SHIFT to the right 
location. IMHO mm subystem is really complex with these really 
overloaded headers. If we can keep it  seperate we should with minimal 
code duplication?


For the constants I fully agree with your proposal and I will do it. I 
was only questionning the benefit of moving the slice_bitmap_() 
stuff, taking into account that the 'if (nbits)' test is already 
eliminated by the compiler.




That is compiler dependent as you are finding with the other patch where 
if (0) didn't get compiled out


-aneesh



Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-19 Thread Christophe LEROY



Le 19/01/2018 à 10:02, Aneesh Kumar K.V a écrit :



On 01/19/2018 02:14 PM, Christophe LEROY wrote:



Le 19/01/2018 à 09:24, Aneesh Kumar K.V a écrit :

Christophe Leroy  writes:


In preparation for the following patch which will fix an issue on
the 8xx by re-using the 'slices', this patch enhances the
'slices' implementation to support 32 bits CPUs.

On PPC32, the address space is limited to 4Gbytes, hence only the low
slices will be used. As of today, the code uses
SLICE_LOW_TOP (0x1ul) and compares it with addr to determine
if addr refers to low or high space.
On PPC32, such a (addr < SLICE_LOW_TOP) test is always false because
0x1ul degrades to 0. Therefore, the patch modifies
SLICE_LOW_TOP to (0xul) and modifies the tests to
(addr <= SLICE_LOW_TOP) which will then always be true on PPC32
as addr has type 'unsigned long' while not modifying the PPC64
behaviour.

This patch moves "slices" functions prototypes from page64.h to page.h

The high slices use bitmaps. As bitmap functions are not prepared to
handling bitmaps of size 0, the bitmap_xxx() calls are wrapped into
slice_bitmap_xxx() macros which will take care of the 0 nbits case.

Signed-off-by: Christophe Leroy 
---
  v2: First patch of v1 serie split in two parts ; added 
slice_bitmap_xxx() macros.


  arch/powerpc/include/asm/page.h  | 14 +
  arch/powerpc/include/asm/page_32.h   | 19 
  arch/powerpc/include/asm/page_64.h   | 21 ++---
  arch/powerpc/mm/hash_utils_64.c  |  2 +-
  arch/powerpc/mm/mmu_context_nohash.c |  7 +
  arch/powerpc/mm/slice.c  | 60 


  6 files changed, 83 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/page.h 
b/arch/powerpc/include/asm/page.h

index 8da5d4c1cab2..d0384f9db9eb 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -342,6 +342,20 @@ typedef struct page *pgtable_t;
  #endif
  #endif
+#ifdef CONFIG_PPC_MM_SLICES
+struct mm_struct;
+
+unsigned long slice_get_unmapped_area(unsigned long addr, unsigned 
long len,

+  unsigned long flags, unsigned int psize,
+  int topdown);
+
+unsigned int get_slice_psize(struct mm_struct *mm, unsigned long 
addr);

+
+void slice_set_user_psize(struct mm_struct *mm, unsigned int psize);
+void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
+   unsigned long len, unsigned int psize);
+#endif
+


Should we do a slice.h ? the way we have other files? and then do


Yes we could add a slice.h instead of using page.h for that, good idea.



arch/powerpc/include/asm/book3s/64/slice.h that will carry
#define slice_bitmap_zero(dst, nbits) \
do { if (nbits) bitmap_zero(dst, nbits); } while (0)
#define slice_bitmap_set(dst, pos, nbits) \
do { if (nbits) bitmap_set(dst, pos, nbits); } while (0)
#define slice_bitmap_copy(dst, src, nbits) \
do { if (nbits) bitmap_copy(dst, src, nbits); } while (0)
#define slice_bitmap_and(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_and(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_or(dst, src1, src2, nbits) \
do { if (nbits) bitmap_or(dst, src1, src2, nbits); } while (0)
#define slice_bitmap_andnot(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_andnot(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_equal(src1, src2, nbits) \
({ (nbits) ? bitmap_equal(src1, src2, nbits) : 1; })
#define slice_bitmap_empty(src, nbits) \
({ (nbits) ? bitmap_empty(src, nbits) : 1; })

This without that if(nbits) check and a proper static inline so that we
can do type checking.


Is it really worth duplicating that just for eliminating the 'if 
(nbits)' in one case ?


Only in book3s/64 we will be able to eliminate that, for nohash/32 we 
need to keep the test due to the difference between low and high slices.


the other advantage is we move the SLICE_LOW_SHIFT to the right 
location. IMHO mm subystem is really complex with these really 
overloaded headers. If we can keep it  seperate we should with minimal 
code duplication?


For the constants I fully agree with your proposal and I will do it. I 
was only questionning the benefit of moving the slice_bitmap_() 
stuff, taking into account that the 'if (nbits)' test is already 
eliminated by the compiler.


Christophe



In any case, as the nbits we use in slice.c is a constant, the test is 
eliminated at compilation, so I can't see the benefit of making 


-aneesh


Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-19 Thread Aneesh Kumar K.V



On 01/19/2018 02:14 PM, Christophe LEROY wrote:



Le 19/01/2018 à 09:24, Aneesh Kumar K.V a écrit :

Christophe Leroy  writes:


In preparation for the following patch which will fix an issue on
the 8xx by re-using the 'slices', this patch enhances the
'slices' implementation to support 32 bits CPUs.

On PPC32, the address space is limited to 4Gbytes, hence only the low
slices will be used. As of today, the code uses
SLICE_LOW_TOP (0x1ul) and compares it with addr to determine
if addr refers to low or high space.
On PPC32, such a (addr < SLICE_LOW_TOP) test is always false because
0x1ul degrades to 0. Therefore, the patch modifies
SLICE_LOW_TOP to (0xul) and modifies the tests to
(addr <= SLICE_LOW_TOP) which will then always be true on PPC32
as addr has type 'unsigned long' while not modifying the PPC64
behaviour.

This patch moves "slices" functions prototypes from page64.h to page.h

The high slices use bitmaps. As bitmap functions are not prepared to
handling bitmaps of size 0, the bitmap_xxx() calls are wrapped into
slice_bitmap_xxx() macros which will take care of the 0 nbits case.

Signed-off-by: Christophe Leroy 
---
  v2: First patch of v1 serie split in two parts ; added 
slice_bitmap_xxx() macros.


  arch/powerpc/include/asm/page.h  | 14 +
  arch/powerpc/include/asm/page_32.h   | 19 
  arch/powerpc/include/asm/page_64.h   | 21 ++---
  arch/powerpc/mm/hash_utils_64.c  |  2 +-
  arch/powerpc/mm/mmu_context_nohash.c |  7 +
  arch/powerpc/mm/slice.c  | 60 


  6 files changed, 83 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/page.h 
b/arch/powerpc/include/asm/page.h

index 8da5d4c1cab2..d0384f9db9eb 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -342,6 +342,20 @@ typedef struct page *pgtable_t;
  #endif
  #endif
+#ifdef CONFIG_PPC_MM_SLICES
+struct mm_struct;
+
+unsigned long slice_get_unmapped_area(unsigned long addr, unsigned 
long len,

+  unsigned long flags, unsigned int psize,
+  int topdown);
+
+unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr);
+
+void slice_set_user_psize(struct mm_struct *mm, unsigned int psize);
+void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
+   unsigned long len, unsigned int psize);
+#endif
+


Should we do a slice.h ? the way we have other files? and then do


Yes we could add a slice.h instead of using page.h for that, good idea.



arch/powerpc/include/asm/book3s/64/slice.h that will carry
#define slice_bitmap_zero(dst, nbits) \
do { if (nbits) bitmap_zero(dst, nbits); } while (0)
#define slice_bitmap_set(dst, pos, nbits) \
do { if (nbits) bitmap_set(dst, pos, nbits); } while (0)
#define slice_bitmap_copy(dst, src, nbits) \
do { if (nbits) bitmap_copy(dst, src, nbits); } while (0)
#define slice_bitmap_and(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_and(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_or(dst, src1, src2, nbits) \
do { if (nbits) bitmap_or(dst, src1, src2, nbits); } while (0)
#define slice_bitmap_andnot(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_andnot(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_equal(src1, src2, nbits) \
({ (nbits) ? bitmap_equal(src1, src2, nbits) : 1; })
#define slice_bitmap_empty(src, nbits) \
({ (nbits) ? bitmap_empty(src, nbits) : 1; })

This without that if(nbits) check and a proper static inline so that we
can do type checking.


Is it really worth duplicating that just for eliminating the 'if 
(nbits)' in one case ?


Only in book3s/64 we will be able to eliminate that, for nohash/32 we 
need to keep the test due to the difference between low and high slices.


the other advantage is we move the SLICE_LOW_SHIFT to the right 
location. IMHO mm subystem is really complex with these really 
overloaded headers. If we can keep it  seperate we should with minimal 
code duplication?


In any case, as the nbits we use in slice.c is a constant, the test is 
eliminated at compilation, so I can't see the benefit of making 


-aneesh



Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-19 Thread Christophe LEROY



Le 19/01/2018 à 09:24, Aneesh Kumar K.V a écrit :

Christophe Leroy  writes:


In preparation for the following patch which will fix an issue on
the 8xx by re-using the 'slices', this patch enhances the
'slices' implementation to support 32 bits CPUs.

On PPC32, the address space is limited to 4Gbytes, hence only the low
slices will be used. As of today, the code uses
SLICE_LOW_TOP (0x1ul) and compares it with addr to determine
if addr refers to low or high space.
On PPC32, such a (addr < SLICE_LOW_TOP) test is always false because
0x1ul degrades to 0. Therefore, the patch modifies
SLICE_LOW_TOP to (0xul) and modifies the tests to
(addr <= SLICE_LOW_TOP) which will then always be true on PPC32
as addr has type 'unsigned long' while not modifying the PPC64
behaviour.

This patch moves "slices" functions prototypes from page64.h to page.h

The high slices use bitmaps. As bitmap functions are not prepared to
handling bitmaps of size 0, the bitmap_xxx() calls are wrapped into
slice_bitmap_xxx() macros which will take care of the 0 nbits case.

Signed-off-by: Christophe Leroy 
---
  v2: First patch of v1 serie split in two parts ; added slice_bitmap_xxx() 
macros.

  arch/powerpc/include/asm/page.h  | 14 +
  arch/powerpc/include/asm/page_32.h   | 19 
  arch/powerpc/include/asm/page_64.h   | 21 ++---
  arch/powerpc/mm/hash_utils_64.c  |  2 +-
  arch/powerpc/mm/mmu_context_nohash.c |  7 +
  arch/powerpc/mm/slice.c  | 60 
  6 files changed, 83 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 8da5d4c1cab2..d0384f9db9eb 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -342,6 +342,20 @@ typedef struct page *pgtable_t;
  #endif
  #endif
  
+#ifdef CONFIG_PPC_MM_SLICES

+struct mm_struct;
+
+unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
+ unsigned long flags, unsigned int psize,
+ int topdown);
+
+unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr);
+
+void slice_set_user_psize(struct mm_struct *mm, unsigned int psize);
+void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
+  unsigned long len, unsigned int psize);
+#endif
+


Should we do a slice.h ? the way we have other files? and then do


Yes we could add a slice.h instead of using page.h for that, good idea.



arch/powerpc/include/asm/book3s/64/slice.h that will carry
#define slice_bitmap_zero(dst, nbits) \
do { if (nbits) bitmap_zero(dst, nbits); } while (0)
#define slice_bitmap_set(dst, pos, nbits) \
do { if (nbits) bitmap_set(dst, pos, nbits); } while (0)
#define slice_bitmap_copy(dst, src, nbits) \
do { if (nbits) bitmap_copy(dst, src, nbits); } while (0)
#define slice_bitmap_and(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_and(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_or(dst, src1, src2, nbits) \
do { if (nbits) bitmap_or(dst, src1, src2, nbits); } while (0)
#define slice_bitmap_andnot(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_andnot(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_equal(src1, src2, nbits) \
({ (nbits) ? bitmap_equal(src1, src2, nbits) : 1; })
#define slice_bitmap_empty(src, nbits) \
({ (nbits) ? bitmap_empty(src, nbits) : 1; })

This without that if(nbits) check and a proper static inline so that we
can do type checking.


Is it really worth duplicating that just for eliminating the 'if 
(nbits)' in one case ?


Only in book3s/64 we will be able to eliminate that, for nohash/32 we 
need to keep the test due to the difference between low and high slices.


In any case, as the nbits we use in slice.c is a constant, the test is 
eliminated at compilation, so I can't see the benefit of making 
different slice_bitmap_() based on platform.


Christophe



also related definitions for
#define SLICE_LOW_SHIFT 28
#define SLICE_HIGH_SHIFT0

#define SLICE_LOW_TOP   (0xul)
#define SLICE_NUM_LOW   ((SLICE_LOW_TOP >> SLICE_LOW_SHIFT) + 1)
+#define SLICE_NUM_HIGH 0ul

Common stuff between 64 and 32 can got to
arch/powerpc/include/asm/slice.h ?

It also gives an indication of which 32 bit version we are looking at
here. IIUC 8xx will got to arch/powerpc/include/asm/nohash/32/slice.h?


  #include 
  #endif /* __ASSEMBLY__ */
  
diff --git a/arch/powerpc/include/asm/page_32.h b/arch/powerpc/include/asm/page_32.h

index 5c378e9b78c8..f7d1bd1183c8 100644
--- a/arch/powerpc/include/asm/page_32.h
+++ b/arch/powerpc/include/asm/page_32.h
@@ -60,4 +60,23 @@ extern void copy_page(void *to, void *from);
  
  #endif /* __ASSEMBLY__ */
  
+#ifdef CONFIG_PPC_MM_SLICES

+
+#define SLICE_LOW_SHIFT28
+#define SLICE_HIGH_SHIFT   0
+
+#define SLICE_LOW_TOP   

Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-19 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> In preparation for the following patch which will fix an issue on
> the 8xx by re-using the 'slices', this patch enhances the
> 'slices' implementation to support 32 bits CPUs.
>
> On PPC32, the address space is limited to 4Gbytes, hence only the low
> slices will be used. As of today, the code uses
> SLICE_LOW_TOP (0x1ul) and compares it with addr to determine
> if addr refers to low or high space.
> On PPC32, such a (addr < SLICE_LOW_TOP) test is always false because
> 0x1ul degrades to 0. Therefore, the patch modifies
> SLICE_LOW_TOP to (0xul) and modifies the tests to
> (addr <= SLICE_LOW_TOP) which will then always be true on PPC32
> as addr has type 'unsigned long' while not modifying the PPC64
> behaviour.
>
> This patch moves "slices" functions prototypes from page64.h to page.h
>
> The high slices use bitmaps. As bitmap functions are not prepared to
> handling bitmaps of size 0, the bitmap_xxx() calls are wrapped into
> slice_bitmap_xxx() macros which will take care of the 0 nbits case.
>
> Signed-off-by: Christophe Leroy 
> ---
>  v2: First patch of v1 serie split in two parts ; added slice_bitmap_xxx() 
> macros.
>
>  arch/powerpc/include/asm/page.h  | 14 +
>  arch/powerpc/include/asm/page_32.h   | 19 
>  arch/powerpc/include/asm/page_64.h   | 21 ++---
>  arch/powerpc/mm/hash_utils_64.c  |  2 +-
>  arch/powerpc/mm/mmu_context_nohash.c |  7 +
>  arch/powerpc/mm/slice.c  | 60 
> 
>  6 files changed, 83 insertions(+), 40 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index 8da5d4c1cab2..d0384f9db9eb 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -342,6 +342,20 @@ typedef struct page *pgtable_t;
>  #endif
>  #endif
>  
> +#ifdef CONFIG_PPC_MM_SLICES
> +struct mm_struct;
> +
> +unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
> +   unsigned long flags, unsigned int psize,
> +   int topdown);
> +
> +unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr);
> +
> +void slice_set_user_psize(struct mm_struct *mm, unsigned int psize);
> +void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
> +unsigned long len, unsigned int psize);
> +#endif
> +

Should we do a slice.h ? the way we have other files? and then do

arch/powerpc/include/asm/book3s/64/slice.h that will carry
#define slice_bitmap_zero(dst, nbits) \
do { if (nbits) bitmap_zero(dst, nbits); } while (0)
#define slice_bitmap_set(dst, pos, nbits) \
do { if (nbits) bitmap_set(dst, pos, nbits); } while (0)
#define slice_bitmap_copy(dst, src, nbits) \
do { if (nbits) bitmap_copy(dst, src, nbits); } while (0)
#define slice_bitmap_and(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_and(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_or(dst, src1, src2, nbits) \
do { if (nbits) bitmap_or(dst, src1, src2, nbits); } while (0)
#define slice_bitmap_andnot(dst, src1, src2, nbits) \
({ (nbits) ? bitmap_andnot(dst, src1, src2, nbits) : 0; })
#define slice_bitmap_equal(src1, src2, nbits) \
({ (nbits) ? bitmap_equal(src1, src2, nbits) : 1; })
#define slice_bitmap_empty(src, nbits) \
({ (nbits) ? bitmap_empty(src, nbits) : 1; })

This without that if(nbits) check and a proper static inline so that we
can do type checking.

also related definitions for
#define SLICE_LOW_SHIFT 28
#define SLICE_HIGH_SHIFT0

#define SLICE_LOW_TOP   (0xul)
#define SLICE_NUM_LOW   ((SLICE_LOW_TOP >> SLICE_LOW_SHIFT) + 1)
+#define SLICE_NUM_HIGH 0ul

Common stuff between 64 and 32 can got to
arch/powerpc/include/asm/slice.h ?

It also gives an indication of which 32 bit version we are looking at
here. IIUC 8xx will got to arch/powerpc/include/asm/nohash/32/slice.h?

>  #include 
>  #endif /* __ASSEMBLY__ */
>  
> diff --git a/arch/powerpc/include/asm/page_32.h 
> b/arch/powerpc/include/asm/page_32.h
> index 5c378e9b78c8..f7d1bd1183c8 100644
> --- a/arch/powerpc/include/asm/page_32.h
> +++ b/arch/powerpc/include/asm/page_32.h
> @@ -60,4 +60,23 @@ extern void copy_page(void *to, void *from);
>  
>  #endif /* __ASSEMBLY__ */
>  
> +#ifdef CONFIG_PPC_MM_SLICES
> +
> +#define SLICE_LOW_SHIFT  28
> +#define SLICE_HIGH_SHIFT 0
> +
> +#define SLICE_LOW_TOP(0xul)
> +#define SLICE_NUM_LOW((SLICE_LOW_TOP >> SLICE_LOW_SHIFT) + 1)
> +#define SLICE_NUM_HIGH   0ul
> +
> +#define GET_LOW_SLICE_INDEX(addr)((addr) >> SLICE_LOW_SHIFT)
> +#define GET_HIGH_SLICE_INDEX(addr)   (addr & 0)
> +
> +#ifdef CONFIG_HUGETLB_PAGE
> +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
> +#endif
> +#define HAVE_ARCH_UNMAPPED_AREA
> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
> +
> +#endif
>