On Tue, May 31, 2016 at 04:29:42PM +0530, Aneesh Kumar K.V wrote:
> This switch few of the page table accessor to use the __raw variant
> and does the cpu to big endian conversion of constants. This helps in
> generating better code.
> 
> For ex: a pgd_none(pgd) check with and without fix is listed below
> 
> Without fix:
> ------------
>    2240:      20 00 61 eb     ld      r27,32(r1)
> /* PGD level */
> typedef struct { __be64 pgd; } pgd_t;
> static inline unsigned long pgd_val(pgd_t x)
> {
>       return be64_to_cpu(x.pgd);
> 
>     2244:     22 00 66 78     rldicl  r6,r3,32,32
>     2248:     3e 40 7d 54     rotlwi  r29,r3,8
>     224c:     0e c0 7d 50     rlwimi  r29,r3,24,0,7
>     2250:     3e 40 c5 54     rotlwi  r5,r6,8
>     2254:     2e c4 7d 50     rlwimi  r29,r3,24,16,23
>     2258:     0e c0 c5 50     rlwimi  r5,r6,24,0,7
>     225c:     2e c4 c5 50     rlwimi  r5,r6,24,16,23
>     2260:     c6 07 bd 7b     rldicr  r29,r29,32,31
>     2264:     78 2b bd 7f     or      r29,r29,r5
>               if (pgd_none(pgd))
>     2268:     00 00 bd 2f     cmpdi   cr7,r29,0
>     226c:     54 03 9e 41     beq     cr7,25c0 <__get_user_pages_fast+0x500>
> 
> With fix:
> ---------
>     2370:     20 00 61 eb     ld      r27,32(r1)
>               if (pgd_none(pgd))
>     2374:     00 00 bd 2f     cmpdi   cr7,r29,0
>     2378:     a8 03 9e 41     beq     cr7,2720 <__get_user_pages_fast+0x530>
>                       break;
> Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable-4k.h  |  6 +-
>  arch/powerpc/include/asm/book3s/64/pgtable-64k.h |  6 +-
>  arch/powerpc/include/asm/book3s/64/pgtable.h     | 99 
> +++++++++++++++++-------
>  arch/powerpc/include/asm/pgtable-be-types.h      | 15 ++++
>  4 files changed, 91 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> index 71e9abced493..9db83b4e017d 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> @@ -11,7 +11,7 @@ static inline int pmd_huge(pmd_t pmd)
>        * leaf pte for huge page
>        */
>       if (radix_enabled())
> -             return !!(pmd_val(pmd) & _PAGE_PTE);
> +             return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
>       return 0;
>  }
>  
> @@ -21,7 +21,7 @@ static inline int pud_huge(pud_t pud)
>        * leaf pte for huge page
>        */
>       if (radix_enabled())
> -             return !!(pud_val(pud) & _PAGE_PTE);
> +             return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
>       return 0;
>  }
>  
> @@ -31,7 +31,7 @@ static inline int pgd_huge(pgd_t pgd)
>        * leaf pte for huge page
>        */
>       if (radix_enabled())
> -             return !!(pgd_val(pgd) & _PAGE_PTE);
> +             return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE));

pgd_raw() will not do the endian swapping.
But instead cpu_to_be64(_PAGE_PTE) will now do the endian swapping. So does it
really optimize anything? i tend to think it just moves the endian
swapping overhead from one place to the other. no?
Is cpu_to_be64(constant) faster than cpu_to_be64(variable)  ?

RP

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Reply via email to