Re: [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()

2021-02-09 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of February 10, 2021 2:13 am:
> 
> 
> Le 09/02/2021 à 02:55, Nicholas Piggin a écrit :
>> Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
>>> When r3 is not modified, reload it from regs->orig_r3 to free
>>> volatile registers. This avoids a stack frame for the likely part
>>> of system_call_exception()
>> 
>> This doesn't on my 64s build, but it does reduce one non volatile
>> register save/restore. With quite a bit more register pressure
>> reduction 64s can avoid the stack frame as well.
> 
> The stack frame is not due to the registers because on PPC64 you have the 
> redzone that you don't 
> have on PPC32.
> 
> As far as I can see, this is due to a call to .arch_local_irq_restore().
> 
> On ppc32 arch_local_irq_restore() is just a write to MSR.

Oh you're right there. We can actually inline fast paths of that I have 
a patch somewhere, but not sure if it's worthwhile.

>> It's a cool trick but quite code and compiler specific so I don't know
>> how worthwhile it is to keep considering we're calling out into random
>> kernel C code after this.
>> 
>> Maybe just keep it PPC32 specific for the moment, will have to do more
>> tuning for 64 and we have other stuff to do there first.
>> 
>> If you are happy to make it 32-bit only then
> 
> I think we can leave without this, that's only one or two cycles won.

Okay for this round let's drop it for now.

Thanks,
Nick


Re: [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()

2021-02-09 Thread Christophe Leroy




Le 09/02/2021 à 02:55, Nicholas Piggin a écrit :

Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:

When r3 is not modified, reload it from regs->orig_r3 to free
volatile registers. This avoids a stack frame for the likely part
of system_call_exception()


This doesn't on my 64s build, but it does reduce one non volatile
register save/restore. With quite a bit more register pressure
reduction 64s can avoid the stack frame as well.


The stack frame is not due to the registers because on PPC64 you have the redzone that you don't 
have on PPC32.


As far as I can see, this is due to a call to .arch_local_irq_restore().

On ppc32 arch_local_irq_restore() is just a write to MSR.




It's a cool trick but quite code and compiler specific so I don't know
how worthwhile it is to keep considering we're calling out into random
kernel C code after this.

Maybe just keep it PPC32 specific for the moment, will have to do more
tuning for 64 and we have other stuff to do there first.

If you are happy to make it 32-bit only then


I think we can leave without this, that's only one or two cycles won.



Reviewed-by: Nicholas Piggin 



Re: [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()

2021-02-08 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> When r3 is not modified, reload it from regs->orig_r3 to free
> volatile registers. This avoids a stack frame for the likely part
> of system_call_exception()

This doesn't on my 64s build, but it does reduce one non volatile
register save/restore. With quite a bit more register pressure
reduction 64s can avoid the stack frame as well.

It's a cool trick but quite code and compiler specific so I don't know 
how worthwhile it is to keep considering we're calling out into random
kernel C code after this.

Maybe just keep it PPC32 specific for the moment, will have to do more
tuning for 64 and we have other stuff to do there first.

If you are happy to make it 32-bit only then

Reviewed-by: Nicholas Piggin 

> 
> Before the patch:
> 
> c000b4d4 :
> c000b4d4: 7c 08 02 a6 mflrr0
> c000b4d8: 94 21 ff e0 stwur1,-32(r1)
> c000b4dc: 93 e1 00 1c stw r31,28(r1)
> c000b4e0: 90 01 00 24 stw r0,36(r1)
> c000b4e4: 90 6a 00 88 stw r3,136(r10)
> c000b4e8: 81 6a 00 84 lwz r11,132(r10)
> c000b4ec: 69 6b 00 02 xorir11,r11,2
> c000b4f0: 55 6b ff fe rlwinm  r11,r11,31,31,31
> c000b4f4: 0f 0b 00 00 twnei   r11,0
> c000b4f8: 81 6a 00 a0 lwz r11,160(r10)
> c000b4fc: 55 6b 07 fe clrlwi  r11,r11,31
> c000b500: 0f 0b 00 00 twnei   r11,0
> c000b504: 7c 0c 42 e6 mftbr0
> c000b508: 83 e2 00 08 lwz r31,8(r2)
> c000b50c: 81 82 00 28 lwz r12,40(r2)
> c000b510: 90 02 00 24 stw r0,36(r2)
> c000b514: 7d 8c f8 50 subfr12,r12,r31
> c000b518: 7c 0c 02 14 add r0,r12,r0
> c000b51c: 90 02 00 08 stw r0,8(r2)
> c000b520: 7c 10 13 a6 mtspr   80,r0
> c000b524: 81 62 00 70 lwz r11,112(r2)
> c000b528: 71 60 86 91 andi.   r0,r11,34449
> c000b52c: 40 82 00 34 bne c000b560 
> c000b530: 2b 89 01 b6 cmplwi  cr7,r9,438
> c000b534: 41 9d 00 64 bgt cr7,c000b598 
> 
> c000b538: 3d 40 c0 5c lis r10,-16292
> c000b53c: 55 29 10 3a rlwinm  r9,r9,2,0,29
> c000b540: 39 4a 41 e8 addir10,r10,16872
> c000b544: 80 01 00 24 lwz r0,36(r1)
> c000b548: 7d 2a 48 2e lwzxr9,r10,r9
> c000b54c: 7c 08 03 a6 mtlrr0
> c000b550: 7d 29 03 a6 mtctr   r9
> c000b554: 83 e1 00 1c lwz r31,28(r1)
> c000b558: 38 21 00 20 addir1,r1,32
> c000b55c: 4e 80 04 20 bctr
> 
> After the patch:
> 
> c000b4d4 :
> c000b4d4: 81 6a 00 84 lwz r11,132(r10)
> c000b4d8: 90 6a 00 88 stw r3,136(r10)
> c000b4dc: 69 6b 00 02 xorir11,r11,2
> c000b4e0: 55 6b ff fe rlwinm  r11,r11,31,31,31
> c000b4e4: 0f 0b 00 00 twnei   r11,0
> c000b4e8: 80 6a 00 a0 lwz r3,160(r10)
> c000b4ec: 54 63 07 fe clrlwi  r3,r3,31
> c000b4f0: 0f 03 00 00 twnei   r3,0
> c000b4f4: 7d 6c 42 e6 mftbr11
> c000b4f8: 81 82 00 08 lwz r12,8(r2)
> c000b4fc: 80 02 00 28 lwz r0,40(r2)
> c000b500: 91 62 00 24 stw r11,36(r2)
> c000b504: 7c 00 60 50 subfr0,r0,r12
> c000b508: 7d 60 5a 14 add r11,r0,r11
> c000b50c: 91 62 00 08 stw r11,8(r2)
> c000b510: 7c 10 13 a6 mtspr   80,r0
> c000b514: 80 62 00 70 lwz r3,112(r2)
> c000b518: 70 6b 86 91 andi.   r11,r3,34449
> c000b51c: 40 82 00 28 bne c000b544 
> c000b520: 2b 89 01 b6 cmplwi  cr7,r9,438
> c000b524: 41 9d 00 84 bgt cr7,c000b5a8 
> 
> c000b528: 80 6a 00 88 lwz r3,136(r10)
> c000b52c: 3d 40 c0 5c lis r10,-16292
> c000b530: 55 29 10 3a rlwinm  r9,r9,2,0,29
> c000b534: 39 4a 41 e4 addir10,r10,16868
> c000b538: 7d 2a 48 2e lwzxr9,r10,r9
> c000b53c: 7d 29 03 a6 mtctr   r9
> c000b540: 4e 80 04 20 bctr
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/interrupt.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 107ec39f05cb..205902052112 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -117,6 +117,9 @@ notrace long system_call_exception(long r3, long r4, long 
> r5,
>   return regs->gpr[3];
>   }
>   return -ENOSYS;
> + } else {
> + /* Restore r3 from orig_gpr3 to free up a volatile reg */
> + r3 = regs->orig_gpr3;
>   }
>  
>   /* May be faster to do array_index_nospec? */
> -- 
> 2.25.0
> 
> 


[PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()

2021-02-08 Thread Christophe Leroy
When r3 is not modified, reload it from regs->orig_r3 to free
volatile registers. This avoids a stack frame for the likely part
of system_call_exception()

Before the patch:

c000b4d4 :
c000b4d4:   7c 08 02 a6 mflrr0
c000b4d8:   94 21 ff e0 stwur1,-32(r1)
c000b4dc:   93 e1 00 1c stw r31,28(r1)
c000b4e0:   90 01 00 24 stw r0,36(r1)
c000b4e4:   90 6a 00 88 stw r3,136(r10)
c000b4e8:   81 6a 00 84 lwz r11,132(r10)
c000b4ec:   69 6b 00 02 xorir11,r11,2
c000b4f0:   55 6b ff fe rlwinm  r11,r11,31,31,31
c000b4f4:   0f 0b 00 00 twnei   r11,0
c000b4f8:   81 6a 00 a0 lwz r11,160(r10)
c000b4fc:   55 6b 07 fe clrlwi  r11,r11,31
c000b500:   0f 0b 00 00 twnei   r11,0
c000b504:   7c 0c 42 e6 mftbr0
c000b508:   83 e2 00 08 lwz r31,8(r2)
c000b50c:   81 82 00 28 lwz r12,40(r2)
c000b510:   90 02 00 24 stw r0,36(r2)
c000b514:   7d 8c f8 50 subfr12,r12,r31
c000b518:   7c 0c 02 14 add r0,r12,r0
c000b51c:   90 02 00 08 stw r0,8(r2)
c000b520:   7c 10 13 a6 mtspr   80,r0
c000b524:   81 62 00 70 lwz r11,112(r2)
c000b528:   71 60 86 91 andi.   r0,r11,34449
c000b52c:   40 82 00 34 bne c000b560 
c000b530:   2b 89 01 b6 cmplwi  cr7,r9,438
c000b534:   41 9d 00 64 bgt cr7,c000b598 

c000b538:   3d 40 c0 5c lis r10,-16292
c000b53c:   55 29 10 3a rlwinm  r9,r9,2,0,29
c000b540:   39 4a 41 e8 addir10,r10,16872
c000b544:   80 01 00 24 lwz r0,36(r1)
c000b548:   7d 2a 48 2e lwzxr9,r10,r9
c000b54c:   7c 08 03 a6 mtlrr0
c000b550:   7d 29 03 a6 mtctr   r9
c000b554:   83 e1 00 1c lwz r31,28(r1)
c000b558:   38 21 00 20 addir1,r1,32
c000b55c:   4e 80 04 20 bctr

After the patch:

c000b4d4 :
c000b4d4:   81 6a 00 84 lwz r11,132(r10)
c000b4d8:   90 6a 00 88 stw r3,136(r10)
c000b4dc:   69 6b 00 02 xorir11,r11,2
c000b4e0:   55 6b ff fe rlwinm  r11,r11,31,31,31
c000b4e4:   0f 0b 00 00 twnei   r11,0
c000b4e8:   80 6a 00 a0 lwz r3,160(r10)
c000b4ec:   54 63 07 fe clrlwi  r3,r3,31
c000b4f0:   0f 03 00 00 twnei   r3,0
c000b4f4:   7d 6c 42 e6 mftbr11
c000b4f8:   81 82 00 08 lwz r12,8(r2)
c000b4fc:   80 02 00 28 lwz r0,40(r2)
c000b500:   91 62 00 24 stw r11,36(r2)
c000b504:   7c 00 60 50 subfr0,r0,r12
c000b508:   7d 60 5a 14 add r11,r0,r11
c000b50c:   91 62 00 08 stw r11,8(r2)
c000b510:   7c 10 13 a6 mtspr   80,r0
c000b514:   80 62 00 70 lwz r3,112(r2)
c000b518:   70 6b 86 91 andi.   r11,r3,34449
c000b51c:   40 82 00 28 bne c000b544 
c000b520:   2b 89 01 b6 cmplwi  cr7,r9,438
c000b524:   41 9d 00 84 bgt cr7,c000b5a8 

c000b528:   80 6a 00 88 lwz r3,136(r10)
c000b52c:   3d 40 c0 5c lis r10,-16292
c000b530:   55 29 10 3a rlwinm  r9,r9,2,0,29
c000b534:   39 4a 41 e4 addir10,r10,16868
c000b538:   7d 2a 48 2e lwzxr9,r10,r9
c000b53c:   7d 29 03 a6 mtctr   r9
c000b540:   4e 80 04 20 bctr

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/interrupt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 107ec39f05cb..205902052112 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -117,6 +117,9 @@ notrace long system_call_exception(long r3, long r4, long 
r5,
return regs->gpr[3];
}
return -ENOSYS;
+   } else {
+   /* Restore r3 from orig_gpr3 to free up a volatile reg */
+   r3 = regs->orig_gpr3;
}
 
/* May be faster to do array_index_nospec? */
-- 
2.25.0