Re: [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()
Excerpts from Christophe Leroy's message of February 10, 2021 2:13 am: > > > Le 09/02/2021 à 02:55, Nicholas Piggin a écrit : >> Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am: >>> When r3 is not modified, reload it from regs->orig_r3 to free >>> volatile registers. This avoids a stack frame for the likely part >>> of system_call_exception() >> >> This doesn't on my 64s build, but it does reduce one non volatile >> register save/restore. With quite a bit more register pressure >> reduction 64s can avoid the stack frame as well. > > The stack frame is not due to the registers because on PPC64 you have the > redzone that you don't > have on PPC32. > > As far as I can see, this is due to a call to .arch_local_irq_restore(). > > On ppc32 arch_local_irq_restore() is just a write to MSR. Oh you're right there. We can actually inline fast paths of that I have a patch somewhere, but not sure if it's worthwhile. >> It's a cool trick but quite code and compiler specific so I don't know >> how worthwhile it is to keep considering we're calling out into random >> kernel C code after this. >> >> Maybe just keep it PPC32 specific for the moment, will have to do more >> tuning for 64 and we have other stuff to do there first. >> >> If you are happy to make it 32-bit only then > > I think we can leave without this, that's only one or two cycles won. Okay for this round let's drop it for now. Thanks, Nick
Re: [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()
Le 09/02/2021 à 02:55, Nicholas Piggin a écrit : Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am: When r3 is not modified, reload it from regs->orig_r3 to free volatile registers. This avoids a stack frame for the likely part of system_call_exception() This doesn't on my 64s build, but it does reduce one non volatile register save/restore. With quite a bit more register pressure reduction 64s can avoid the stack frame as well. The stack frame is not due to the registers because on PPC64 you have the redzone that you don't have on PPC32. As far as I can see, this is due to a call to .arch_local_irq_restore(). On ppc32 arch_local_irq_restore() is just a write to MSR. It's a cool trick but quite code and compiler specific so I don't know how worthwhile it is to keep considering we're calling out into random kernel C code after this. Maybe just keep it PPC32 specific for the moment, will have to do more tuning for 64 and we have other stuff to do there first. If you are happy to make it 32-bit only then I think we can leave without this, that's only one or two cycles won. Reviewed-by: Nicholas Piggin
Re: [PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()
Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am: > When r3 is not modified, reload it from regs->orig_r3 to free > volatile registers. This avoids a stack frame for the likely part > of system_call_exception() This doesn't on my 64s build, but it does reduce one non volatile register save/restore. With quite a bit more register pressure reduction 64s can avoid the stack frame as well. It's a cool trick but quite code and compiler specific so I don't know how worthwhile it is to keep considering we're calling out into random kernel C code after this. Maybe just keep it PPC32 specific for the moment, will have to do more tuning for 64 and we have other stuff to do there first. If you are happy to make it 32-bit only then Reviewed-by: Nicholas Piggin > > Before the patch: > > c000b4d4 : > c000b4d4: 7c 08 02 a6 mflrr0 > c000b4d8: 94 21 ff e0 stwur1,-32(r1) > c000b4dc: 93 e1 00 1c stw r31,28(r1) > c000b4e0: 90 01 00 24 stw r0,36(r1) > c000b4e4: 90 6a 00 88 stw r3,136(r10) > c000b4e8: 81 6a 00 84 lwz r11,132(r10) > c000b4ec: 69 6b 00 02 xorir11,r11,2 > c000b4f0: 55 6b ff fe rlwinm r11,r11,31,31,31 > c000b4f4: 0f 0b 00 00 twnei r11,0 > c000b4f8: 81 6a 00 a0 lwz r11,160(r10) > c000b4fc: 55 6b 07 fe clrlwi r11,r11,31 > c000b500: 0f 0b 00 00 twnei r11,0 > c000b504: 7c 0c 42 e6 mftbr0 > c000b508: 83 e2 00 08 lwz r31,8(r2) > c000b50c: 81 82 00 28 lwz r12,40(r2) > c000b510: 90 02 00 24 stw r0,36(r2) > c000b514: 7d 8c f8 50 subfr12,r12,r31 > c000b518: 7c 0c 02 14 add r0,r12,r0 > c000b51c: 90 02 00 08 stw r0,8(r2) > c000b520: 7c 10 13 a6 mtspr 80,r0 > c000b524: 81 62 00 70 lwz r11,112(r2) > c000b528: 71 60 86 91 andi. r0,r11,34449 > c000b52c: 40 82 00 34 bne c000b560 > c000b530: 2b 89 01 b6 cmplwi cr7,r9,438 > c000b534: 41 9d 00 64 bgt cr7,c000b598 > > c000b538: 3d 40 c0 5c lis r10,-16292 > c000b53c: 55 29 10 3a rlwinm r9,r9,2,0,29 > c000b540: 39 4a 41 e8 addir10,r10,16872 > c000b544: 80 01 00 24 lwz r0,36(r1) > c000b548: 7d 2a 48 2e lwzxr9,r10,r9 > c000b54c: 7c 08 03 a6 mtlrr0 > c000b550: 7d 29 03 a6 mtctr r9 > c000b554: 83 e1 00 1c lwz r31,28(r1) > c000b558: 38 21 00 20 addir1,r1,32 > c000b55c: 4e 80 04 20 bctr > > After the patch: > > c000b4d4 : > c000b4d4: 81 6a 00 84 lwz r11,132(r10) > c000b4d8: 90 6a 00 88 stw r3,136(r10) > c000b4dc: 69 6b 00 02 xorir11,r11,2 > c000b4e0: 55 6b ff fe rlwinm r11,r11,31,31,31 > c000b4e4: 0f 0b 00 00 twnei r11,0 > c000b4e8: 80 6a 00 a0 lwz r3,160(r10) > c000b4ec: 54 63 07 fe clrlwi r3,r3,31 > c000b4f0: 0f 03 00 00 twnei r3,0 > c000b4f4: 7d 6c 42 e6 mftbr11 > c000b4f8: 81 82 00 08 lwz r12,8(r2) > c000b4fc: 80 02 00 28 lwz r0,40(r2) > c000b500: 91 62 00 24 stw r11,36(r2) > c000b504: 7c 00 60 50 subfr0,r0,r12 > c000b508: 7d 60 5a 14 add r11,r0,r11 > c000b50c: 91 62 00 08 stw r11,8(r2) > c000b510: 7c 10 13 a6 mtspr 80,r0 > c000b514: 80 62 00 70 lwz r3,112(r2) > c000b518: 70 6b 86 91 andi. r11,r3,34449 > c000b51c: 40 82 00 28 bne c000b544 > c000b520: 2b 89 01 b6 cmplwi cr7,r9,438 > c000b524: 41 9d 00 84 bgt cr7,c000b5a8 > > c000b528: 80 6a 00 88 lwz r3,136(r10) > c000b52c: 3d 40 c0 5c lis r10,-16292 > c000b530: 55 29 10 3a rlwinm r9,r9,2,0,29 > c000b534: 39 4a 41 e4 addir10,r10,16868 > c000b538: 7d 2a 48 2e lwzxr9,r10,r9 > c000b53c: 7d 29 03 a6 mtctr r9 > c000b540: 4e 80 04 20 bctr > > Signed-off-by: Christophe Leroy > --- > arch/powerpc/kernel/interrupt.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c > index 107ec39f05cb..205902052112 100644 > --- a/arch/powerpc/kernel/interrupt.c > +++ b/arch/powerpc/kernel/interrupt.c > @@ -117,6 +117,9 @@ notrace long system_call_exception(long r3, long r4, long > r5, > return regs->gpr[3]; > } > return -ENOSYS; > + } else { > + /* Restore r3 from orig_gpr3 to free up a volatile reg */ > + r3 = regs->orig_gpr3; > } > > /* May be faster to do array_index_nospec? */ > -- > 2.25.0 > >
[PATCH v5 16/22] powerpc/syscall: Avoid stack frame in likely part of system_call_exception()
When r3 is not modified, reload it from regs->orig_r3 to free volatile registers. This avoids a stack frame for the likely part of system_call_exception() Before the patch: c000b4d4 : c000b4d4: 7c 08 02 a6 mflrr0 c000b4d8: 94 21 ff e0 stwur1,-32(r1) c000b4dc: 93 e1 00 1c stw r31,28(r1) c000b4e0: 90 01 00 24 stw r0,36(r1) c000b4e4: 90 6a 00 88 stw r3,136(r10) c000b4e8: 81 6a 00 84 lwz r11,132(r10) c000b4ec: 69 6b 00 02 xorir11,r11,2 c000b4f0: 55 6b ff fe rlwinm r11,r11,31,31,31 c000b4f4: 0f 0b 00 00 twnei r11,0 c000b4f8: 81 6a 00 a0 lwz r11,160(r10) c000b4fc: 55 6b 07 fe clrlwi r11,r11,31 c000b500: 0f 0b 00 00 twnei r11,0 c000b504: 7c 0c 42 e6 mftbr0 c000b508: 83 e2 00 08 lwz r31,8(r2) c000b50c: 81 82 00 28 lwz r12,40(r2) c000b510: 90 02 00 24 stw r0,36(r2) c000b514: 7d 8c f8 50 subfr12,r12,r31 c000b518: 7c 0c 02 14 add r0,r12,r0 c000b51c: 90 02 00 08 stw r0,8(r2) c000b520: 7c 10 13 a6 mtspr 80,r0 c000b524: 81 62 00 70 lwz r11,112(r2) c000b528: 71 60 86 91 andi. r0,r11,34449 c000b52c: 40 82 00 34 bne c000b560 c000b530: 2b 89 01 b6 cmplwi cr7,r9,438 c000b534: 41 9d 00 64 bgt cr7,c000b598 c000b538: 3d 40 c0 5c lis r10,-16292 c000b53c: 55 29 10 3a rlwinm r9,r9,2,0,29 c000b540: 39 4a 41 e8 addir10,r10,16872 c000b544: 80 01 00 24 lwz r0,36(r1) c000b548: 7d 2a 48 2e lwzxr9,r10,r9 c000b54c: 7c 08 03 a6 mtlrr0 c000b550: 7d 29 03 a6 mtctr r9 c000b554: 83 e1 00 1c lwz r31,28(r1) c000b558: 38 21 00 20 addir1,r1,32 c000b55c: 4e 80 04 20 bctr After the patch: c000b4d4 : c000b4d4: 81 6a 00 84 lwz r11,132(r10) c000b4d8: 90 6a 00 88 stw r3,136(r10) c000b4dc: 69 6b 00 02 xorir11,r11,2 c000b4e0: 55 6b ff fe rlwinm r11,r11,31,31,31 c000b4e4: 0f 0b 00 00 twnei r11,0 c000b4e8: 80 6a 00 a0 lwz r3,160(r10) c000b4ec: 54 63 07 fe clrlwi r3,r3,31 c000b4f0: 0f 03 00 00 twnei r3,0 c000b4f4: 7d 6c 42 e6 mftbr11 c000b4f8: 81 82 00 08 lwz r12,8(r2) c000b4fc: 80 02 00 28 lwz r0,40(r2) c000b500: 91 62 00 24 stw r11,36(r2) c000b504: 7c 00 60 50 subfr0,r0,r12 c000b508: 7d 60 5a 14 add r11,r0,r11 c000b50c: 91 62 00 08 stw r11,8(r2) c000b510: 7c 10 13 a6 mtspr 80,r0 c000b514: 80 62 00 70 lwz r3,112(r2) c000b518: 70 6b 86 91 andi. r11,r3,34449 c000b51c: 40 82 00 28 bne c000b544 c000b520: 2b 89 01 b6 cmplwi cr7,r9,438 c000b524: 41 9d 00 84 bgt cr7,c000b5a8 c000b528: 80 6a 00 88 lwz r3,136(r10) c000b52c: 3d 40 c0 5c lis r10,-16292 c000b530: 55 29 10 3a rlwinm r9,r9,2,0,29 c000b534: 39 4a 41 e4 addir10,r10,16868 c000b538: 7d 2a 48 2e lwzxr9,r10,r9 c000b53c: 7d 29 03 a6 mtctr r9 c000b540: 4e 80 04 20 bctr Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/interrupt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c index 107ec39f05cb..205902052112 100644 --- a/arch/powerpc/kernel/interrupt.c +++ b/arch/powerpc/kernel/interrupt.c @@ -117,6 +117,9 @@ notrace long system_call_exception(long r3, long r4, long r5, return regs->gpr[3]; } return -ENOSYS; + } else { + /* Restore r3 from orig_gpr3 to free up a volatile reg */ + r3 = regs->orig_gpr3; } /* May be faster to do array_index_nospec? */ -- 2.25.0