Dan Li <ashim...@linux.alibaba.com> writes: > On 2/11/22 01:53, Richard Sandiford wrote: >> Dan Li <ashim...@linux.alibaba.com> writes: >>> On 2/10/22 01:55, Richard Sandiford wrote: >>>>> >>>> But treating scs push and scs pop as part of the register save and >>>> restore sequences would have one advantage: it would allow the >>>> scs push and scs pop to be shrink-wrapped. >>>> >>> >>> Sorry for my limited knowledge of shrink warping, I don't think I get >>> it here (I tried to find a case when compiling the kernel and some >>> gcc test cases but I still don't have a clue.). >>> >>> I see that the bitmap of LR_REGNUM is cleared in >>> aarch64_get_separate_components and scs push/pop are x18 based operations. >>> >>> If we handle them in aarch64_restore/save_callee_saves, >>> could scs push/pop be shrink-wrapped in some cases? >> >> Yeah, I think so. E.g. for: >> >> void f(); >> int g(int x) { >> if (x == 0) >> return 1; >> f(); >> return 2; >> } >> >> shrink wrapping would allow the scs push and pop to move along with the >> x30 save: >> >> g: >> cbnz w0, .L9 >> mov w0, 1 >> ret >> .L9: >> stp x29, x30, [sp, -16]! >> mov x29, sp >> bl f >> mov w0, 2 >> ldp x29, x30, [sp], 16 >> ret >> > > Thanks Richard, (to make sure I understand correctly :)) I think > it means that the current patch could do a "shrink-wapping", but > the X30 could not be treat as a "component", now it could gen code > like: > > g: > cbnz w0, .L9 > mov w0, 1 > ret > .L9: > str x30, [x18], 8 > stp x29, x30, [sp, -16]! > mov x29, sp > bl f > ldr x30, [x18, -8]! > mov w0, 2 > ldr x29, [sp], 16 > ret
Ah, right, sorry. I'd forgotten that this happened independently of the components stuff (and has to, since like you say, we don't treat LR_REGNUM as a separable component). >> The idea is that aarch64_save_callee_saves would treat the scs push >> as part of saving x30 (along with the normal store to the frame chain, >> when used). aarch64_restore_callee_saves would similarly treat the scs >> pop as the way of restoring x30 (instead of loading from the frame chain). >> This is in contrast to the current patch, where the scs push and pop are >> treated as fixed parts of the prologue and epilogue instead, and where >> aarch64_restore_callee_saves tries to avoid doing anything for x30. >> >> If shrink-wrapping decides to treat x30 as a separate “component”, as it >> does in the example above, then the scs push and pop would be emitted >> by aarch64_process_components instead. >> >> It would be more complex, but it would give better code. >> > > Following your idea, I made a poc to add x30 in component bitmap: > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 35f6f64f5b2..fc9b5e7af54 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -8359,7 +8359,7 @@ aarch64_get_separate_components (void) > if (reg1 != INVALID_REGNUM) > bitmap_clear_bit (components, reg1); > > - bitmap_clear_bit (components, LR_REGNUM); > bitmap_clear_bit (components, SP_REGNUM); > > return components; > @@ -8396,7 +8396,7 @@ aarch64_components_for_bb (basic_block bb) > /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets. */ > for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++) > if (!fixed_regs[regno] > - && !crtl->abi->clobbers_full_reg_p (regno) > + && (!crtl->abi->clobbers_full_reg_p (regno) || regno == R30_REGNUM) > && (TEST_HARD_REG_BIT (extra_caller_saves, regno) > || bitmap_bit_p (in, regno) > || bitmap_bit_p (gen, regno) > > And with a test code compiled with -fno-omit-frame-pointer: > > void f(); > int g(int x) { > if (x == 0) { > __asm__ ("":::"x19", "x20"); > return 1; > } > f(); > return 2; > } > > Then it seems X30 is treat as a "component" (the test > result of aarch64.exp also seems fine). > > g: > stp x19, x20, [sp, -32]! > cbnz w0, .L2 > mov w0, 1 > ldp x19, x20, [sp], 32 > ret > .L2: > str x30, [sp, 16] > bl f > ldr x30, [sp, 16] > mov w0, 2 > ldp x19, x20, [sp], 32 > ret > > And I think maybe we could handle this through three patches: > 1.Keep current patch (a V5) unchanged for scs. > 2.Add shrink-warpping for X30: > logically this might be a separate topic, and I think more testing > might be needed here (Well, I'm a little worried about if there might > be other effects, since I just read this part of the code roughly > yesterday). > 3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for > the PAC code in pro/epilogue, since it's also the operation of the X30). Yeah, that's fair. (Like I said earlier, I wasn't asking for the shrink-wrapping change. It was just a note in passing. But as you point out, the individual shrink-wrapping support would be even more work than I'd imagined.) Thanks, Richard