On 2/11/22 01:53, Richard Sandiford wrote:
Dan Li <ashim...@linux.alibaba.com> writes:
On 2/10/22 01:55, Richard Sandiford wrote:

But treating scs push and scs pop as part of the register save and
restore sequences would have one advantage: it would allow the
scs push and scs pop to be shrink-wrapped.


Sorry for my limited knowledge of shrink warping, I don't think I get
it here (I tried to find a case when compiling the kernel and some
gcc test cases but I still don't have a clue.).

I see that the bitmap of LR_REGNUM is cleared in
aarch64_get_separate_components and scs push/pop are x18 based operations.

If we handle them in aarch64_restore/save_callee_saves,
could scs push/pop be shrink-wrapped in some cases?

Yeah, I think so.  E.g. for:

void f();
int g(int x) {
     if (x == 0)
         return 1;
     f();
     return 2;
}

shrink wrapping would allow the scs push and pop to move along with the
x30 save:

g:
         cbnz    w0, .L9
         mov     w0, 1
         ret
.L9:
         stp     x29, x30, [sp, -16]!
         mov     x29, sp
         bl      f
         mov     w0, 2
         ldp     x29, x30, [sp], 16
         ret


Thanks Richard, (to make sure I understand correctly :)) I think
it means that the current patch could do a "shrink-wapping", but
the X30 could not be treat as a "component", now it could gen code
like:

g:
        cbnz    w0, .L9
        mov     w0, 1
        ret
.L9:
        str     x30, [x18], 8
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      f
        ldr     x30, [x18, -8]!
        mov     w0, 2
        ldr     x29, [sp], 16
        ret

The idea is that aarch64_save_callee_saves would treat the scs push
as part of saving x30 (along with the normal store to the frame chain,
when used).  aarch64_restore_callee_saves would similarly treat the scs
pop as the way of restoring x30 (instead of loading from the frame chain).
This is in contrast to the current patch, where the scs push and pop are
treated as fixed parts of the prologue and epilogue instead, and where
aarch64_restore_callee_saves tries to avoid doing anything for x30.

If shrink-wrapping decides to treat x30 as a separate “component”, as it
does in the example above, then the scs push and pop would be emitted
by aarch64_process_components instead.

It would be more complex, but it would give better code.


Following your idea, I made a poc to add x30 in component bitmap:

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 35f6f64f5b2..fc9b5e7af54 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8359,7 +8359,7 @@ aarch64_get_separate_components (void)
   if (reg1 != INVALID_REGNUM)
     bitmap_clear_bit (components, reg1);
- bitmap_clear_bit (components, LR_REGNUM);
   bitmap_clear_bit (components, SP_REGNUM);
return components;
@@ -8396,7 +8396,7 @@ aarch64_components_for_bb (basic_block bb)
   /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
   for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
     if (!fixed_regs[regno]
-       && !crtl->abi->clobbers_full_reg_p (regno)
+       && (!crtl->abi->clobbers_full_reg_p (regno) || regno == R30_REGNUM)
        && (TEST_HARD_REG_BIT (extra_caller_saves, regno)
            || bitmap_bit_p (in, regno)
            || bitmap_bit_p (gen, regno)

And with a test code compiled with -fno-omit-frame-pointer:

void f();
int g(int x) {
    if (x == 0) {
        __asm__ ("":::"x19", "x20");
        return 1;
    }
    f();
    return 2;
}

Then it seems X30 is treat as a "component" (the test
result of aarch64.exp also seems fine).

g:
        stp     x19, x20, [sp, -32]!
        cbnz    w0, .L2
        mov     w0, 1
        ldp     x19, x20, [sp], 32
        ret
.L2:
        str     x30, [sp, 16]
        bl      f
        ldr     x30, [sp, 16]
        mov     w0, 2
        ldp     x19, x20, [sp], 32
        ret

And I think maybe we could handle this through three patches:
1.Keep current patch (a V5) unchanged for scs.
2.Add shrink-warpping for X30:
logically this might be a separate topic, and I think more testing
might be needed here (Well, I'm a little worried about if there might
be other effects, since I just read this part of the code roughly
yesterday).
3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for
the PAC code in pro/epilogue, since it's also the operation of the X30).

Thanks,
Dan

Reply via email to