On 3/16/24 11:35 AM, Vineet Gupta wrote:
If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in funcion prologue/epilogue
expansion.

This fix doesn't move the neddle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

    gcc-13.1 release   |      gcc 230823     |                   |
                       |    g6619b3d4c15c    |   This patch      |  clang/llvm
---------------------------------------------------------------------------------
li      t0,-4096     | li    t0,-4096      | addi  sp,sp,-2048 | addi 
sp,sp,-2048
addi    t0,t0,2016   | addi  t0,t0,2032    | add   sp,sp,-16   | addi sp,sp,-32
li      a4,4096      | add   sp,sp,t0      | add   a5,sp,a0    | add  a1,sp,16
add     sp,sp,t0     | addi  a5,sp,-2032   | sb    zero,0(a5)  | add  a0,a0,a1
li      a5,-4096     | add   a0,a5,a0      | addi  sp,sp,2032  | sb   zero,0(a0)
addi    a4,a4,-2032  | li    t0, 4096      | addi  sp,sp,32    | addi sp,sp,2032
add     a4,a4,a5     | sb    zero,2032(a0) | ret               | addi sp,sp,48
addi    a5,sp,16     | addi  t0,t0,-2032   |                   | ret
add     a5,a4,a5     | add   sp,sp,t0      |
add     a0,a5,a0     | ret                 |
li      t0,4096      |
sd      a5,8(sp)     |
sb      zero,2032(a0)|
addi    t0,t0,-2016  |
add     sp,sp,t0     |
ret                  |

gcc/ChangeLog:
        PR target/105733
        * config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
        function to split a sum of two s12 values into constituents.
        (riscv_expand_prologue): Handle offset being sum of two S12.
        (riscv_expand_epilogue): Ditto.
        * config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
        * gcc.target/riscv/pr105733.c: New Test.
        * gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
        expect LUI 4096.
        * gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.
Yea, wouldn't expect this to move the needle on spec since it's just hitting the prologue/epilogue. In fact, I wouldn't be surprised if there were other stack frame sizes that could be improved. But I wouldn't bother chasing down those other cases.

If we think about the embedded space, they're probably not going to want to see functions with large frames to begin with. So optimizing those cases for the embedded space just doesn't make much sense.

In the distro space, by this time next year we'll be living in a world where stack clash mitigations are enabled. So for any given size stack frame, it'll be allocated in at most 1 page chunks. So again, going to any significant length to optimize other cases just doesn't make much sense.

So we probably should go with this patch in the gcc-15 space, but I wouldn't suggest heroic efforts for other sized stack frames.

jeff

Reply via email to