On 5/13/24 12:49 PM, Vineet Gupta wrote:
If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in funcion prologue/epilogue
expansion.

This fix doesn't move the neddle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

    gcc-13.1 release   |      gcc 230823     |                   |
                       |    g6619b3d4c15c    |   This patch      |  clang/llvm
---------------------------------------------------------------------------------
li      t0,-4096     | li    t0,-4096      | addi  sp,sp,-2048 | addi 
sp,sp,-2048
addi    t0,t0,2016   | addi  t0,t0,2032    | add   sp,sp,-16   | addi sp,sp,-32
li      a4,4096      | add   sp,sp,t0      | add   a5,sp,a0    | add  a1,sp,16
add     sp,sp,t0     | addi  a5,sp,-2032   | sb    zero,0(a5)  | add  a0,a0,a1
li      a5,-4096     | add   a0,a5,a0      | addi  sp,sp,2032  | sb   zero,0(a0)
addi    a4,a4,-2032  | li    t0, 4096      | addi  sp,sp,32    | addi sp,sp,2032
add     a4,a4,a5     | sb    zero,2032(a0) | ret               | addi sp,sp,48
addi    a5,sp,16     | addi  t0,t0,-2032   |                   | ret
add     a5,a4,a5     | add   sp,sp,t0      |
add     a0,a5,a0     | ret                 |
li      t0,4096      |
sd      a5,8(sp)     |
sb      zero,2032(a0)|
addi    t0,t0,-2016  |
add     sp,sp,t0     |
ret                  |

gcc/ChangeLog:
        PR target/105733
        * config/riscv/riscv.h: New macros for with aligned offsets.
        * config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
        function to split a sum of two s12 values into constituents.
        (riscv_expand_prologue): Handle offset being sum of two S12.
        (riscv_expand_epilogue): Ditto.
        * config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
        * gcc.target/riscv/pr105733.c: New Test.
        * gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
        expect LUI 4096.
        * gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.



@@ -8074,14 +8111,26 @@ riscv_expand_epilogue (int style)
        }
        else
        {
-         if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+         HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+         if (SMALL_OPERAND (adj_off_value))
+           {
+             adjust = GEN_INT (adj_off_value);
+           }
+         else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+           {
+             HOST_WIDE_INT base, off;
+             riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
+             insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
+                                       GEN_INT (base));
+             RTX_FRAME_RELATED_P (insn) = 1;
+             adjust = GEN_INT (off);
+           }
So this was the hunk that we identified internally as causing problems with libgomp's testsuite. We never fully chased it down as this hunk didn't seem terribly important performance wise -- we just set it aside. The thing is it looked basically correct to me. So the failure was certainly unexpected, but it was consistent.

So I think the question is whether or not the CI system runs the libgomp testsuite, particularly in the rv64 linux configuration. If it does, and it passes, then we're good. I'm still finding my way around the configuration, so I don't know if the CI system Edwin & Patrick have built tests libgomp or not.

If it isn't run, then we'll need to do a run to test that. I'm set up here to do that if needed. I can just drop this version into our internal tree, trigger an internal CI run and see if it complains :-)

If it does complain, then we know where to start investigations.




Jeff

Reply via email to