On 2024/3/26 at 5:48 PM, Xi Ruoyao wrote:
The latency of LA464 and LA664 division instructions depends on the
input.  When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1] we should use "something sensible"
instead.

Use the average of the minimum and maximum latency observed instead.
This enables reducing division by a constant to a reciprocal
multiplication sequence, and speeds up the following test case by
about 30%:

     int
     main (void)
     {
       unsigned long stat = 0xdeadbeef;
       for (int i = 0; i < 100000000; i++)
         stat = (stat * stat + stat * 114514 + 1919810) % 1000000007;
       asm(""::"r"(stat));
     }

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html

Before the patch, the test case div-const-reduction.c compiles to the
following instruction sequence:
        lu12i.w $r12,999997440>>12                        # 0x3b9ac000
        ori     $r12,$r12,2567
        mod.w   $r13,$r13,$r12

This sequence of instructions takes 5 clock cycles.

The sequence of instructions after applying the patch is:
        lu12i.w $r15,1152917504>>12                       # 0x44b82000
        ori     $r15,$r15,3993
        mulh.w  $r12,$r16,$r15
        srai.w  $r14,$r16,31
        lu12i.w $r13,999997440>>12                        # 0x3b9ac000
        ori     $r13,$r13,2567
        srai.w  $r12,$r12,28
        sub.w   $r12,$r12,$r14
        mul.w   $r12,$r12,$r13
        sub.w   $r16,$r16,$r12
This sequence of instructions takes 11 clock cycles.
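
For reference, the patched sequence can be read as the C sketch below.
This is only an illustration of what the ten instructions compute, not
code from the patch; the function name mod_by_reciprocal is
hypothetical.  The magic constant 0x44b82f99 is the value loaded by
the first lu12i.w/ori pair (1152917504 + 3993), which equals
ceil(2^60 / 1000000007).  The sketch assumes that right-shifting a
negative signed value is an arithmetic shift, as GCC implements it.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the patched instruction sequence:
   computes a % 1000000007 without a division instruction.  */
static int32_t
mod_by_reciprocal (int32_t a)
{
  const int32_t magic = 0x44b82f99;   /* ceil (2^60 / 1000000007) */
  const int32_t divisor = 1000000007; /* 999997440 + 2567 */

  /* mulh.w: high 32 bits of the 64-bit signed product.  */
  int32_t hi = (int32_t) (((int64_t) a * magic) >> 32);

  /* srai.w by 28 completes the shift by 60; subtracting a >> 31
     (0 or -1) rounds the quotient toward zero for negative a.  */
  int32_t q = (hi >> 28) - (a >> 31);

  /* mul.w + sub.w: remainder = a - q * divisor.  */
  return a - q * divisor;
}
```

The result agrees with the C % operator over the full int range,
including negative inputs, because the quotient is corrected to
truncate toward zero before the final multiply and subtract.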

With the patch, this test case takes 6 more clock cycles than it did
before, so I need to run SPEC to check the overall effect.

Thanks!

gcc/ChangeLog:

        * config/loongarch/loongarch-def.cc
        (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Increase
        default division cost to the average of the best case and worst
        case scenarios observed.

gcc/testsuite/ChangeLog:

        * gcc.target/loongarch/div-const-reduction.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch-def.cc                    | 8 ++++----
  gcc/testsuite/gcc.target/loongarch/div-const-reduction.c | 9 +++++++++
  2 files changed, 13 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c

diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index e8c129ce643..93e72a520d5 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -95,12 +95,12 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
    : fp_add (COSTS_N_INSNS (5)),
      fp_mult_sf (COSTS_N_INSNS (5)),
      fp_mult_df (COSTS_N_INSNS (5)),
-    fp_div_sf (COSTS_N_INSNS (8)),
-    fp_div_df (COSTS_N_INSNS (8)),
+    fp_div_sf (COSTS_N_INSNS (12)),
+    fp_div_df (COSTS_N_INSNS (15)),
      int_mult_si (COSTS_N_INSNS (4)),
      int_mult_di (COSTS_N_INSNS (4)),
-    int_div_si (COSTS_N_INSNS (5)),
-    int_div_di (COSTS_N_INSNS (5)),
+    int_div_si (COSTS_N_INSNS (14)),
+    int_div_di (COSTS_N_INSNS (22)),
      movcf2gr (COSTS_N_INSNS (7)),
      movgr2cf (COSTS_N_INSNS (15)),
      branch_cost (6),
diff --git a/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
new file mode 100644
index 00000000000..0ee86410dd7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=la464" } */
+/* { dg-final { scan-assembler-not "div\.\[dw\]" } } */
+
+int
+test (int a)
+{
+  return a % 1000000007;
+}
