在 2024/3/27 下午8:42, Xi Ruoyao 写道:
On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:
On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
在 2024/3/26 下午5:48, Xi Ruoyao 写道:
The latency of LA464 and LA664 division instructions depends on the
input.  When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1] we should use "something sensible" instead
of it.

Use the average of the minimum and maximum latency observed instead.
This enables multiplication to reciprocal sequence reduction and speeds
up the following test case for about 30%:

      int
      main (void)
      {
        unsigned long stat = 0xdeadbeef;
        for (int i = 0; i < 100000000; i++)
          stat = (stat * stat + stat * 114514 + 1919810) % 1000000007;
        asm(""::"r"(stat));
      }

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
The test case div-const-reduction.c is modified to assemble the instruction
sequence as follows:
        lu12i.w $r12,999997440>>12                        # 0x3b9ac000
        ori     $r12,$r12,2567
        mod.w   $r13,$r13,$r12

This sequence of instructions takes 5 clock cycles.
It actually may take 5 to 8 cycles depending on the input.  And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still produce a better throughput.

Hmm indeed, it seems a waste to do this reduction for int / 1000000007.
I'll try to make a better heuristic as Richard suggests...
Oops, it seems impossible (w/o refactoring the generic code).  See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?

I tested spec2006. In the floating-point program, the test items with large

fluctuations are removed, and the rest is basically unchanged.

The fixed-point 464.h264ref (10,10) was 6.7% higher than (5,5) and (10,22).

Reply via email to