On 7/21/23 11:31, Palmer Dabbelt wrote:
On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:
DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd     fa5,0(a0)
| ret

With patch

| sd      zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.

Given that it can ICE, we should probably backport it to 13.

FWIW ICE is on an in-flight for-gcc-14 patch, not something in tree already. And this will merge ahead of that.
I'm fine with backport though.


Ran thru full multilib testsuite, there was 1 false failure due to

Did you run the test with autovec?

I have standard 32/64 mutlilibs, but no 'v' in arch so autovec despite being enabled at -O2 and above will not kick in.
I think we should add a 'v' multilib.


There's also a pmode_reg_or_0_operand, some of those don't appear protected from FP values.  So we might need something like

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd5b19457f8..d8ce9223343 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -63,7 +63,7 @@ (define_expand "movmisalign<mode>"

(define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
  [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
   (match_operand:VNX1_QHSDI 2 "register_operand")
   (match_operand 3 "<VNX1_QHSD:gs_extension>")
   (match_operand 4 "<VNX1_QHSD:gs_scale>")

a bunch of times, as there's a ton of them?  I'm not entirely sure if that
could manifest as an actual bug, though...

What does 'P' do here ?

+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */

IIUC the pattern to emit fmv suffers from the same bug -- it's fixed in the same way, but I think we might be able to come up with a test for it: `fmv.d.x FREG,
x0` would be the fastest way to generate 0.0, so maybe something like

   double sum(double *d) {
     double sum = 0;
     for (int i = 0; i < 8; ++i)
       sum += d[i];
     return sum;
   }

would do it?  That's generating the fmv on 13 for me, though, so maybe I'm
missing something?`

I need to unpack this first :-)


diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */

I think that autovec one is the only possible dependency that might have snuck
in, so we should be safe otherwise.  Thanks!

I'm not sure if this specific comment is related to the xthead test or continuation of above. For xthead it is real issue since I saw a random "lw" in lto assembler output.

Reply via email to