[Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)` since r9-5298-g33142cf9cf82aa1f

2021-09-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:a7b626d98a9a821ffb33466818d6aa86cac1d6fd

commit r12-3413-ga7b626d98a9a821ffb33466818d6aa86cac1d6fd
Author: Jakub Jelinek 
Date:   Wed Sep 8 11:25:31 2021 +0200

i386: Fix up @xorsign3_1 [PR102224]

As the testcase shows, we miscompile @xorsign3_1 if both input
operands are in the same register, because the splitter overwrites op1
before with op1 & mask before using op0.

For dest = xorsign op0, op0 we can actually simplify it from
dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).

The expander change is an optimization improvement, if we at expansion
time know it is xorsign op0, op0, we can emit abs right away and get better
code through that.

The @xorsign3_1 is a fix for the case where xorsign wouldn't be known
to have same operands during expansion, but during RTL optimizations they
would appear.  For non-AVX we need to use earlyclobber, we require
dest and op1 to be the same but op0 must be different because we overwrite
op1 first.  For AVX the constraints ensure that at most 2 of the 3 operands
may be the same register and if both inputs are the same, handles that
case.
This case can be easily tested with the xorsign3 expander change
reverted.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally.  Because for AVX the new code
destructively modifies op1.  If that is different from dest, say on:
float
foo (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
(unspec:SF [
(reg:SF 20 xmm0 [88])
(reg:SF 21 xmm1 [89])
(mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 
S16 A128])
] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
 (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
(plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
(reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
 (nil))
but split the xorsign into:
vandps  .LC0(%rip), %xmm1, %xmm1
vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the  <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
 <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.

2021-09-08  Jakub Jelinek  

PR target/102224
* config/i386/i386.md (xorsign3): If operands[1] is equal to
operands[2], emit abs2 instead.
(@xorsign3_1): Add early-clobbers for output operand, enable
first alternative even for avx, add another alternative with
= <- 0, Yv, Yvm constraints.
* config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal
to op1, emit vpandn instead.

* gcc.dg/pr102224.c: New test.
* gcc.target/i386/avx-pr102224.c: New test.

[Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)` since r9-5298-g33142cf9cf82aa1f

2021-09-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

--- Comment #12 from Andrew Pinski  ---
*** Bug 102227 has been marked as a duplicate of this bug. ***

[Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)` since r9-5298-g33142cf9cf82aa1f

2021-09-07 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #11 from Jakub Jelinek  ---
Created attachment 51422
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51422=edit
gcc12-pr102224.patch

Untested fix.  It is actually a mess.
While we can during expansion emit efficient fabs code if the two input
operands are equal, we still need to ensure correctness for the case where the
operand equality is discovered only after expansion.
This patch does that by ensuring through early-clobber and constraints that
for pre-AVX dest==op1 is different from op0, because we want to overwrite op1
first with op1 & mask before using op0.  For AVX, the constraints ensure that
either all of dest, op0 and op1 are different, or any two of them are the same
but the third one is different.  If op0 and op1 are different, then it is ok
without further changes, the i386-expand.c changes are for the op0 == op1 case
where we've ensured that dest is different from the inputs - we can emit vpandn
but if the mask is in memory, we need to force it into a register and can use
dest for that.

[Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)` since r9-5298-g33142cf9cf82aa1f

2021-09-07 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek  ---
The problem is actually about not catering to op0 == op1.
In that case what we really want is x & 0x8000 rather than x ^ (x &
0x8000).
Unfortunately after RA the mask needs to be already emitted, I think e.g.
because of PIC etc. we can't easily use a different constant at that point, and
for non-AVX pandn insn the negation is on the "0" constrained operand.
So
--- gcc/config/i386/i386-expand.c.jj2021-09-06 14:47:43.184050185 +0200
+++ gcc/config/i386/i386-expand.c   2021-09-07 11:35:30.798849245 +0200
@@ -2289,12 +2289,20 @@ ix86_split_xorsign (rtx operands[])
   mode = GET_MODE (dest);
   vmode = GET_MODE (mask);

-  op1 = lowpart_subreg (vmode, op1, mode);
-  x = gen_rtx_AND (vmode, op1, mask);
-  emit_insn (gen_rtx_SET (op1, x));
+  if (rtx_equal_p (op0, op1))
+{
+  op0 = lowpart_subreg (vmode, op0, mode);
+  x = gen_rtx_AND (vmode, op0, gen_rtx_NOT (vmode, mask));
+}
+  else
+{
+  op1 = lowpart_subreg (vmode, op1, mode);
+  x = gen_rtx_AND (vmode, op1, mask);
+  emit_insn (gen_rtx_SET (op1, x));

-  op0 = lowpart_subreg (vmode, op0, mode);
-  x = gen_rtx_XOR (vmode, op1, op0);
+  op0 = lowpart_subreg (vmode, op0, mode);
+  x = gen_rtx_XOR (vmode, op1, op0);
+}

   dest = lowpart_subreg (vmode, dest, mode);
   emit_insn (gen_rtx_SET (dest, x));
doesn't work without -mavx.

[Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)` since r9-5298-g33142cf9cf82aa1f

2021-09-07 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

Martin Liška  changed:

   What|Removed |Added

Summary|[9/10/11/12 regession]  |[9/10/11/12 regession]
   |wrong code for `x * |wrong code for `x *
   |copysign(1.0, x)`   |copysign(1.0, x)` since
   ||r9-5298-g33142cf9cf82aa1f
 CC||hjl at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org

--- Comment #9 from Martin Liška  ---
(In reply to Gabriel Ravier from comment #5)
> Actually it seems to me like this is a GCC 9 regression, ever since this
> pattern exists: GCC 9, 10 and 11 emit the exact same faulty code.

I can confirm that:

$ cat pr102224.c
float
__attribute__((noipa))
f(float x, float y)
{
return x * __builtin_copysignf(1.0f, x);
}

int main()
{
  float a = 1.23f;
  volatile float b = f(a, a);

  __builtin_printf ("a: %.2f, b: %.2f\n", a, b);
  if (b != a)
__builtin_abort ();

  return 0;
}

$ gcc pr102224.c -g -O3 && ./a.out
a: 1.23, b: 0.00
Aborted (core dumped)

Started with r9-5298-g33142cf9cf82aa1f.

[Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)`

2021-09-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|12.0|9.5
   Priority|P3  |P2
 Target|X86 |x86_64-*-* i?86-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-09-07
 Ever confirmed|0   |1

--- Comment #8 from Richard Biener  ---
Miscompiled with -O2, expanding from

  _3 = .XORSIGN (x_2(D), x_2(D)); [tail call]

and the issue is in ix86_split_xorsign which does

  dest = lowpart_subreg (vmode, dest, mode);
  x = gen_rtx_AND (vmode, dest, mask);
  emit_insn (gen_rtx_SET (dest, x));

  op0 = lowpart_subreg (vmode, op0, mode);
  x = gen_rtx_XOR (vmode, dest, op0);
  emit_insn (gen_rtx_SET (dest, x));

not catering for op0 == dest

[Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)`

2021-09-06 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

--- Comment #7 from Gabriel Ravier  ---
Also, `-ffast-math` seems to "fix" this, since in that case the code is
recognized as an ABS_EXPR pattern and as such results in the same code being
emitted without the xor. Is there any reason this isn't the case without fast
math ? I'd assume the xor wouldn't do anything w.r.t. conformance, personally.

[Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)`

2021-09-06 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

Gabriel Ravier  changed:

   What|Removed |Added

Summary|[12 regession] wrong code   |[9/10/11/12 regession]
   |for `x * copysign(1.0, x)`  |wrong code for `x *
   ||copysign(1.0, x)`

--- Comment #6 from Gabriel Ravier  ---
Actually, I've only gotten a snapshot from the 5th, which does not appear to
include HJ's patch from the 4th (which seems rather odd). Does it happen to fix
this ? I'd assume it does not, since that patch just seems to care about not
destructing the source and not about the emission of the xor that breaks this,
but I can't know for sure rn.