[Bug target/86722] ifcvt produces x&0 that is never cleaned up

2022-06-23 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86722

Roger Sayle  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
   Target Milestone|--- |12.0
 Resolution|--- |FIXED
 CC||roger at nextmovesoftware dot com

--- Comment #6 from Roger Sayle  ---
This is now fixed on mainline (and GCC 12).

[Bug target/86722] ifcvt produces x&0 that is never cleaned up

2022-03-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86722

--- Comment #5 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:bec69ac548b0f37b41d07082d6ee52b52d356536

commit r12-7743-gbec69ac548b0f37b41d07082d6ee52b52d356536
Author: H.J. Lu 
Date:   Mon Mar 21 13:57:31 2022 -0700

x86: Disable AVX on pr86722.c and pr90356.c

SSE/SSE2 are enabled explicitly on pr86722.c and pr90356.c.  Disable AVX
to avoid AVX with -march=native.

PR target/86722
PR tree-optimization/90356
* gcc.target/i386/pr86722.c: Add -mno-avx.
* gcc.target/i386/pr90356.c: Likewise.

[Bug target/86722] ifcvt produces x&0 that is never cleaned up

2022-03-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86722

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:c482c28ba4c549006deb70dead90fe8ab34dcbcf

commit r12-7693-gc482c28ba4c549006deb70dead90fe8ab34dcbcf
Author: Roger Sayle 
Date:   Thu Mar 17 21:56:32 2022 +

PR 90356: Use xor to load const_double 0.0 on SSE (always)

Implementations of the x87 floating point instruction set have always
had some pretty strange characteristics.  For example on the original
Intel Pentium the FLDPI instruction (to load 3.14159... into a register)
took 5 cycles, and the FLDZ instruction (to load 0.0) took 2 cycles,
when a regular FLD (load from memory) took just 1 cycle!?  Given that
back then memory latencies were much lower (relatively) than they are
today, these instructions were all but useless except when optimizing
for size (impressively FLDZ/FLDPI require only two bytes).

Such was the world back in 2006 when Uros Bizjak first added support for
fldz https://gcc.gnu.org/pipermail/gcc-patches/2006-November/202589.html
and then shortly after sensibly disabled them for !optimize_size with
https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204405.html

Alas this vestigial logic still persists in the compiler today,
so for example on x86_64 for the following function:

double foo(double x) { return x + 0.0; }

generates with -O2

foo:    addsd   .LC0(%rip), %xmm0
        ret
.LC0:   .long   0
        .long   0

preferring to read the constant 0.0 from memory [the constant pool],
except when optimizing for size.  With -Os we get:

foo:    xorps   %xmm1, %xmm1
        addsd   %xmm1, %xmm0
        ret

Which is not only smaller (the two instructions require seven bytes vs.
eight for the original addsd from mem, even without considering the
constant pool) but is also faster on modern hardware.  The latter code
sequence is generated by both clang and msvc with -O2.  Indeed Agner
Fog documents the set of floating point/SSE constants that it's
cheaper to materialize than to load from memory.
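
One detail the example above relies on: without flags such as -ffast-math, the addition in foo cannot simply be folded away, because under IEEE 754 round-to-nearest -0.0 + 0.0 is +0.0, so foo(-0.0) must not return its argument unchanged. A minimal sketch that checks this follows; the helper name returns_positive_zero is made up for illustration and is not part of the patch or its test cases.

```c
#include <math.h>

/* Same function as in the commit message. */
double foo(double x) { return x + 0.0; }

/* Hypothetical helper: returns 1 when foo(-0.0) yields +0.0. */
int returns_positive_zero(void)
{
    volatile double neg_zero = -0.0;   /* volatile: keep the add at run time */
    double r = foo(neg_zero);
    return r == 0.0 && !signbit(r);
}
```

This is why the generated code still contains an addsd in both the -O2 and -Os variants; only the way the 0.0 operand is obtained differs.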

This patch shuffles the conditions on the i386 backend's *movtf_internal,
*movdf_internal and *movsf_internal define_insns to untangle the newer
TARGET_SSE_MATH clauses from the historical standard_80387_constant_p
conditions.  Amongst the benefits of this are that it improves the code
generated for PR tree-optimization/90356 and resolves PR target/86722.

2022-03-17  Roger Sayle  

gcc/ChangeLog
PR target/86722
PR tree-optimization/90356
* config/i386/i386.md (*movtf_internal): Don't guard
standard_sse_constant_p clause by optimize_function_for_size_p.
(*movdf_internal): Likewise.
(*movsf_internal): Likewise.

gcc/testsuite/ChangeLog
PR target/86722
PR tree-optimization/90356
* gcc.target/i386/pr86722.c: New test case.
* gcc.target/i386/pr90356.c: New test case.

[Bug target/86722] ifcvt produces x&0 that is never cleaned up

2021-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86722

Andrew Pinski  changed:

   What|Removed |Added

 CC||pinskia at gcc dot gnu.org
   Last reconfirmed||2021-08-03
 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement
 Ever confirmed|0   |1

--- Comment #3 from Andrew Pinski  ---
One way of improving this is to get the conditional move late in gimple.

[Bug target/86722] ifcvt produces x&0 that is never cleaned up

2018-08-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86722

--- Comment #2 from Marc Glisse  ---
noce_try_cmove has

  if ((CONSTANT_P (if_info->a) || register_operand (if_info->a, VOIDmode))
      && (CONSTANT_P (if_info->b) || register_operand (if_info->b, VOIDmode)))

but the first 3 times we go through there, we give up because if_info->a is
(mem/u/c:DF (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S8 A64])

Looking in the dump, I see:

(insn 6 15 16 5 (set (reg:DF 88 [ iftmp.0_3 ])
        (mem/u/c:DF (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S8 A64]))
     "e.c":3 130 {*movdf_internal}
     (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
        (nil)))

so if we had looked through the REG_EQUAL to see the constant, things might
have worked better. I don't know if we can do that though, since we seem very
averse to anything that looks like constant propagation in RTL.

In ce2, we arrive there with registers instead of constants, so we do
transform, but without seeing the constant 0 we cannot simplify.
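
To make the "x&0" in the bug title concrete, here is a rough C model of the branch-free select that if-conversion can emit for DFmode values when no blend instruction is available. The shape of the sequence (mask, AND, AND-NOT, OR) is an assumption for illustration and is not taken from the dumps above; bits_of, double_of, select_masked and select_masked_zero are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern and back. */
uint64_t bits_of(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
double double_of(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* General form: cond ? a : b computed with masks instead of a branch. */
double select_masked(int cond, double a, double b)
{
    uint64_t mask = cond ? ~UINT64_C(0) : UINT64_C(0);
    return double_of((mask & bits_of(a)) | (~mask & bits_of(b)));
}

/* When b is known to be 0.0 its bit pattern is 0, so the "~mask & b" term
   is x & 0 and the whole select reduces to a single AND. */
double select_masked_zero(int cond, double a)
{
    uint64_t mask = cond ? ~UINT64_C(0) : UINT64_C(0);
    return double_of(mask & bits_of(a));
}
```

If the pass only sees a register that happens to hold 0.0, it has to emit the general form; seeing the constant is what allows the dead term to be dropped.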

[Bug target/86722] ifcvt produces x&0 that is never cleaned up

2018-07-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86722

--- Comment #1 from Richard Biener  ---
code-gen should go through simplify_gen_* which should perform constant
folding.
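
As a rough illustration of the idea in this comment, the sketch below is a toy model of a "simplifying generator" for AND: it folds the operation when an operand is a known constant and only asks for a real instruction otherwise. It is not GCC's actual simplify_gen_* code; every type and function name here (toy_operand, toy_and, toy_const, toy_reg, toy_gen_and) is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy operand: either a known 64-bit constant or an opaque register. */
struct toy_operand {
    bool is_const;
    uint64_t value;  /* valid only when is_const is true */
    int regno;       /* valid only when is_const is false */
};

/* Outcome of building "a & b": either a folded operand or a real AND. */
struct toy_and {
    bool needs_insn;            /* true when an AND must be emitted */
    struct toy_operand result;  /* valid only when needs_insn is false */
};

struct toy_operand toy_const(uint64_t v)
{
    struct toy_operand o = { true, v, -1 };
    return o;
}

struct toy_operand toy_reg(int regno)
{
    struct toy_operand o = { false, 0, regno };
    return o;
}

struct toy_and toy_gen_and(struct toy_operand a, struct toy_operand b)
{
    struct toy_and out;
    out.needs_insn = false;
    out.result = toy_const(0);

    if (a.is_const && b.is_const)
        out.result = toy_const(a.value & b.value);   /* constant folding */
    else if ((a.is_const && a.value == 0) || (b.is_const && b.value == 0))
        out.result = toy_const(0);                   /* x & 0 -> 0 */
    else if (a.is_const && a.value == UINT64_MAX)
        out.result = b;                              /* all-ones & x -> x */
    else if (b.is_const && b.value == UINT64_MAX)
        out.result = a;                              /* x & all-ones -> x */
    else
        out.needs_insn = true;                       /* nothing known: emit AND */

    return out;
}
```

In this model, toy_gen_and(toy_reg(88), toy_const(0)) folds to the constant 0, whereas two register operands still require an instruction, which mirrors the situation where the zero is hidden behind a register.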