[Bug target/93990] [x86] Silly code generation for _addcarry_u32/_addcarry_u64

2021-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93990

Andrew Pinski  changed:

   What|Removed |Added

 Status|REOPENED|NEW
   Severity|normal  |enhancement

--- Comment #5 from Andrew Pinski  ---
_addcarry_u64(0, a, c, &result0) can definitely be folded into
__builtin_uadd_overflowll/.ADD_OVERFLOW without any troubles.

The second _addcarry_u64 here since the result is not used can be folded into
just an a few adds.

That leaves if there case where the first argument is non-zero and the result
is used. I think going the route that Jakub mentions is best for this last
case.

[Bug target/93990] [x86] Silly code generation for _addcarry_u32/_addcarry_u64

2021-09-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93990

--- Comment #6 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #5)
> That leaves if there case where the first argument is non-zero and the
> result is used. I think going the route that Jakub mentions is best for this
> last case.

Maybe even generic builtins/internal functions for this case even as clang has
__builtin_addc, etc.

[Bug target/93990] [x86] Silly code generation for _addcarry_u32/_addcarry_u64

2020-03-01 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93990

--- Comment #1 from Andrew Pinski  ---
This is already been fixed in GCC 10:
addq%rdx, %rdi
movq%rsi, %rax
adcq%rcx, %rax
xorq%rdi, %rax
ret
.ident  "GCC: (GNU) 10.0.1 20200121 (experimental) [master revision
e0a5b313c1a:f4dc220f630:b313d3c49c2387b5e212df22a5e6ecc0c4e95c0a]"

[Bug target/93990] [x86] Silly code generation for _addcarry_u32/_addcarry_u64

2020-03-01 Thread lloyd at randombit dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93990

Jack Lloyd  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Jack Lloyd  ---
Good enough for me, closing, thanks.

[Bug target/93990] [x86] Silly code generation for _addcarry_u32/_addcarry_u64

2020-03-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93990

Jakub Jelinek  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
   Last reconfirmed||2020-03-03
 CC||jakub at gcc dot gnu.org
 Resolution|FIXED   |---
 Ever confirmed|0   |1

--- Comment #3 from Jakub Jelinek  ---
I must say I can't really reproduce this, tried with -O3 r227270 (there is
expected worse code), r227271 (already good), r23, r24, r25,
r26, r270002, r28, r10-6985 and all emit the same 4 instructions (addq,
adcq, movq and xorq) + ret, just the scheduling slightly varries over time.
Though, looking at your assembly, seems you are using -fstack-protector-all or
-fstack-protector-strong, with that option it indeed emits more code, but it
did already in r227271.
And the reason why we get worse code with one of -fstack-protector-{all,strong}
is that the stores by the builtin are done using alias set 0 and so RTL DSE
isn't able to remove the stores because it thinks they might alias with the
stack canary.
One way to fix this would be I'd say to use alias set of unsigned int for
unsigned long long (depending on what the builtin prototype is), and another
one, perhaps better long term, would be fold those builtins during gimple to
some another md builtin, perhaps with space in name to make it user
unaccessible, which would return a complex unsigned or complex unsigned long
long containing both results, so that we wouldn't have the addressable variable
during expansion at all and stack protector wouldn't need to mark anything
(unless there are other user vars that need it).

[Bug target/93990] [x86] Silly code generation for _addcarry_u32/_addcarry_u64

2020-03-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93990

--- Comment #4 from Jakub Jelinek  ---
I have tried:
--- gcc/config/i386/i386-expand.c.jj2020-01-12 11:54:36.323414766 +0100
+++ gcc/config/i386/i386-expand.c   2020-03-03 12:44:54.116134173 +0100
@@ -11902,7 +11902,18 @@ rdseed_step:
   emit_insn (gen_rtx_SET (target, pat));

   /* Store the result.  */
-  emit_move_insn (gen_rtx_MEM (mode0, op4), op0);
+  {
+op4 = gen_rtx_MEM (mode0, op4);
+tree type = (mode0 == SImode ? unsigned_type_node
+: long_long_unsigned_type_node);
+   if (TREE_CODE (arg3) == ADDR_EXPR
+   && useless_type_conversion_p (TREE_TYPE (TREE_OPERAND (arg3, 0)),
+ type))
+ set_mem_attributes (op4, TREE_OPERAND (arg3, 0), 0);
+   else
+ set_mem_attributes (op4, type, 0);
+   emit_move_insn (op4, op0);
+  }

   return target;

but even that doesn't help, the stores now have [1 result0+0 S8 A64] instead of
former [0  S8 A8], but RTL DSE still doesn't remove those for whatever reason.