[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2018-11-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Jakub Jelinek  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from Jakub Jelinek  ---
Fixed.

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2018-11-19 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org

--- Comment #14 from Martin Liška  ---
Jakub: Can the bug be marked as resolved?

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-08-11 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Yuri Rumyantsev  changed:

   What|Removed |Added

 CC||ysrumyan at gmail dot com

--- Comment #13 from Yuri Rumyantsev  ---
The fix r235764 introduced regression described in PR71956.

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-05-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #12 from Jakub Jelinek  ---
Author: jakub
Date: Tue May  3 11:45:04 2016
New Revision: 235816

URL: https://gcc.gnu.org/viewcvs?rev=235816&root=gcc&view=rev
Log:
PR rtl-optimization/70467
* config/i386/predicates.md (x86_64_hilo_int_operand,
x86_64_hilo_general_operand): New predicates.
* config/i386/constraints.md (Wd): New constraint.
* config/i386/i386.md (mode attr di): Use Wd instead of e.
(general_hilo_operand): New mode attr.
(add<mode>3, sub<mode>3): Use <general_hilo_operand>
instead of <general_operand>.
(*add<dwi>3_doubleword, *sub<dwi>3_doubleword): Use
x86_64_hilo_general_operand instead of <general_operand>.

* gcc.target/i386/pr70467-3.c: New test.
* gcc.target/i386/pr70467-4.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70467-3.c
trunk/gcc/testsuite/gcc.target/i386/pr70467-4.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/constraints.md
trunk/gcc/config/i386/i386.md
trunk/gcc/config/i386/predicates.md
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-05-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #11 from Jakub Jelinek  ---
Author: jakub
Date: Mon May  2 16:46:10 2016
New Revision: 235765

URL: https://gcc.gnu.org/viewcvs?rev=235765&root=gcc&view=rev
Log:
PR rtl-optimization/70467
* cse.c (cse_insn): Handle no-op MEM moves after folding.

* gcc.target/i386/pr70467-1.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70467-1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cse.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-05-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #10 from Jakub Jelinek  ---
Author: jakub
Date: Mon May  2 16:17:02 2016
New Revision: 235764

URL: https://gcc.gnu.org/viewcvs?rev=235764&root=gcc&view=rev
Log:
PR rtl-optimization/70467
* ipa-pure-const.c (check_call): Handle internal calls even in
ipa mode like in local mode.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-pure-const.c

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-04-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #9 from Jakub Jelinek  ---
Author: jakub
Date: Fri Apr  1 16:08:21 2016
New Revision: 234679

URL: https://gcc.gnu.org/viewcvs?rev=234679&root=gcc&view=rev
Log:
PR rtl-optimization/70467
* config/i386/i386.md (*add<dwi>3_doubleword, *sub<dwi>3_doubleword):
If low word of the last operand is 0, just emit addition/subtraction
for the high word.

* gcc.target/i386/pr70467-2.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70467-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.md
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-03-31 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Jakub Jelinek  changed:

   What|Removed |Added

 CC||uros at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek  ---
As for the double word additions/subtractions with low bits 0, like:
unsigned long long
foo (unsigned long long x)
{
  return x + 0x123456ULL;
}

unsigned long long
bar (unsigned long long x)
{
  return x - 0x123456ULL;
}
for -m32 -O2 and
__uint128_t
foo (__uint128_t x)
{
  return x + ((__uint128_t) 123456 << 64);
}

__uint128_t
bar (__uint128_t x)
{
  return x - ((__uint128_t) 123456 << 64);
}
for -m64 -O2, I have a partial fix here:

--- gcc/config/i386/i386.md.jj  2016-03-29 19:31:23.0 +0200
+++ gcc/config/i386/i386.md 2016-03-31 17:33:36.848167239 +0200
@@ -5449,7 +5449,14 @@ (define_insn_and_split "*add<dwi>3_doubl
   (match_dup 4))
 (match_dup 5)))
  (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);")
+{
+  split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
+  if (operands[2] == const0_rtx)
+    {
+      ix86_expand_binary_operator (PLUS, <MODE>mode, &operands[3]);
+      DONE;
+    }
+})

 (define_insn "*add<mode>_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm,r,r")
@@ -6379,7 +6386,14 @@ (define_insn_and_split "*sub<dwi>3_doubl
   (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))
 (match_dup 5)))
  (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);")
+{
+  split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
+  if (operands[2] == const0_rtx)
+    {
+      ix86_expand_binary_operator (MINUS, <MODE>mode, &operands[3]);
+      DONE;
+    }
+})

 (define_insn "*sub<mode>_1"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")

but it only works for the -m32 testcase.  The problem is that for "<di>" in
the TImode addition/subtraction we use the "e" constraint, and that is obviously
inappropriate; we want a new constraint that makes sure that both the low
and high 64 bits of the constant are "e".  Will hack on that tomorrow.
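
To make the requirement concrete, here is a minimal sketch in C of the kind of
check such a constraint would have to perform. It is not GCC code: the helper
names fits_simm32 and hilo_fits_simm32 are made up for illustration, and the
sketch assumes that "e" means a constant that fits a sign-extended 32-bit
immediate, and that unsigned-to-signed conversion wraps as it does with GCC.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if V can be encoded as a sign-extended
   32-bit immediate (what the "e" constraint is assumed to accept).  */
static int
fits_simm32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* Hypothetical helper: nonzero if both 64-bit halves of the 128-bit
   constant C pass the check above independently.  */
static int
hilo_fits_simm32 (__uint128_t c)
{
  int64_t lo = (int64_t) (uint64_t) c;
  int64_t hi = (int64_t) (uint64_t) (c >> 64);
  return fits_simm32 (lo) && fits_simm32 (hi);
}
```

With this model, (__uint128_t) 123456 << 64 from the -m64 testcase is accepted
(low half 0, high half 123456), while a constant whose low half is 1 << 40 is
rejected even though its high half is 0.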

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-03-31 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #7 from Jakub Jelinek  ---
That is a correct expectation, but the problem is that no pass that uses it
actually manages to update the insn.
As I said earlier, the combiner doesn't trigger, because there is just a single
insn, nothing to combine.
And CSE only triggers for & 0 and similar (which turns it into mem = 0), but
doesn't trigger for the case where we would want a no-op move.
There are two reasons for that.
One is PR35258, which wants to avoid overlapping sets, but doesn't consider
noop moves to be fine.  And then, most of the architectures actually don't
allow MEM to MEM moves, so even if we get through that, we don't validate the
changes.
The following untested patch fixes that:

--- gcc/cse.c.jj    2016-02-17 11:40:19.0 +0100
+++ gcc/cse.c       2016-03-31 14:42:27.104824253 +0200
@@ -5166,7 +5166,7 @@ cse_insn (rtx_insn *insn)
}

  /* Avoid creation of overlapping memory moves.  */
- if (MEM_P (trial) && MEM_P (SET_DEST (sets[i].rtl)))
+ if (MEM_P (trial) && MEM_P (dest) && !rtx_equal_p (trial, dest))
{
  rtx src, dest;

@@ -5277,6 +5277,20 @@ cse_insn (rtx_insn *insn)
  break;
}

+ /* Similarly, lots of targets don't allow no-op
+(set (mem x) (mem x)) moves.  */
+ else if (n_sets == 1
+  && MEM_P (trial)
+  && MEM_P (dest)
+  && rtx_equal_p (trial, dest)
+  && !side_effects_p (dest)
+  && (cfun->can_delete_dead_exceptions
+  || insn_nothrow_p (insn)))
+   {
+ SET_SRC (sets[i].rtl) = trial;
+ break;
+   }
+
  /* Reject certain invalid forms of CONST that we create.  */
  else if (CONSTANT_P (trial)
   && GET_CODE (trial) == CONST
@@ -5495,6 +5509,21 @@ cse_insn (rtx_insn *insn)
  sets[i].rtl = 0;
}

+  /* Similarly for no-op MEM moves.  */
+  else if (n_sets == 1
+  && MEM_P (SET_DEST (sets[i].rtl))
+  && MEM_P (SET_SRC (sets[i].rtl))
+  && rtx_equal_p (SET_DEST (sets[i].rtl),
+  SET_SRC (sets[i].rtl))
+  && !side_effects_p (SET_DEST (sets[i].rtl))
+  && (cfun->can_delete_dead_exceptions || insn_nothrow_p (insn)))
+   {
+ if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
+   cse_cfg_altered = true;
+ delete_insn_and_edges (insn);
+ sets[i].rtl = 0;
+   }
+
   /* If this SET is now setting PC to a label, we know it used to
 be a conditional or computed branch.  */
   else if (dest == pc_rtx && GET_CODE (src) == LABEL_REF

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-03-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #6 from Richard Biener  ---
I would have expected simplify_rtx to eventually handle all interesting cases.

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-03-31 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #5 from Jakub Jelinek  ---
For the logicals, e.g. the following works:
--- gcc/optabs.c.jj 2016-02-16 16:15:17.0 +0100
+++ gcc/optabs.c    2016-03-31 12:53:37.571337401 +0200
@@ -1136,6 +1136,37 @@ expand_binop (machine_mode mode, optab b
   op1 = force_reg (GET_MODE_INNER (mode), op1);
 }

+  /* Optimize some bitwise operations; these can be not optimized
+ away by GIMPLE optimizations when expanding wider bitwise operations
+ a word at a time.  */
+
+  if (optimize)
+{
+  if (binoptab == and_optab)
+   {
+ if (op1 == CONST0_RTX (mode) && ! side_effects_p (op0))
+   return op1;
+ if (INTEGRAL_MODE_P (mode) && op1 == CONSTM1_RTX (mode))
+   return op0;
+   }
+  if (binoptab == ior_optab)
+   {
+ if (op1 == CONST0_RTX (mode))
+   return op0;
+ if (INTEGRAL_MODE_P (mode)
+ && op1 == CONSTM1_RTX (mode)
+ && ! side_effects_p (op0))
+   return op1;
+   }
+  if (binoptab == xor_optab)
+   {
+ if (op1 == CONST0_RTX (mode))
+   return op0;
+ if (INTEGRAL_MODE_P (mode) && op1 == CONSTM1_RTX (mode))
+   return expand_unop (mode, one_cmpl_optab, op0, target, 0);
+   }
+}
+
   /* Record where to delete back to if we backtrack.  */
   last = get_last_insn ();

Tested on
void foo (unsigned long long *);

void
bar (void)
{
  unsigned long long a;
  foo (&a);
  a &= 0x7fffULL;
  foo (&a);
  a &= 0x7fffULL;
  foo (&a);
  a &= 0x7fffULL;
  foo (&a);
  a &= 0x7fffULL;
  foo (&a);
  a &= 0xULL;
  foo (&a);
  a &= 0xULL;
  foo (&a);
  a |= 0x7fffULL;
  foo (&a);
  a |= 0x7fffULL;
  foo (&a);
  a |= 0x7fffULL;
  foo (&a);
  a |= 0x7fffULL;
  foo (&a);
  a |= 0xULL;
  foo (&a);
  a |= 0xULL;
  foo (&a);
  a ^= 0x7fffULL;
  foo (&a);
  a ^= 0x7fffULL;
  foo (&a);
  a ^= 0x7fffULL;
  foo (&a);
  a ^= 0x7fffULL;
  foo (&a);
  a ^= 0xULL;
  foo (&a);
  a ^= 0xULL;
  foo (&a);
}

Though, apparently CSE1 is able to optimize some of these (& 0, | -1, ^ -1),
but not the others.
So another option is to handle & -1, | 0 and ^ 0 in CSE.
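
The six identities the patch checks for can be summarised in a small
stand-alone C model. This is only a sketch, not GCC code: the enum and function
names are made up, a word is modelled as uint32_t, and the side_effects_p
checks from the patch are left out.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only.  */
enum bitop { OP_AND, OP_IOR, OP_XOR };
enum outcome { KEEP_OP, USE_OP0, USE_OP1, USE_NOT_OP0 };

/* Decide what "op0 OP op1" reduces to when OP1 is a word-sized
   constant: nothing special, op0 itself, the constant op1, or ~op0.  */
static enum outcome
simplify_bitop (enum bitop op, uint32_t op1)
{
  switch (op)
    {
    case OP_AND:
      if (op1 == 0)
        return USE_OP1;          /* x & 0 == 0 */
      if (op1 == UINT32_MAX)
        return USE_OP0;          /* x & -1 == x */
      break;
    case OP_IOR:
      if (op1 == 0)
        return USE_OP0;          /* x | 0 == x */
      if (op1 == UINT32_MAX)
        return USE_OP1;          /* x | -1 == -1 */
      break;
    case OP_XOR:
      if (op1 == 0)
        return USE_OP0;          /* x ^ 0 == x */
      if (op1 == UINT32_MAX)
        return USE_NOT_OP0;      /* x ^ -1 == ~x */
      break;
    }
  return KEEP_OP;
}
```

In this model the USE_OP0 results (& -1, | 0, ^ 0) are the no-op cases that
CSE1 does not handle, while USE_OP1 and USE_NOT_OP0 correspond to the cases it
already optimizes.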

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-03-31 Thread b7.10110111 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #4 from Ruslan  ---
(In reply to Jakub Jelinek from comment #3)
> ...
> nothing there is able to optimize & -1 (and similarly | or ^ 0, or & 0, or |
> -1).

Just a note: the same happens for arithmetic operations, not just bitwise ones.
E.g. if you change `v&=~(1ull<<63)` in the OP to `v+=1ull<<32`, GCC generates
`add dword [esp],0` followed by `adc dword [esp+4],1`.
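
A minimal C sketch of why the `add ...,0` is redundant (illustrative only; the
function names are made up): adding 0 to the low word never produces a carry,
so the add/adc pair computes the same value as a single add on the high word.

```c
#include <stdint.h>

/* Model of the add/adc sequence: add the low words, then add the
   high words plus the carry out of the low addition.  */
static uint64_t
add_by_words (uint64_t x, uint64_t c)
{
  uint32_t xlo = (uint32_t) x, xhi = (uint32_t) (x >> 32);
  uint32_t clo = (uint32_t) c, chi = (uint32_t) (c >> 32);
  uint32_t lo = xlo + clo;              /* add */
  uint32_t carry = lo < xlo;            /* carry flag of the add */
  uint32_t hi = xhi + chi + carry;      /* adc */
  return ((uint64_t) hi << 32) | lo;
}

/* Shortcut that is only valid when the low word of C is 0:
   leave the low word alone and add just the high words.  */
static uint64_t
add_high_only (uint64_t x, uint64_t c)
{
  uint32_t hi = (uint32_t) (x >> 32) + (uint32_t) (c >> 32);
  return ((uint64_t) hi << 32) | (uint32_t) x;
}
```

For c = 1ull << 32 both functions agree with plain x + c for every x, including
values where the high word wraps around.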

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-03-31 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Jakub Jelinek  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 CC||jakub at gcc dot gnu.org
 Resolution|FIXED   |---

--- Comment #3 from Jakub Jelinek  ---
It is not fixed, try
-m32 -O2 -mno-sse.
I believe the issue is that we expand_binop DImode
op0 == target
(mem/c:DI (plus:SI (reg/f:SI 82 virtual-stack-vars)
(const_int -16 [0xfffffffffffffff0])) [0 MEM[(char *
{ref-all})]+0 S8 A128])
and op1
(const_int 9223372036854775807 [0x7fffffffffffffff])
which then expands this per subwords as two SImode expand_binops, but nothing
there is able to optimize & -1 (and similarly | or ^ 0, or & 0, or | -1).
And, combiner doesn't do anything, because it is a single insn that operates on
memory, so there is nothing to combine together.
So, IMHO expand_binop just should handle the easy cases.  I'll prepare a patch.
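
To illustrate the per-subword expansion described above, here is a minimal
sketch in C (not GCC code; the struct and function names are made up). It
splits the DImode mask 9223372036854775807 into two SImode words and shows that
the low word is all ones, so the AND on it cannot change anything.

```c
#include <stdint.h>

/* Hypothetical model of a double-word value on a 32-bit target.  */
struct word_pair
{
  uint32_t lo;
  uint32_t hi;
};

/* Split a 64-bit value into its low and high 32-bit words.  */
static struct word_pair
split_words (uint64_t x)
{
  struct word_pair p = { (uint32_t) x, (uint32_t) (x >> 32) };
  return p;
}

/* Nonzero if ANDing any word with MASK leaves it unchanged,
   i.e. MASK is -1 in word mode.  */
static int
and_is_noop (uint32_t mask)
{
  return mask == UINT32_MAX;
}
```

For the mask 0x7fffffffffffffff the low word is 0xffffffff (a no-op AND, the
useless `and [esp],-1` from the bug title) and the high word is 0x7fffffff,
which is the only AND that actually needs to be emitted.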

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-03-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
  Known to work||6.0
 Resolution|--- |FIXED

--- Comment #2 from Richard Biener  ---
Fixed in GCC 6 btw:

doStuff:
.LFB15:
        .cfi_startproc
        subl    $28, %esp
        .cfi_def_cfa_offset 32
        call    test
        fstpt   (%esp)
        andl    $2147483647, 4(%esp)
        fldt    (%esp)
        addl    $28, %esp
        .cfi_def_cfa_offset 4
        ret

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-03-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Target||i?86-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-03-31
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Can't reproduce with something simpler, but confirmed with plain -O2 -m32.
Probably to do with splitting wide types (or not doing that in this case).