[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Jakub Jelinek ---
Fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #14 from Martin Liška ---
Jakub: Can the bug be marked as resolved?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Yuri Rumyantsev changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ysrumyan at gmail dot com

--- Comment #13 from Yuri Rumyantsev ---
The fix in r235764 introduced the regression described in PR71956.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #12 from Jakub Jelinek ---
Author: jakub
Date: Tue May 3 11:45:04 2016
New Revision: 235816

URL: https://gcc.gnu.org/viewcvs?rev=235816&root=gcc&view=rev
Log:
	PR rtl-optimization/70467
	* config/i386/predicates.md (x86_64_hilo_int_operand,
	x86_64_hilo_general_operand): New predicates.
	* config/i386/constraints.md (Wd): New constraint.
	* config/i386/i386.md (mode attr di): Use Wd instead of e.
	(general_hilo_operand): New mode attr.
	(add<mode>3, sub<mode>3): Use <general_hilo_operand> instead of
	<general_operand>.
	(*add<dwi>3_doubleword, *sub<dwi>3_doubleword): Use
	x86_64_hilo_general_operand instead of <general_operand>.

	* gcc.target/i386/pr70467-3.c: New test.
	* gcc.target/i386/pr70467-4.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr70467-3.c
    trunk/gcc/testsuite/gcc.target/i386/pr70467-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/constraints.md
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/predicates.md
    trunk/gcc/testsuite/ChangeLog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #11 from Jakub Jelinek ---
Author: jakub
Date: Mon May 2 16:46:10 2016
New Revision: 235765

URL: https://gcc.gnu.org/viewcvs?rev=235765&root=gcc&view=rev
Log:
	PR rtl-optimization/70467
	* cse.c (cse_insn): Handle no-op MEM moves after folding.

	* gcc.target/i386/pr70467-1.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr70467-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cse.c
    trunk/gcc/testsuite/ChangeLog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #10 from Jakub Jelinek ---
Author: jakub
Date: Mon May 2 16:17:02 2016
New Revision: 235764

URL: https://gcc.gnu.org/viewcvs?rev=235764&root=gcc&view=rev
Log:
	PR rtl-optimization/70467
	* ipa-pure-const.c (check_call): Handle internal calls even in
	ipa mode like in local mode.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-pure-const.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #9 from Jakub Jelinek ---
Author: jakub
Date: Fri Apr 1 16:08:21 2016
New Revision: 234679

URL: https://gcc.gnu.org/viewcvs?rev=234679&root=gcc&view=rev
Log:
	PR rtl-optimization/70467
	* config/i386/i386.md (*add<dwi>3_doubleword, *sub<dwi>3_doubleword):
	If low word of the last operand is 0, just emit addition/subtraction
	for the high word.

	* gcc.target/i386/pr70467-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr70467-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek ---
As for the double word additions/subtractions with low bits 0, like:
unsigned long long
foo (unsigned long long x)
{
  return x + 0x123456ULL;
}

unsigned long long
bar (unsigned long long x)
{
  return x - 0x123456ULL;
}
for -m32 -O2 and
__uint128_t
foo (__uint128_t x)
{
  return x + ((__uint128_t) 123456 << 64);
}

__uint128_t
bar (__uint128_t x)
{
  return x - ((__uint128_t) 123456 << 64);
}
for -m64 -O2, I have a partial fix here:
--- gcc/config/i386/i386.md.jj	2016-03-29 19:31:23.0 +0200
+++ gcc/config/i386/i386.md	2016-03-31 17:33:36.848167239 +0200
@@ -5449,7 +5449,14 @@ (define_insn_and_split "*add<dwi>3_doubl
                    (match_dup 4))
                  (match_dup 5)))
               (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);")
+{
+  split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
+  if (operands[2] == const0_rtx)
+    {
+      ix86_expand_binary_operator (PLUS, <MODE>mode, &operands[3]);
+      DONE;
+    }
+})
 
 (define_insn "*add<mode>_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm,r,r")
@@ -6379,7 +6386,14 @@ (define_insn_and_split "*sub<dwi>3_doubl
                    (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))
                  (match_dup 5)))
               (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);")
+{
+  split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
+  if (operands[2] == const0_rtx)
+    {
+      ix86_expand_binary_operator (MINUS, <MODE>mode, &operands[3]);
+      DONE;
+    }
+})
 
 (define_insn "*sub<mode>_1"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")

but it only works for the -m32 testcase. The problem is that for "<di>" in the
TImode addition/subtraction we use the "e" constraint, and that is obviously
inappropriate; we want a new constraint that makes sure that both the low
and the high 64 bits of the constant are "e". Will hack on that tomorrow.
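The requirement that both 64-bit halves of the constant be "e" can be illustrated with a minimal stand-alone C sketch. It is not GCC code: the helper names `fits_simm32` and `hilo_fits_simm32` are hypothetical, and the sketch assumes that "e" accepts an integer constant only when it is representable as a sign-extended 32-bit immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if V is representable as a sign-extended
   32-bit immediate (the assumed meaning of the "e" constraint for
   integer constants).  */
static bool
fits_simm32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* Hypothetical helper: a 128-bit constant, given as its two 64-bit
   halves, is usable by a split double-word add/sub only if each half
   is individually encodable.  */
static bool
hilo_fits_simm32 (uint64_t hi, uint64_t lo)
{
  return fits_simm32 ((int64_t) hi) && fits_simm32 ((int64_t) lo);
}
```

For the -m64 testcase above, the constant has a high half of 123456 and a low half of 0, so `hilo_fits_simm32 (123456, 0)` holds even though the 128-bit value as a whole is far outside the 32-bit range.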
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #7 from Jakub Jelinek ---
That is a correct expectation, but the problem is that no pass that uses it
actually manages to update the insn. As I said earlier, the combiner doesn't
trigger, because there is just a single insn, nothing to combine. And CSE
only triggers for & 0 and similar (which turns it into mem = 0), but doesn't
trigger for the case where we would want a no-op move. There are two reasons
for that. One is PR35258, which wants to avoid overlapping sets, but doesn't
consider no-op moves to be fine. And then, most of the architectures actually
don't allow MEM to MEM moves, so even if we get through that, we don't
validate the changes.
The following untested patch fixes that:

--- gcc/cse.c.jj	2016-02-17 11:40:19.0 +0100
+++ gcc/cse.c	2016-03-31 14:42:27.104824253 +0200
@@ -5166,7 +5166,7 @@ cse_insn (rtx_insn *insn)
             }
 
           /* Avoid creation of overlapping memory moves.  */
-          if (MEM_P (trial) && MEM_P (SET_DEST (sets[i].rtl)))
+          if (MEM_P (trial) && MEM_P (dest) && !rtx_equal_p (trial, dest))
             {
               rtx src, dest;
 
@@ -5277,6 +5277,20 @@ cse_insn (rtx_insn *insn)
               break;
             }
 
+          /* Similarly, lots of targets don't allow no-op
+             (set (mem x) (mem x)) moves.  */
+          else if (n_sets == 1
+                   && MEM_P (trial)
+                   && MEM_P (dest)
+                   && rtx_equal_p (trial, dest)
+                   && !side_effects_p (dest)
+                   && (cfun->can_delete_dead_exceptions
+                       || insn_nothrow_p (insn)))
+            {
+              SET_SRC (sets[i].rtl) = trial;
+              break;
+            }
+
           /* Reject certain invalid forms of CONST that we create.  */
           else if (CONSTANT_P (trial)
                    && GET_CODE (trial) == CONST
@@ -5495,6 +5509,21 @@ cse_insn (rtx_insn *insn)
           sets[i].rtl = 0;
         }
 
+      /* Similarly for no-op MEM moves.  */
+      else if (n_sets == 1
+               && MEM_P (SET_DEST (sets[i].rtl))
+               && MEM_P (SET_SRC (sets[i].rtl))
+               && rtx_equal_p (SET_DEST (sets[i].rtl),
+                               SET_SRC (sets[i].rtl))
+               && !side_effects_p (SET_DEST (sets[i].rtl))
+               && (cfun->can_delete_dead_exceptions || insn_nothrow_p (insn)))
+        {
+          if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
+            cse_cfg_altered = true;
+          delete_insn_and_edges (insn);
+          sets[i].rtl = 0;
+        }
+
       /* If this SET is now setting PC to a label, we know it used to
          be a conditional or computed branch.  */
       else if (dest == pc_rtx && GET_CODE (src) == LABEL_REF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #6 from Richard Biener ---
I would have expected simplify_rtx to eventually handle all interesting cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #5 from Jakub Jelinek ---
For the logicals, e.g. the following works:
--- gcc/optabs.c.jj	2016-02-16 16:15:17.0 +0100
+++ gcc/optabs.c	2016-03-31 12:53:37.571337401 +0200
@@ -1136,6 +1136,37 @@ expand_binop (machine_mode mode, optab b
         op1 = force_reg (GET_MODE_INNER (mode), op1);
     }
 
+  /* Optimize some bitwise operations; these can be not optimized
+     away by GIMPLE optimizations when expanding wider bitwise operations
+     a word at a time.  */
+
+  if (optimize)
+    {
+      if (binoptab == and_optab)
+        {
+          if (op1 == CONST0_RTX (mode) && ! side_effects_p (op0))
+            return op1;
+          if (INTEGRAL_MODE_P (mode) && op1 == CONSTM1_RTX (mode))
+            return op0;
+        }
+      if (binoptab == ior_optab)
+        {
+          if (op1 == CONST0_RTX (mode))
+            return op0;
+          if (INTEGRAL_MODE_P (mode)
+              && op1 == CONSTM1_RTX (mode)
+              && ! side_effects_p (op0))
+            return op1;
+        }
+      if (binoptab == xor_optab)
+        {
+          if (op1 == CONST0_RTX (mode))
+            return op0;
+          if (INTEGRAL_MODE_P (mode) && op1 == CONSTM1_RTX (mode))
+            return expand_unop (mode, one_cmpl_optab, op0, target, 0);
+        }
+    }
+
   /* Record where to delete back to if we backtrack.  */
   last = get_last_insn ();
 
Tested on
void foo (unsigned long long *);

void
bar (void)
{
  unsigned long long a;
  foo (&a);
  a &= 0x7fffULL;
  foo (&a);
  a &= 0x7fffULL;
  foo (&a);
  a &= 0x7fffULL;
  foo (&a);
  a &= 0x7fffULL;
  foo (&a);
  a &= 0xULL;
  foo (&a);
  a &= 0xULL;
  foo (&a);
  a |= 0x7fffULL;
  foo (&a);
  a |= 0x7fffULL;
  foo (&a);
  a |= 0x7fffULL;
  foo (&a);
  a |= 0x7fffULL;
  foo (&a);
  a |= 0xULL;
  foo (&a);
  a |= 0xULL;
  foo (&a);
  a ^= 0x7fffULL;
  foo (&a);
  a ^= 0x7fffULL;
  foo (&a);
  a ^= 0x7fffULL;
  foo (&a);
  a ^= 0x7fffULL;
  foo (&a);
  a ^= 0xULL;
  foo (&a);
  a ^= 0xULL;
  foo (&a);
}

Though, apparently CSE1 is able to optimize some of these (& 0, | -1, ^ -1),
but not the others. So another option is to handle & -1, | 0 and ^ 0 in CSE.
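The identities the patch above tests for can be summarised in a small stand-alone C sketch. This is not the GCC code: the enums and `classify_bitop` are hypothetical names, the sketch works on plain 32-bit constants rather than RTL, and it leaves out the `side_effects_p` checks that the patch needs before it may discard `op0`.

```c
#include <stdint.h>

enum bitop { OP_AND, OP_IOR, OP_XOR };

/* What a per-word expander could do instead of emitting the operation.  */
enum action
{
  EMIT_OP,      /* No shortcut applies; emit the instruction.  */
  USE_OP0,      /* Result is op0 unchanged (the operation is a no-op).  */
  USE_CONST,    /* Result is the constant itself (0 or all ones).  */
  USE_NOT_OP0   /* Result is the bitwise complement of op0.  */
};

/* Hypothetical classifier for "op0 OP c" on one 32-bit word.  */
static enum action
classify_bitop (enum bitop op, uint32_t c)
{
  switch (op)
    {
    case OP_AND:
      if (c == 0)
        return USE_CONST;       /* x & 0 == 0 */
      if (c == UINT32_MAX)
        return USE_OP0;         /* x & -1 == x */
      break;
    case OP_IOR:
      if (c == 0)
        return USE_OP0;         /* x | 0 == x */
      if (c == UINT32_MAX)
        return USE_CONST;       /* x | -1 == -1 */
      break;
    case OP_XOR:
      if (c == 0)
        return USE_OP0;         /* x ^ 0 == x */
      if (c == UINT32_MAX)
        return USE_NOT_OP0;     /* x ^ -1 == ~x */
      break;
    }
  return EMIT_OP;
}
```

The three `USE_OP0` cases (& -1, | 0, ^ 0) are the ones the comment says CSE1 does not currently catch.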
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

--- Comment #4 from Ruslan ---
(In reply to Jakub Jelinek from comment #3)
> ...
> nothing there is able to optimize & -1 (and similarly | or ^ 0, or & 0, or |
> -1).

Just a note: the same happens for arithmetic operations, not just bitwise.
E.g. if you change `v&=~(1ull<<63)` in the OP to `v+=1ull<<32`, GCC generates
`add dword [esp],0` followed by `adc dword [esp+4],1`.
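A small C model of the `add`/`adc` pair shows why the first instruction is redundant here. It is only a sketch: the `u64_words` type and `add_words` function are made up for illustration and model the carry flag with an ordinary integer.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit words, as on a 32-bit target.  */
typedef struct
{
  uint32_t lo;
  uint32_t hi;
} u64_words;

/* Hypothetical model of a double-word add of the constant
   (c_hi:c_lo) to V, one word at a time.  */
static u64_words
add_words (u64_words v, uint32_t c_lo, uint32_t c_hi)
{
  u64_words r;
  uint32_t carry;

  r.lo = v.lo + c_lo;            /* add: low word */
  carry = r.lo < v.lo;           /* carry out of the low word */
  r.hi = v.hi + c_hi + carry;    /* adc: high word plus carry */
  return r;
}
```

When `c_lo` is 0, the low word is unchanged and the carry is always 0, so the whole operation reduces to a plain `add` on the high word.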
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |jakub at gcc dot gnu.org
         Resolution|FIXED                       |---

--- Comment #3 from Jakub Jelinek ---
It is not fixed, try -m32 -O2 -mno-sse.
I believe the issue is that we expand_binop DImode op0 == target
(mem/c:DI (plus:SI (reg/f:SI 82 virtual-stack-vars)
        (const_int -16 [0xfffffffffffffff0])) [0 MEM[(char * {ref-all})]+0 S8 A128])
and op1
(const_int 9223372036854775807 [0x7fffffffffffffff])
which then expands this per subwords as two SImode expand_binops, but nothing
there is able to optimize & -1 (and similarly | or ^ 0, or & 0, or | -1).
And, combiner doesn't do anything, because it is a single insn that operates
on memory, so there is nothing to combine together.
So, IMHO expand_binop just should handle the easy cases. I'll prepare a
patch.
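To make the subword split concrete, here is a minimal C sketch of a 64-bit AND performed as two 32-bit ANDs. It is not GCC code, and the helper names `low_word`, `high_word` and `and_by_words` are hypothetical.

```c
#include <stdint.h>

/* Low and high 32-bit words of a 64-bit value.  */
static uint32_t
low_word (uint64_t x)
{
  return (uint32_t) x;
}

static uint32_t
high_word (uint64_t x)
{
  return (uint32_t) (x >> 32);
}

/* Hypothetical model of a DImode AND expanded as two SImode ANDs.  */
static uint64_t
and_by_words (uint64_t v, uint64_t mask)
{
  uint32_t lo = low_word (v) & low_word (mask);
  uint32_t hi = high_word (v) & high_word (mask);

  return ((uint64_t) hi << 32) | lo;
}
```

For the mask 0x7fffffffffffffff the low word is 0xffffffff, i.e. -1 in SImode, so the low-word AND changes nothing; only the high-word AND with 0x7fffffff does any work.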
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to work|                            |6.0
         Resolution|---                         |FIXED

--- Comment #2 from Richard Biener ---
Fixed in GCC 6 btw:

doStuff:
.LFB15:
	.cfi_startproc
	subl	$28, %esp
	.cfi_def_cfa_offset 32
	call	test
	fstpt	(%esp)
	andl	$2147483647, 4(%esp)
	fldt	(%esp)
	addl	$28, %esp
	.cfi_def_cfa_offset 4
	ret
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-03-31
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
I can't reproduce this with something simpler, but it is confirmed with plain
-O2 -m32. It probably has to do with splitting wide types (or not doing that
in this case).