[Bug middle-end/81502] In some cases the data is moved to memory unnecessarily [partial regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81502

--- Comment #8 from Aldy Hernandez ---
Author: aldyh
Date: Wed Sep 13 15:55:56 2017
New Revision: 252135

URL: https://gcc.gnu.org/viewcvs?rev=252135&root=gcc&view=rev
Log:
2017-07-28  Richard Biener

        PR tree-optimization/81502
        * match.pd: Add pattern combining BIT_INSERT_EXPR with
        BIT_FIELD_REF.
        * tree-cfg.c (verify_expr): Verify types of BIT_FIELD_REF
        size/pos operands.
        (verify_gimple_assign_ternary): Likewise for BIT_INSERT_EXPR pos.
        * gimple-fold.c (maybe_canonicalize_mem_ref_addr): Use bitsizetype
        for BIT_FIELD_REF args.
        * fold-const.c (make_bit_field_ref): Likewise.
        * tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.

        * gcc.target/i386/pr81502.c: New testcase.

Added:
    branches/range-gen2/gcc/testsuite/gcc.target/i386/pr81502.c
Modified:
    branches/range-gen2/gcc/ChangeLog
    branches/range-gen2/gcc/fold-const.c
    branches/range-gen2/gcc/gimple-fold.c
    branches/range-gen2/gcc/match.pd
    branches/range-gen2/gcc/testsuite/ChangeLog
    branches/range-gen2/gcc/tree-cfg.c
    branches/range-gen2/gcc/tree-vect-stmts.c
[Bug middle-end/81502] In some cases the data is moved to memory unnecessarily [partial regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81502

--- Comment #7 from Aldy Hernandez ---
Author: aldyh
Date: Wed Sep 13 15:51:46 2017
New Revision: 252114

URL: https://gcc.gnu.org/viewcvs?rev=252114&root=gcc&view=rev
Log:
2017-07-27  Richard Biener

        PR tree-optimization/81502
        * tree-ssa.c (non_rewritable_lvalue_p): Handle BIT_INSERT_EXPR
        with incompatible but same sized type.
        (execute_update_addresses_taken): Likewise.

        * gcc.target/i386/vect-insert-1.c: New testcase.

Added:
    branches/range-gen2/gcc/testsuite/gcc.target/i386/vect-insert-1.c
Modified:
    branches/range-gen2/gcc/ChangeLog
    branches/range-gen2/gcc/testsuite/ChangeLog
    branches/range-gen2/gcc/tree-ssa.c
[Bug middle-end/81502] In some cases the data is moved to memory unnecessarily [partial regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81502

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
      Known to work|                            |8.0
         Resolution|---                         |FIXED

--- Comment #6 from Richard Biener ---
Fixed.  Thanks for reporting!
[Bug middle-end/81502] In some cases the data is moved to memory unnecessarily [partial regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81502

--- Comment #5 from Richard Biener ---
Author: rguenth
Date: Fri Jul 28 11:27:45 2017
New Revision: 250659

URL: https://gcc.gnu.org/viewcvs?rev=250659&root=gcc&view=rev
Log:
2017-07-28  Richard Biener

        PR tree-optimization/81502
        * match.pd: Add pattern combining BIT_INSERT_EXPR with
        BIT_FIELD_REF.
        * tree-cfg.c (verify_expr): Verify types of BIT_FIELD_REF
        size/pos operands.
        (verify_gimple_assign_ternary): Likewise for BIT_INSERT_EXPR pos.
        * gimple-fold.c (maybe_canonicalize_mem_ref_addr): Use bitsizetype
        for BIT_FIELD_REF args.
        * fold-const.c (make_bit_field_ref): Likewise.
        * tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.

        * gcc.target/i386/pr81502.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr81502.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/gimple-fold.c
    trunk/gcc/match.pd
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-vect-stmts.c
[Bug middle-end/81502] In some cases the data is moved to memory unnecessarily [partial regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81502

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #4 from Richard Biener ---
So the rewrite into SSA should work now.  I'm not sure the pattern combining
BIT_INSERT_EXPR and VECTOR_CST into a CONSTRUCTOR is desirable; a single
insert into a loaded constant vector is likely cheaper, so some cost modeling
would be in order.

So from

int bar(void*) (void * ptr)
{
  int res;
  __m128i word;
  long long int _2;
  unsigned int _4;

  <bb 2> [100.00%] [count: INV]:
  _2 = (long long int) ptr_6(D);
  word_3 = BIT_INSERT_EXPR <{ 0, 0 }, _2, 0 (64 bits)>;
  _4 = BIT_FIELD_REF <word_3, 32, 0>;
  res_5 = (int) _4;
  return res_5;

the desired pattern is combining BIT_FIELD_REF and BIT_INSERT_EXPR.  We'd
combine that into

  _4 = BIT_FIELD_REF <_2, 32, 0>;

I will try to come up with a match.pd rule once time permits.
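
The fold described in comment #4 can be illustrated outside the compiler. The following is only a minimal sketch in plain C, not GCC code: `vec128`, `bit_insert_64`, `bit_field_ref_32`, `extract_via_vector` and `extract_direct` are hypothetical names that model a 128-bit vector as 16 raw bytes. It shows that reading 32 bits at position 0 of a vector into which a 64-bit value was inserted at position 0 gives the same result as reading those 32 bits from the inserted value itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector: 16 raw bytes. */
typedef struct { unsigned char bytes[16]; } vec128;

/* Model of BIT_INSERT_EXPR <vec, val, bitpos> for a 64-bit value:
   overwrite 64 bits of the vector starting at bit position bitpos. */
static vec128 bit_insert_64(vec128 vec, uint64_t val, size_t bitpos)
{
    assert(bitpos % 8 == 0 && bitpos / 8 + sizeof val <= sizeof vec.bytes);
    memcpy(vec.bytes + bitpos / 8, &val, sizeof val);
    return vec;
}

/* Model of BIT_FIELD_REF <vec, 32, bitpos>:
   read 32 bits of the vector starting at bit position bitpos. */
static uint32_t bit_field_ref_32(vec128 vec, size_t bitpos)
{
    uint32_t out;
    assert(bitpos % 8 == 0 && bitpos / 8 + sizeof out <= sizeof vec.bytes);
    memcpy(&out, vec.bytes + bitpos / 8, sizeof out);
    return out;
}

/* Unfolded form: insert into a zero vector, then extract from the vector. */
static uint32_t extract_via_vector(uint64_t val)
{
    vec128 zero = { { 0 } };
    vec128 word = bit_insert_64(zero, val, 0);
    return bit_field_ref_32(word, 0);
}

/* Folded form: extract the same 32 bits directly from the inserted value,
   which corresponds to BIT_FIELD_REF <_2, 32, 0> in the dump. */
static uint32_t extract_direct(uint64_t val)
{
    uint32_t out;
    memcpy(&out, &val, sizeof out);
    return out;
}
```

Both paths go through memcpy on the object representation, so the equivalence does not depend on byte order. The sketch only covers the case from the dump, where the extracted range lies entirely inside the inserted value.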
[Bug middle-end/81502] In some cases the data is moved to memory unnecessarily [partial regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81502

--- Comment #3 from Richard Biener ---
Author: rguenth
Date: Thu Jul 27 12:01:21 2017
New Revision: 250620

URL: https://gcc.gnu.org/viewcvs?rev=250620&root=gcc&view=rev
Log:
2017-07-27  Richard Biener

        PR tree-optimization/81502
        * tree-ssa.c (non_rewritable_lvalue_p): Handle BIT_INSERT_EXPR
        with incompatible but same sized type.
        (execute_update_addresses_taken): Likewise.

        * gcc.target/i386/vect-insert-1.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.target/i386/vect-insert-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa.c
[Bug middle-end/81502] In some cases the data is moved to memory unnecessarily [partial regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81502

--- Comment #2 from Richard Biener ---
Note that with -mtune=intel we already get

_Z3barPv:
.LFB526:
        .cfi_startproc
        movq    %rdi, %xmm0
        movd    %xmm0, %eax
        ret

but yes, the intermediate temporary is unnecessary.  We do not optimize this
to a BIT_INSERT_EXPR because the vector element type doesn't match the
insertion quantity.  We can relax that a bit with the following:

Index: gcc/tree-ssa.c
===================================================================
--- gcc/tree-ssa.c      (revision 250386)
+++ gcc/tree-ssa.c      (working copy)
@@ -1513,8 +1513,8 @@ non_rewritable_lvalue_p (tree lhs)
       if (DECL_P (decl)
          && VECTOR_TYPE_P (TREE_TYPE (decl))
          && TYPE_MODE (TREE_TYPE (decl)) != BLKmode
-         && types_compatible_p (TREE_TYPE (lhs),
-                                TREE_TYPE (TREE_TYPE (decl)))
+         && operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)),
+                             TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))), 0)
          && tree_fits_uhwi_p (TREE_OPERAND (lhs, 1))
          && tree_int_cst_lt (TREE_OPERAND (lhs, 1),
                              TYPE_SIZE_UNIT (TREE_TYPE (decl)))
@@ -1839,8 +1839,9 @@ execute_update_addresses_taken (void)
                && bitmap_bit_p (suitable_for_renaming, DECL_UID (sym))
                && VECTOR_TYPE_P (TREE_TYPE (sym))
                && TYPE_MODE (TREE_TYPE (sym)) != BLKmode
-               && types_compatible_p (TREE_TYPE (lhs),
-                                      TREE_TYPE (TREE_TYPE (sym)))
+               && operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)),
+                                   TYPE_SIZE_UNIT
+                                     (TREE_TYPE (TREE_TYPE (sym))), 0)
                && tree_fits_uhwi_p (TREE_OPERAND (lhs, 1))
                && tree_int_cst_lt (TREE_OPERAND (lhs, 1),
                                    TYPE_SIZE_UNIT (TREE_TYPE (sym)))
@@ -1848,6 +1849,18 @@ execute_update_addresses_taken (void)
                    % tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (lhs)))) == 0)
              {
                tree val = gimple_assign_rhs1 (stmt);
+               if (! types_compatible_p (TREE_TYPE (lhs),
+                                         TREE_TYPE (TREE_TYPE (sym))))
+                 {
+                   tree tem = make_ssa_name (TREE_TYPE (TREE_TYPE (sym)));
+                   gimple *pun
+                     = gimple_build_assign (tem,
+                                            build1 (VIEW_CONVERT_EXPR,
+                                                    TREE_TYPE (TREE_TYPE
+                                                                 (sym)), val));
+                   gsi_insert_before (&gsi, pun, GSI_SAME_STMT);
+                   val = tem;
+                 }
                tree bitpos
                  = wide_int_to_tree (bitsizetype,
                                      mem_ref_offset (lhs) * BITS_PER_UNIT);

this gets us to

int bar(void*) (void * ptr)
{
  int res;
  __m128i word;
  long long int _2;
  unsigned int _4;

  <bb 2> [100.00%] [count: INV]:
  _2 = (long long int) ptr_6(D);
  word_3 = BIT_INSERT_EXPR <{ 0, 0 }, _2, 0 (64 bits)>;
  _4 = BIT_FIELD_REF <word_3, 32, 0>;
  res_5 = (int) _4;
  return res_5;

in .optimized which shows (already known) missed foldings for bit-field-ref
of bit-insert.  That's a complicated one btw, extracting a component from
a vector insert.  Oh, and it misses bit-insert -> CONSTRUCTOR, thus
word_3 = { _2, 0 };

(simplify
 (bit_insert VECTOR_CST@0 @1 @2)
 {
   vec<constructor_elt, va_gc> *v;
   vec_alloc (v, TYPE_VECTOR_SUBPARTS (type));
   for (unsigned i = 0; i < VECTOR_CST_NELTS (@0); ++i)
     {
       constructor_elt elt = { NULL_TREE, VECTOR_CST_ELT (@0, i) };
       v->quick_push (elt);
     }
   (*v)[TREE_INT_CST_LOW (@2) / TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE
(type)))].value = @1;
   build_constructor (type, v);
 })

that gets us to

  <bb 2> [100.00%] [count: INV]:
  _2 = (long long int) ptr_6(D);
  word_3 = {_2, 0};
  _4 = BIT_FIELD_REF <word_3, 32, 0>;
  res_5 = (int) _4;
  return res_5;

where we still need that BIT_FIELD_REF simplification.  The IL is already in
this form when we run into FRE1 so handling it there should be possible in
principle.  Or we can fold

  word_3 = {_2, 0};
  _4 = BIT_FIELD_REF <word_3, 32, 0>;

to

  _4 = BIT_FIELD_REF <_2, 32, 0 [+adjustment]>;

thus a BIT_FIELD_REF on a CONSTRUCTOR to one on the element.
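
The effect of the relaxed condition in the patch above can be sketched as a small stand-alone model. This is not GCC code: `scalar_type`, `kind_t` and all function names below are hypothetical, and type compatibility is reduced to "same signedness and same size" as a simplifying assumption. The sketch contrasts the old test (store type compatible with the vector element type) with the new one (store type merely the same size), and marks the case where the patch inserts a VIEW_CONVERT_EXPR to pun the value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified description of a scalar type. */
typedef enum { KIND_SIGNED, KIND_UNSIGNED } kind_t;
typedef struct { kind_t kind; size_t size_bytes; } scalar_type;

/* Simplified stand-in for types_compatible_p (assumption: only signedness
   and size matter in this model). */
static bool types_compatible(scalar_type a, scalar_type b)
{
    return a.kind == b.kind && a.size_bytes == b.size_bytes;
}

/* Checks shared by the old and new condition: the store offset lies inside
   the vector and is a multiple of the stored size. */
static bool offset_ok(scalar_type store, size_t offset, size_t vec_bytes)
{
    return store.size_bytes != 0
           && offset < vec_bytes
           && offset % store.size_bytes == 0;
}

/* Condition before the patch: store type must be compatible with the
   vector element type. */
static bool rewritable_strict(scalar_type store, scalar_type elem,
                              size_t offset, size_t vec_bytes)
{
    return types_compatible(store, elem) && offset_ok(store, offset, vec_bytes);
}

/* Condition after the patch: equal size is enough. */
static bool rewritable_relaxed(scalar_type store, scalar_type elem,
                               size_t offset, size_t vec_bytes)
{
    return store.size_bytes == elem.size_bytes
           && offset_ok(store, offset, vec_bytes);
}

/* When the relaxed condition holds but the types differ, the stored value
   has to be reinterpreted as the element type first (the VIEW_CONVERT_EXPR
   built in the third hunk). */
static bool needs_view_convert(scalar_type store, scalar_type elem)
{
    return !types_compatible(store, elem);
}
```

For the store in this report, an 8-byte unsigned value written at offset 0 of a 16-byte vector of 8-byte signed elements, the strict condition rejects the store while the relaxed one accepts it and requires the pun.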
[Bug middle-end/81502] In some cases the data is moved to memory unnecessarily [partial regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81502

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-07-21
     Ever confirmed|0                           |1

--- Comment #1 from Marc Glisse ---
.optimized dump:

int bar(void*) (void * ptr)
{
  int res;
  __m128i word;
  long unsigned int _2;
  vector(2) long long int word.3_3;
  unsigned int _4;

  <bb 2> [100.00%] [count: INV]:
  _2 = (long unsigned int) ptr_9(D);
  word = { 0, 0 };
  MEM[(char * {ref-all})&word] = _2;
  word.3_3 = word;
  word ={v} {CLOBBER};
  _4 = BIT_FIELD_REF <word.3_3, 32, 0>;
  res_5 = (int) _4;
  return res_5;

}

We missed turning the memory write into a BIT_INSERT_EXPR, and passes like PRE
missed following the bit_field_expr all the way to _2.

.combine dump:

[...]
(insn 8 3 10 2 (set (reg/v:V2DI 90 [ word ])
        (vec_concat:V2DI (reg/v/f:DI 92 [ ptr ])
            (const_int 0 [0]))) "b.c":16 3712 {vec_concatv2di}
     (expr_list:REG_DEAD (reg/v/f:DI 92 [ ptr ])
        (nil)))
(insn 10 8 15 2 (set (reg:SI 94 [ res ])
        (vec_select:SI (subreg:V4SI (reg/v:V2DI 90 [ word ]) 0)
            (parallel [
                    (const_int 0 [0])
                ]))) "b.c":20 3697 {*vec_extractv4si_0}
     (expr_list:REG_DEAD (reg/v:V2DI 90 [ word ])
        (nil)))
[...]

combine tries

(set (reg:SI 94 [ res ])
    (vec_select:SI (subreg:V4SI (vec_concat:V2DI (reg/v/f:DI 92 [ ptr ])
                (const_int 0 [0])) 0)
        (parallel [
                (const_int 0 [0])
            ])))

which we fail to simplify.

The xmm1-xmm0 mov is not considered a mov by the compiler but a concatenation
with 0, so this is not an RA problem.  The change of mode (64-bit pointer to
32-bit int) seems to play a big role in confusing things here.
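
The original testcase is not quoted in this thread. As a rough illustration of the kind of source that could lead to the dump in comment #1, here is a hypothetical reconstruction in GNU C. It uses the GCC/Clang `vector_size` extension instead of `__m128i` so that it does not depend on x86 headers; the type name `v2di` and the use of memcpy for the store and the extraction are assumptions, not the reporter's code.

```c
#include <string.h>

/* 16-byte vector of two long long elements (GNU C vector extension). */
typedef long long v2di __attribute__((vector_size(16)));

/* Hypothetical reconstruction: zero a vector, store the pointer bits into
   its start through a byte-wise copy, then read back the first 32 bits. */
int bar(void *ptr)
{
    v2di word = { 0, 0 };
    unsigned int lo;

    /* Corresponds to the store into the start of `word`, i.e. the memory
       write seen in the dump. */
    memcpy(&word, &ptr, sizeof ptr);

    /* Corresponds to the 32-bit BIT_FIELD_REF at position 0. */
    memcpy(&lo, &word, sizeof lo);
    return (int) lo;
}
```

Ideally no stack temporary should be needed here, since the result is just the first four bytes of the pointer's representation. Whether a given compiler version actually produces the same GIMPLE for this sketch as for the reported testcase has not been checked.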