[Bug tree-optimization/63677] Failure to constant fold with vectorization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

--- Comment #6 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Thu Nov 20 08:40:52 2014
New Revision: 217827

URL: https://gcc.gnu.org/viewcvs?rev=217827&root=gcc&view=rev
Log:
2014-11-20  Richard Biener  rguent...@suse.de

    PR tree-optimization/63677
    * tree-ssa-dom.c: Include gimplify.h for unshare_expr.
    (avail_exprs_stack): Make a vector of pairs.
    (struct hash_expr_elt): Replace stmt member with vop member.
    (expr_elt_hasher::equal): Simplify.
    (initialize_hash_element): Adjust.
    (initialize_hash_element_from_expr): Likewise.
    (dom_opt_dom_walker::thread_across_edge): Likewise.
    (record_cond): Likewise.
    (dom_opt_dom_walker::before_dom_children): Likewise.
    (print_expr_hash_elt): Likewise.
    (remove_local_expressions_from_table): Restore previous state
    if requested.
    (record_equivalences_from_stmt): Record x + CST as constant
    MEM[x, CST] for further propagation.
    (vuse_eq): New function.
    (lookup_avail_expr): For loads use the alias oracle to see
    whether a candidate from the expr hash is usable.
    (avail_expr_hash): Do not hash VUSEs.

    * gcc.dg/tree-ssa/ssa-dom-cse-2.c: New testcase.
    * gcc.dg/tree-ssa/ssa-dom-cse-3.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-dom.c
[Bug tree-optimization/63677] Failure to constant fold with vectorization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Richard Biener rguenth at gcc dot gnu.org ---
Fixed for GCC 5.
[Bug tree-optimization/63677] Failure to constant fold with vectorization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677 --- Comment #8 from Richard Biener rguenth at gcc dot gnu.org --- *** Bug 63679 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/63677] Failure to constant fold with vectorization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

--- Comment #5 from Richard Biener rguenth at gcc dot gnu.org ---
With the patch from PR63864 we still don't optimize:

  <bb 2>:
  vect_cst_.12_23 = { 0, 1, 2, 3 };
  vect_cst_.11_32 = { 4, 5, 6, 7 };
  vectp.14_2 = &a[0];
  MEM[(int *)&a] = { 0, 1, 2, 3 };
  vectp.14_21 = &a[0] + 16;
  MEM[(int *)vectp.14_21] = { 4, 5, 6, 7 };
  vectp_a.5_22 = &a;
  vect__13.6_20 = MEM[(int *)&a];

This is because, while seeing the candidate

  MEM[(int *)&a] = { 0, 1, 2, 3 };

for the load

  vect__13.6_20 = MEM[(int *)&a];

we fail to disambiguate against the store to MEM[(int *)vectp.14_21], which is not simplified to

  MEM[&a, 16] = { 4, 5, 6, 7 };

because DOM does not have the trick of representing invariant-ptr + CST as MEM[..., CST'] for propagation. If I fix that (huh, not sure why we don't simply fold the pointer-plus that way, now four places do that trick for propagation...) then it works:

LKUP STMT vect__13.6_20 = MEM[(int *)&a]
          vect__13.6_20 = MEM[(int *)&a];
FIND: { 0, 1, 2, 3 }
  Replaced redundant expr 'MEM[(int *)&a]' with '{ 0, 1, 2, 3 }'

t.c.183t.optimized:

foo ()
{
  <bb 2>:
  return 28;
}
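
The disambiguation step described in comment #5 can be illustrated with a toy model. This is only a sketch under stated assumptions, not GCC's alias oracle: `struct mem_ref` and `may_overlap` are hypothetical names, and the model covers only references written as an invariant base plus a constant byte offset. Once the second store is expressed as (base `a`, offset 16) rather than through the opaque pointer `vectp.14_21`, a plain range check shows it cannot clobber the 16-byte load at offset 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memory reference in "invariant base + constant
   offset" form, i.e. what MEM[base, CST] expresses.  */
struct mem_ref {
  const void *base;   /* invariant base address, e.g. the array a */
  ptrdiff_t offset;   /* constant byte offset from the base */
  size_t size;        /* access size in bytes */
};

/* Return true when the two references may touch a common byte.
   Different bases are treated conservatively as "may overlap",
   because this toy model knows nothing about how they relate.  */
static bool
may_overlap (struct mem_ref x, struct mem_ref y)
{
  if (x.base != y.base)
    return true;
  return x.offset < y.offset + (ptrdiff_t) y.size
         && y.offset < x.offset + (ptrdiff_t) x.size;
}
```

With the load modelled as `{ a, 0, 16 }` and the second store as `{ a, 16, 16 }`, `may_overlap` returns false, so the earlier store of `{ 0, 1, 2, 3 }` remains a valid candidate for the load. If the store is instead known only through a different base pointer, the check has to answer "may overlap", which corresponds to the failure described above.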
[Bug tree-optimization/63677] Failure to constant fold with vectorization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

--- Comment #4 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #3)
> The problem is that the loop is first vectorized, then several passes later
> slp vectorizes the initialization, so after some cleanups we have e.g. in
> cddce2:
>   MEM[(int *)&a] = { 0, 1, 2, 3 };
>   MEM[(int *)&a + 16B] = { 4, 5, 6, 7 };
>   vect__13.6_20 = MEM[(int *)&a];
>   vect__13.6_17 = MEM[(int *)&a + 16B];
> But there is no further FRE pass that would optimize the loads into
>   vect__13.6_20 = { 0, 1, 2, 3 };
>   vect__13.6_17 = { 4, 5, 6, 7 };
> (supposedly that would need to be done before forwprop4 that could in theory
> refold all the stmts into constant).
> Richard, how expensive would it be to schedule another FRE pass if anything
> has been vectorized in the current function (either vect pass, or slp)?
> Or are there other passes that handle this? Looking at e.g.
> typedef int V __attribute__((vector_size (4 * sizeof (int))));
> struct S { int a[4]; };
> V __attribute__ ((noinline))
> foo (struct S *p)
> {
>   *(V *) p = (V) { 1, 2, 3, 4 };
>   return *(V *) p;
> }
> with -O2 -fno-tree-fre, it seems DOM is able to do that, but unfortunately
> at dom2 time the values have not been sufficiently forward propagated for
> dom2 to optimize this.

For the case in question there is only FRE that can handle CSEing of the MEM[(int *)&a] load (DOM should handle the load of _17 fine).

I'm not very fond of adding more passes, but in theory a FRE right after pass_tree_loop_done could do the trick. Though ideally you'd want it a bit later, after vector lowering - and after tracer (so where the current DOM sits, removing DOM). Of course FRE is more expensive than DOM, and DOM might catch some jump threading opportunities (though VRP does that as well).
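
For orientation, here is a guess at the shape of source that leads to the dumps discussed in this thread. It is an assumed reconstruction, not the testcase attached to the PR. It is inferred only from the constants `{ 0, 1, 2, 3 }` and `{ 4, 5, 6, 7 }`, the array name `a`, and the `return 28` in the optimized dump. The initialization is straight-line code that SLP can vectorize, and the summation is a loop that the loop vectorizer can handle.

```c
/* Assumed reconstruction of the kind of function under discussion;
   the real testcase from the PR is not quoted in this thread.  */
int
foo (void)
{
  int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };  /* becomes two vector stores */
  int sum = 0;

  for (int i = 0; i < 8; i++)             /* becomes two vector loads */
    sum += a[i];

  return sum;                              /* 0 + 1 + ... + 7 */
}
```

When the loads are replaced by the stored constants, the whole body can fold to a constant return, which matches the `return 28;` shown in comment #5.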
[Bug tree-optimization/63677] Failure to constant fold with vectorization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677 --- Comment #1 from Tejas Belagod belagod at gcc dot gnu.org --- There is similar behaviour on aarch64. So, it doesn't look like a backend issue.
[Bug tree-optimization/63677] Failure to constant fold with vectorization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677 --- Comment #2 from Andrew Pinski pinskia at gcc dot gnu.org --- I have seen this also.
[Bug tree-optimization/63677] Failure to constant fold with vectorization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-10-29
                 CC|                            |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #3 from Jakub Jelinek jakub at gcc dot gnu.org ---
The problem is that the loop is first vectorized, then several passes later slp vectorizes the initialization, so after some cleanups we have e.g. in cddce2:

  MEM[(int *)&a] = { 0, 1, 2, 3 };
  MEM[(int *)&a + 16B] = { 4, 5, 6, 7 };
  vect__13.6_20 = MEM[(int *)&a];
  vect__13.6_17 = MEM[(int *)&a + 16B];

But there is no further FRE pass that would optimize the loads into

  vect__13.6_20 = { 0, 1, 2, 3 };
  vect__13.6_17 = { 4, 5, 6, 7 };

(supposedly that would need to be done before forwprop4 that could in theory refold all the stmts into constant).

Richard, how expensive would it be to schedule another FRE pass if anything has been vectorized in the current function (either vect pass, or slp)? Or are there other passes that handle this? Looking at e.g.

typedef int V __attribute__((vector_size (4 * sizeof (int))));
struct S { int a[4]; };
V __attribute__ ((noinline))
foo (struct S *p)
{
  *(V *) p = (V) { 1, 2, 3, 4 };
  return *(V *) p;
}

with -O2 -fno-tree-fre, it seems DOM is able to do that, but unfortunately at dom2 time the values have not been sufficiently forward propagated for dom2 to optimize this.
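
To experiment with the small example from comment #3, it helps to wrap it in a caller. The sketch below keeps `V`, `struct S` and `foo` as given there. The `sum_lanes` helper and the explicit 16-byte alignment on the local object are additions made here as assumptions: `struct S` on its own is only guaranteed `int` alignment, while the access through `V *` expects vector alignment. The code relies on the GCC vector extension, including subscripting of vector values.

```c
typedef int V __attribute__((vector_size (4 * sizeof (int))));
struct S { int a[4]; };

/* Store a constant vector through p and load it straight back.  */
V __attribute__ ((noinline))
foo (struct S *p)
{
  *(V *) p = (V) { 1, 2, 3, 4 };
  return *(V *) p;
}

/* Hypothetical caller, not part of the original comment.  The object is
   over-aligned so that the vector-typed access in foo is valid.  */
int
sum_lanes (void)
{
  struct S s __attribute__((aligned (16)));
  V v = foo (&s);
  return v[0] + v[1] + v[2] + v[3];
}
```

Building this with the `-O2 -fno-tree-fre` options mentioned in the comment and inspecting the tree dumps should show whether the load in `foo` has been replaced by the stored constant. The exact dump file names depend on the GCC version.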