[Bug tree-optimization/63677] Failure to constant fold with vectorization.

2014-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

--- Comment #6 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Thu Nov 20 08:40:52 2014
New Revision: 217827

URL: https://gcc.gnu.org/viewcvs?rev=217827&root=gcc&view=rev
Log:
2014-11-20   Richard Biener  rguent...@suse.de

PR tree-optimization/63677
* tree-ssa-dom.c: Include gimplify.h for unshare_expr.
(avail_exprs_stack): Make a vector of pairs.
(struct hash_expr_elt): Replace stmt member with vop member.
(expr_elt_hasher::equal): Simplify.
(initialize_hash_element): Adjust.
(initialize_hash_element_from_expr): Likewise.
(dom_opt_dom_walker::thread_across_edge): Likewise.
(record_cond): Likewise.
(dom_opt_dom_walker::before_dom_children): Likewise.
(print_expr_hash_elt): Likewise.
(remove_local_expressions_from_table): Restore previous state
if requested.
(record_equivalences_from_stmt): Record x + CST as constant
MEM[x, CST] for further propagation.
(vuse_eq): New function.
(lookup_avail_expr): For loads use the alias oracle to see
whether a candidate from the expr hash is usable.
(avail_expr_hash): Do not hash VUSEs.

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: New testcase.
* gcc.dg/tree-ssa/ssa-dom-cse-3.c: Likewise.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-3.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-dom.c


[Bug tree-optimization/63677] Failure to constant fold with vectorization.

2014-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Richard Biener rguenth at gcc dot gnu.org ---
Fixed for GCC 5.


[Bug tree-optimization/63677] Failure to constant fold with vectorization.

2014-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

--- Comment #8 from Richard Biener rguenth at gcc dot gnu.org ---
*** Bug 63679 has been marked as a duplicate of this bug. ***


[Bug tree-optimization/63677] Failure to constant fold with vectorization.

2014-11-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

--- Comment #5 from Richard Biener rguenth at gcc dot gnu.org ---
With the patch from PR63864 we still don't optimize:

  <bb 2>:
  vect_cst_.12_23 = { 0, 1, 2, 3 };
  vect_cst_.11_32 = { 4, 5, 6, 7 };
  vectp.14_2 = a[0];
  MEM[(int *)a] = { 0, 1, 2, 3 };
  vectp.14_21 = a[0] + 16;
  MEM[(int *)vectp.14_21] = { 4, 5, 6, 7 };
  vectp_a.5_22 = a;
  vect__13.6_20 = MEM[(int *)a];

this is because while seeing the candidate MEM[(int *)a] = { 0, 1, 2, 3 };
for the load vect__13.6_20 = MEM[(int *)a]; we fail to disambiguate
against the store to MEM[(int *)vectp.14_21] which is not simplified
to MEM[a, 16] = { 4, 5, 6, 7 }; because DOM does not have the trick
of representing invariant-ptr + CST as MEM[..., CST'] for propagation.
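The disambiguation step can be modelled in a few lines of C. This is only a simplified sketch, not GCC's alias oracle: `struct mem_ref`, `may_overlap` and `example_disjoint` are hypothetical names, and the model assumes each reference has already been reduced to a base plus a constant byte offset, which is exactly what the MEM[..., CST'] form provides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a memory reference: a base object,
   a constant byte offset from that base, and an access size in bytes.  */
struct mem_ref
{
  const void *base;
  size_t offset;
  size_t size;
};

/* Two references to the same base are independent when their byte
   ranges [offset, offset + size) do not intersect.  References with
   different bases are conservatively treated as possibly overlapping.  */
bool
may_overlap (struct mem_ref r1, struct mem_ref r2)
{
  if (r1.base != r2.base)
    return true;
  return r1.offset < r2.offset + r2.size
	 && r2.offset < r1.offset + r1.size;
}

/* Models the case above: a vector load from the start of a and a
   vector store one vector further on.  Returns true when the store
   cannot clobber the loaded bytes.  */
bool
example_disjoint (void)
{
  static int a[8];
  struct mem_ref load = { a, 0, 4 * sizeof (int) };
  struct mem_ref store = { a, 4 * sizeof (int), 4 * sizeof (int) };
  return !may_overlap (load, store);
}
```

While the store address is only known as the SSA name vectp.14_21, there is no shared base and offset to compare, so the conservative answer applies and the earlier store cannot be used as the value of the load.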

If I fix that (huh, not sure why we don't simply fold the pointer-plus that
way, now four places do that trick for propagation...) then it works:

LKUP STMT vect__13.6_20 = MEM[(int *)a]
  vect__13.6_20 = MEM[(int *)a];
FIND: { 0, 1, 2, 3 }
  Replaced redundant expr 'MEM[(int *)a]' with '{ 0, 1, 2, 3 }'

t.c.183t.optimized:

foo ()
{
  <bb 2>:
  return 28;

}
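
The testcase attached to the bug is not quoted in this thread. A function of roughly the following shape would exercise the same pattern; it is an assumption, reconstructed only from the constants { 0, ..., 7 } and the folded result 28 shown above.

```c
/* Assumed reconstruction of the kind of source that produces the dumps
   in this thread: one loop initializes a small local array and a second
   loop sums it.  Both loops have constant trip counts, so the whole
   function can in principle be folded to a constant.  */
int
foo (void)
{
  int a[8];
  int sum = 0;

  /* Stores a[0..7] = 0..7.  */
  for (int i = 0; i < 8; i++)
    a[i] = i;

  /* Loads the values back and accumulates them.  */
  for (int i = 0; i < 8; i++)
    sum += a[i];

  return sum;
}
```

Whatever the optimizer does with the loops, the function must return 0 + 1 + ... + 7 = 28, which matches the `return 28;` in the optimized dump.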


[Bug tree-optimization/63677] Failure to constant fold with vectorization.

2014-10-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

--- Comment #4 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #3)
> The problem is that the loop is first vectorized, then several passes later
> slp vectorizes the initialization, so after some cleanups we have e.g. in
> cddce2:
>   MEM[(int *)a] = { 0, 1, 2, 3 };
>   MEM[(int *)a + 16B] = { 4, 5, 6, 7 };
>   vect__13.6_20 = MEM[(int *)a];
>   vect__13.6_17 = MEM[(int *)a + 16B];
> But there is no further FRE pass that would optimize the loads into
>   vect__13.6_20 = { 0, 1, 2, 3 };
>   vect__13.6_17 = { 4, 5, 6, 7 };
> (supposedly that would need to be done before forwprop4 that could in theory
> refold all the stmts into constant).
>
> Richard, how expensive would it be to schedule another FRE pass if anything
> has been vectorized in the current function (either vect pass, or slp)?  Or
> are there other passes that handle this?  Looking at e.g.
> typedef int V __attribute__((vector_size (4 * sizeof (int))));
> struct S { int a[4]; };
> V __attribute__ ((noinline)) foo (struct S *p)
> {
>   *(V *) p = (V) { 1, 2, 3, 4 };
>   return *(V *) p;
> }
> with -O2 -fno-tree-fre, it seems DOM is able to do that, but unfortunately
> at dom2 time the values have not been sufficiently forward propagated for
> dom2 to optimize this.

For the case in question there is only FRE that can handle CSEing of
the MEM[(int *)a] load (DOM should handle the load of _17 fine).
I'm not very fond of adding more passes, but in theory a FRE right
after pass_tree_loop_done could do the trick.  Though ideally you'd
want it a bit later, after vector lowering - and after tracer
(so where the current DOM sits and remove DOM).  Of course FRE is
more expensive than DOM and DOM might catch some jump threading
opportunities (though VRP does that as well).


[Bug tree-optimization/63677] Failure to constant fold with vectorization.

2014-10-29 Thread belagod at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

--- Comment #1 from Tejas Belagod belagod at gcc dot gnu.org ---
There is similar behaviour on aarch64. So, it doesn't look like a backend
issue.


[Bug tree-optimization/63677] Failure to constant fold with vectorization.

2014-10-29 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

--- Comment #2 from Andrew Pinski pinskia at gcc dot gnu.org ---
I have seen this also.


[Bug tree-optimization/63677] Failure to constant fold with vectorization.

2014-10-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-10-29
                 CC|                            |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #3 from Jakub Jelinek jakub at gcc dot gnu.org ---
The problem is that the loop is first vectorized, then several passes later slp
vectorizes the initialization, so after some cleanups we have e.g. in cddce2:
  MEM[(int *)a] = { 0, 1, 2, 3 };
  MEM[(int *)a + 16B] = { 4, 5, 6, 7 };
  vect__13.6_20 = MEM[(int *)a];
  vect__13.6_17 = MEM[(int *)a + 16B];
But there is no further FRE pass that would optimize the loads into
  vect__13.6_20 = { 0, 1, 2, 3 };
  vect__13.6_17 = { 4, 5, 6, 7 };
(supposedly that would need to be done before forwprop4 that could in theory
refold all the stmts into constant).

Richard, how expensive would it be to schedule another FRE pass if anything has
been vectorized in the current function (either vect pass, or slp)?  Or are
there other passes that handle this?  Looking at e.g.
typedef int V __attribute__((vector_size (4 * sizeof (int))));
struct S { int a[4]; };
V __attribute__ ((noinline)) foo (struct S *p)
{
  *(V *) p = (V) { 1, 2, 3, 4 };
  return *(V *) p;
}
with -O2 -fno-tree-fre, it seems DOM is able to do that, but unfortunately at
dom2 time the values have not been sufficiently forward propagated for dom2 to
optimize this.
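
To check the example at run time, a small harness along the following lines can be used. `check_foo` is a hypothetical helper, not part of the bug report, and the `aligned (16)` attribute is added on the assumption that the vector store may need 16-byte alignment, which `struct S` on its own does not guarantee.

```c
typedef int V __attribute__((vector_size (4 * sizeof (int))));
struct S { int a[4]; };

/* The function from the comment above: store a vector constant through
   the pointer, then load it back through the same pointer.  */
V __attribute__ ((noinline)) foo (struct S *p)
{
  *(V *) p = (V) { 1, 2, 3, 4 };
  return *(V *) p;
}

/* Hypothetical helper: returns 1 when foo returns { 1, 2, 3, 4 },
   0 otherwise.  The object is over-aligned so that the vector access
   in foo is valid on targets that require aligned vector stores.  */
int
check_foo (void)
{
  struct S s __attribute__ ((aligned (16)));
  V v = foo (&s);

  for (int i = 0; i < 4; i++)
    if (v[i] != i + 1)
      return 0;
  return 1;
}
```

The result is the same whether or not the load is replaced by the stored constant, so this only checks correctness. Whether the redundant load was actually removed has to be read from the dump files (for example with -fdump-tree-all).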