On Wed, 28 Oct 2020 at 11:27, Christophe Lyon <christophe.l...@linaro.org> wrote:
>
> On Tue, 27 Oct 2020 at 13:18, Richard Biener <rguent...@suse.de> wrote:
> >
> > This makes SLP discovery detect backedges by seeding the bst_map with
> > the node to be analyzed so it can be picked up from recursive calls.
> > This removes the need to discover backedges in a separate walk.
> >
> > This enables SLP build to handle PHI nodes in full, continuing
> > the SLP build to non-backedges.  For loop vectorization this
> > enables outer loop vectorization of nested SLP cycles and for
> > BB vectorization this enables vectorization of PHIs at CFG merges.
> >
> > It also turns code generation into a SCC discovery walk to handle
> > irreducible regions and nodes only reachable via backedges where
> > we now also fill in vectorized backedge defs.
> >
> > This requires sanitizing the SLP tree for SLP reduction chains even
> > more, manually filling the backedge SLP def.
> >
> > This also exposes the fact that CFG copying (and edge splitting
> > until I fixed that) ends up with different edge order in the
> > copy which doesn't play well with the desired 1:1 mapping of
> > SLP PHI node children and edges for epilogue vectorization.
> > I've tried to fixup CFG copying here but this really looks
> > like a dead (or expensive) end there so I've done fixup in
> > slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
> > we can run into.
> >
> > There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
> > not sure it's possible to eliminate them all this stage1 so the
> > patch has quite some checks for this case all over the place.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2017
> > and SPEC CPU 2006 successfully built and tested.
> >
> > Will push soon.
> >
> > Richard.
> >
> > 2020-10-27  Richard Biener  <rguent...@suse.de>
> >
> > 	* gimple.h (gimple_expr_type): For PHIs return the type
> > 	of the result.
> > 	* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
> > 	Make sure edge order into copied loop headers line up with the
> > 	originals.
> > 	* tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
> > 	loops with SLP.
> > 	(vectorizable_phi): New function.
> > 	(vectorizable_live_operation): For BB vectorization compute insert
> > 	location here.
> > 	* tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
> > 	SLP_TREE_CHILDREN entries.
> > 	(vect_create_new_slp_node): Add overloads with pre-existing node
> > 	argument.
> > 	(vect_print_slp_graph): Likewise.
> > 	(vect_mark_slp_stmts): Likewise.
> > 	(vect_mark_slp_stmts_relevant): Likewise.
> > 	(vect_gather_slp_loads): Likewise.
> > 	(vect_optimize_slp): Likewise.
> > 	(vect_slp_analyze_node_operations): Likewise.
> > 	(vect_bb_slp_scalar_cost): Likewise.
> > 	(vect_remove_slp_scalar_calls): Likewise.
> > 	(vect_get_and_check_slp_defs): Handle PHIs.
> > 	(vect_build_slp_tree_1): Handle PHIs.
> > 	(vect_build_slp_tree_2): Continue SLP build, following PHI
> > 	arguments.  Fix memory leak.
> > 	(vect_build_slp_tree): Put stub node into the hash-map so
> > 	we can discover cycles directly.
> > 	(vect_build_slp_instance): Set the backedge SLP def for
> > 	reduction chains.
> > 	(vect_analyze_slp_backedges): Remove.
> > 	(vect_analyze_slp): Do not call it.
> > 	(vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION.
> > 	(vect_slp_analyze_node_operations): Handle stray failed
> > 	backedge defs by failing.
> > 	(vect_slp_build_vertices): Adjust leaf condition.
> > 	(vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
> > 	hash-set to handle cycles.
> > 	(vect_slp_analyze_operations): Adjust.
> > 	(vect_bb_partition_graph_r): Likewise.
> > 	(vect_slp_function): Adjust split condition to allow CFG
> > 	merges.
> > 	(vect_schedule_slp_instance): Rename to ...
> > 	(vect_schedule_slp_node): ... this.  Move DFS walk to ...
> > 	(vect_schedule_scc): ... this new function.
> > 	(vect_schedule_slp): Call it.
> > 	Remove ad-hoc vectorized backedge fill code.
> > 	* tree-vect-stmts.c (vect_analyze_stmt): Call
> > 	vectorizable_phi.
> > 	(vect_transform_stmt): Likewise.
> > 	(vect_is_simple_use): Handle vect_backedge_def.
> > 	* tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
> > 	set loop header PHIs to vect_unknown_def_type for loop
> > 	vectorization.
> > 	* tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def.
> > 	(enum stmt_vec_info_type): Add phi_info_type.
> > 	(vectorizable_phi): Declare.
> >
> > 	* gcc.dg/vect/bb-slp-54.c: New test.
> > 	* gcc.dg/vect/bb-slp-55.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-56.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-57.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-58.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-59.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-60.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-61.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-62.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-63.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-64.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-65.c: Likewise.
> > 	* gcc.dg/vect/bb-slp-66.c: Likewise.
> > 	* gcc.dg/vect/vect-outer-slp-1.c: Likewise.
> > 	* gfortran.dg/vect/O3-bb-slp-1.f: Likewise.
> > 	* gfortran.dg/vect/O3-bb-slp-2.f: Likewise.
> > 	* g++.dg/vect/simd-11.cc: Likewise.
>
> I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97616
> because bb-slp-58.c and bb-slp-59.c fail on arm.
>
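[Editor's note: the "seed the bst_map with the node to be analyzed" idea in the patch description can be illustrated with a standalone sketch. The types and names below are hypothetical simplifications, not the actual GCC vect_build_slp_tree code: the node under construction is inserted into the memoization map before recursing into its operands, so a recursive lookup that reaches the same group again identifies the backedge and returns the in-progress stub instead of recursing forever.]

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for SLP nodes and the discovery graph:
// for each "stmt group" id, defs[] lists the ids of its operand groups.
struct node
{
  int id;
  std::vector<node *> children;
};

static std::map<int, std::vector<int>> defs;
static std::map<int, node *> bst_map;   // memoization map, seeded early

static node *
build (int id)
{
  auto it = bst_map.find (id);
  if (it != bst_map.end ())
    // Either a finished node or the stub seeded below; in the stub
    // case we just discovered a cycle (a backedge) and can return
    // the in-progress node directly.
    return it->second;

  node *n = new node { id, {} };
  // Seed the map *before* recursing so recursive calls see the stub.
  bst_map[id] = n;
  for (int op : defs[id])
    n->children.push_back (build (op));
  return n;
}
```

With `defs` describing a two-node cycle (0 depends on 1, 1 depends on 0), `build (0)` terminates and the child chain loops back to the root node, which is what lets discovery and the later SCC code-generation walk handle backedges without a separate pass.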
And it causes a regression on aarch64: FAIL: gcc.target/aarch64/sve/reduc_strict_4.c -march=armv8.2-a+sve scan-assembler-times \\tfadda\\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\\.d 4 FAIL: gcc.target/aarch64/sve/reduc_strict_5.c -march=armv8.2-a+sve scan-assembler-times \\tfadda\\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\\.d 6 > > --- > > gcc/gimple.h | 2 + > > gcc/testsuite/g++.dg/vect/simd-11.cc | 61 ++ > > gcc/testsuite/gcc.dg/vect/bb-slp-54.c | 23 + > > gcc/testsuite/gcc.dg/vect/bb-slp-55.c | 18 + > > gcc/testsuite/gcc.dg/vect/bb-slp-56.c | 17 + > > gcc/testsuite/gcc.dg/vect/bb-slp-57.c | 38 ++ > > gcc/testsuite/gcc.dg/vect/bb-slp-58.c | 23 + > > gcc/testsuite/gcc.dg/vect/bb-slp-59.c | 25 + > > gcc/testsuite/gcc.dg/vect/bb-slp-60.c | 18 + > > gcc/testsuite/gcc.dg/vect/bb-slp-61.c | 26 + > > gcc/testsuite/gcc.dg/vect/bb-slp-62.c | 21 + > > gcc/testsuite/gcc.dg/vect/bb-slp-63.c | 21 + > > gcc/testsuite/gcc.dg/vect/bb-slp-64.c | 11 + > > gcc/testsuite/gcc.dg/vect/bb-slp-65.c | 15 + > > gcc/testsuite/gcc.dg/vect/bb-slp-66.c | 32 + > > gcc/testsuite/gcc.dg/vect/vect-outer-slp-1.c | 31 + > > gcc/testsuite/gfortran.dg/vect/O3-bb-slp-1.f | 28 + > > gcc/testsuite/gfortran.dg/vect/O3-bb-slp-2.f | 40 ++ > > gcc/tree-vect-loop-manip.c | 27 + > > gcc/tree-vect-loop.c | 112 +++- > > gcc/tree-vect-slp.c | 664 ++++++++++++------- > > gcc/tree-vect-stmts.c | 8 +- > > gcc/tree-vectorizer.c | 3 +- > > gcc/tree-vectorizer.h | 2 + > > 24 files changed, 1031 insertions(+), 235 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/vect/simd-11.cc > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-54.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-55.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-56.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-57.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-58.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-59.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-60.c > > create mode 100644 
gcc/testsuite/gcc.dg/vect/bb-slp-61.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-62.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-63.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-64.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-65.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-66.c > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-outer-slp-1.c > > create mode 100644 gcc/testsuite/gfortran.dg/vect/O3-bb-slp-1.f > > create mode 100644 gcc/testsuite/gfortran.dg/vect/O3-bb-slp-2.f > > > > diff --git a/gcc/gimple.h b/gcc/gimple.h > > index 3c9b9965f5a..87c90be9a6a 100644 > > --- a/gcc/gimple.h > > +++ b/gcc/gimple.h > > @@ -6598,6 +6598,8 @@ gimple_expr_type (const gimple *stmt) > > } > > else if (code == GIMPLE_COND) > > return boolean_type_node; > > + else if (code == GIMPLE_PHI) > > + return TREE_TYPE (gimple_phi_result (stmt)); > > else > > return void_type_node; > > } > > diff --git a/gcc/testsuite/g++.dg/vect/simd-11.cc > > b/gcc/testsuite/g++.dg/vect/simd-11.cc > > new file mode 100644 > > index 00000000000..912d1840055 > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/vect/simd-11.cc > > @@ -0,0 +1,61 @@ > > +// { dg-do compile } > > +// { dg-require-effective-target c++11 } > > +// { dg-additional-options "-Ofast" } > > + > > +template <typename> class h { > > +public: > > + ~h(); > > +}; > > +template <typename> struct l; > > +template <typename c> struct l<h<c> > { > > + using d = c; > > + template <typename e> using f = h<e>; > > +}; > > +template <typename g> struct o { > > + typedef l<g> j; > > + typedef typename j::d k; > > + template <typename c> struct p { typedef typename j::f<c> m; }; > > +}; > > +template <typename c, typename g> struct F { > > + typedef typename o<g>::p<c>::m q; > > + struct : q { > > + } r; > > +}; > > +template <typename c, typename g = h<c> > class s : F<c, g> { > > +public: > > + s(long); > > + typename o<typename F<c, g>::q>::k operator[](long); > > +}; > > 
+template <int> class t { > > +public: > > + int dimension; > > + t(const t &); > > + void operator+=(t); > > + double values[]; > > +}; > > +template <int dim> t<dim>::t(const t &p1) { > > + for (int i = 0; i < dim; ++i) > > + values[i] = p1.values[i]; > > +} > > +template <int dim> class u : public t<dim> { > > +public: > > + double m_fn1(const u &) const; > > +}; > > +template <int dim> double u<dim>::m_fn1(const u &) const { > > + double a; > > + for (int i = 0; i < dim; ++i) { > > + double b = this->values[i]; > > + a += b; > > + } > > + return a; > > +} > > +int m_fn2(const u<2> &p1, const u<2> &w) { > > + int n; > > + s<u<2> > c(n); > > + s<double> d(n); > > + double e = p1.m_fn1(w); > > + for (;;) { > > + c[0] += p1; > > + d = e; > > + } > > +} > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-54.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-54.c > > new file mode 100644 > > index 00000000000..d05ce33310d > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-54.c > > @@ -0,0 +1,23 @@ > > +/* { dg-do compile } */ > > +/* { dg-require-effective-target vect_double } */ > > + > > +double a[2], b[2], c[2]; > > + > > +void foo(int flag) > > +{ > > + double tem1, tem2; > > + if (flag) > > + { > > + tem1 = a[0]; > > + tem2 = a[1]; > > + } > > + else > > + { > > + tem1 = b[0]; > > + tem2 = b[1]; > > + } > > + c[0] = tem1; > > + c[1] = tem2; > > +} > > + > > +/* { dg-final { scan-tree-dump-times "transform load" 2 "slp2" } } */ > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-55.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-55.c > > new file mode 100644 > > index 00000000000..57a042b0ba5 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-55.c > > @@ -0,0 +1,18 @@ > > +/* { dg-do compile } */ > > + > > +typedef struct { > > + int a; > > + int b; > > + int c; > > + int d; > > +} e; > > +e *f; > > +int g; > > +void h() { > > + e *i; > > + if (g) { > > + i->c = f[g].b; > > + i->d = f[g].a; > > + } else > > + i->c = i->d = 0; > > +} > > diff --git 
a/gcc/testsuite/gcc.dg/vect/bb-slp-56.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-56.c > > new file mode 100644 > > index 00000000000..90d17512492 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-56.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile } */ > > + > > +typedef struct { > > + double a, b; > > +} c; > > +int d, e; > > +int i(void); > > +void h(c, c); > > +void f() { > > + c a, g; > > + do { > > + a.a = e ?: g.a; > > + a.b = g.b + d; > > + h(g, a); > > + g = a; > > + } while (i()); > > +} > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-57.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-57.c > > new file mode 100644 > > index 00000000000..6f13507fd67 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-57.c > > @@ -0,0 +1,38 @@ > > +/* { dg-do compile } */ > > +/* { dg-additional-options "-ffast-math" } */ > > +/* { dg-require-effective-target vect_float } */ > > + > > +float *a; > > +typedef struct { > > + int c; > > + float bbmax[3]; > > +} d; > > +d e; > > +int f[3]; > > +int g, h, i, j; > > +float k, k; > > +void l() > > +{ > > + for (unsigned z = 0; z < 2048; ++z) { > > + { > > + j = e.bbmax[1] > k ? e.bbmax[1] : k; > > + } > > + e.bbmax[1] = j; > > + { i = e.bbmax[2] > k ? e.bbmax[2] : k; } > > + e.bbmax[2] = i; > > + f[2] = a[2]; > > + { > > + float b; > > + h = e.bbmax[1] > b ? e.bbmax[1] : b; > > + } > > + e.bbmax[1] = h; > > + { > > + float b; > > + g = e.bbmax[2] > b ? 
e.bbmax[2] : b; > > + } > > + e.bbmax[2] = g; > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump-times "transform load" 1 "slp1" { target { > > { x86_64-*-* i?86-*-* } && lp64 } } } } */ > > +/* { dg-final { scan-tree-dump "optimized: basic block" "slp1" { target { > > { x86_64-*-* i?86-*-* } && lp64 } } } } */ > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-58.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-58.c > > new file mode 100644 > > index 00000000000..11bf5c333ab > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-58.c > > @@ -0,0 +1,23 @@ > > +/* { dg-do compile } */ > > + > > +double x[1024]; > > +void bar (void); > > + > > +void foo (void) > > +{ > > + double tem1 = x[0]; > > + double tem2 = x[1]; > > + for (int i = 0; i < 511; ++i) > > + { > > + x[2*i] = tem1; > > + x[2*i+1] = tem2; > > + bar (); > > + tem1 = x[2*(i+1)]; > > + tem2 = x[2*(i+1)+1]; > > + } > > +} > > + > > +/* We should be able to vectorize the cycle in one SLP attempt including > > + both load groups. */ > > +/* { dg-final { scan-tree-dump-times "transform load" 2 "slp1" } } */ > > +/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } > > */ > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-59.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-59.c > > new file mode 100644 > > index 00000000000..2e35725ff2a > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-59.c > > @@ -0,0 +1,25 @@ > > +/* { dg-do compile } */ > > +/* { dg-additional-options "-fdump-tree-loopdone" } */ > > + > > +double x[1024]; > > +void bar (void); > > + > > +void foo (void) > > +{ > > + double tem1 = x[0]; > > + double tem2 = x[1]; > > + for (int i = 0; i < 511; ++i) > > + { > > + x[2*i] = tem2; > > + x[2*i+1] = tem1; > > + bar (); > > + tem1 = x[2*(i+1)]; > > + tem2 = x[2*(i+1)+1]; > > + } > > +} > > + > > +/* We should be able to vectorize the cycle in one SLP attempt including > > + both load groups and do only one permutation. 
*/ > > +/* { dg-final { scan-tree-dump-times "transform load" 2 "slp1" } } */ > > +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "loopdone" } } */ > > +/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } > > */ > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-60.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-60.c > > new file mode 100644 > > index 00000000000..52643bfd75a > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-60.c > > @@ -0,0 +1,18 @@ > > +/* { dg-do compile } */ > > + > > +enum { a = 1, b }; > > +float *c, *e; > > +float d, h; > > +int f, g; > > +void i() > > +{ > > + float j = h; > > + for (; g;) > > + for (; f; f++) > > + { > > + c[a] = j * d; > > + c[b] = h * d; > > + j = 0; > > + h = e[2]; > > + } > > +} > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-61.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-61.c > > new file mode 100644 > > index 00000000000..3323a2bb2d9 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-61.c > > @@ -0,0 +1,26 @@ > > +/* { dg-do compile } */ > > + > > +struct a { > > + enum { b, c } d; > > + unsigned e; > > + unsigned f; > > +}; > > +void j(struct a *a, int i, int h) > > +{ > > + unsigned f = a->f; > > + switch (a->d) > > + while (1) > > + { > > + if (i) > > + { > > + case b: > > + if (h) > > + goto k; > > + } > > + else > > + f = 0; > > + case c:; > > + } > > +k: > > + a->e = a->f = f; > > +} > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-62.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-62.c > > new file mode 100644 > > index 00000000000..84ee04c1013 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-62.c > > @@ -0,0 +1,21 @@ > > +/* { dg-do compile } */ > > + > > +typedef struct { > > + char a; > > + int b[]; > > +} c; > > +int d, f; > > +c e; > > +void g() { > > + int h, i, j; > > + for (; i;) > > + switch (i) > > + case 4: { > > + h = (__UINTPTR_TYPE__)g >= 3; > > + for (; h; h -= 1) > > + if (d) > > + j = f; > > + } > > + for (; i < 3; i++) > > + e.b[i] = j; > > +} 
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-63.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-63.c > > new file mode 100644 > > index 00000000000..6519c9752fe > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-63.c > > @@ -0,0 +1,21 @@ > > +/* { dg-do compile } */ > > + > > +struct { > > + unsigned a; > > + unsigned c; > > +} d; > > +int e, g; > > +void h(unsigned b) { > > + unsigned a, c; > > + while (e) { > > + if (b) { > > + ++e; > > + continue; > > + } > > + c = g; > > + if (g) > > + a |= 10; > > + } > > + d.a = a; > > + d.c = c; > > +} > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-64.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-64.c > > new file mode 100644 > > index 00000000000..dcb6a1455c3 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-64.c > > @@ -0,0 +1,11 @@ > > +/* { dg-do compile } */ > > + > > +enum { a, b }; > > +double *c, *e; > > +int d, f; > > +void g() { > > + for (;;) { > > + c[a] = c[b] = d * e[b]; > > + f = d -= f; > > + } > > +} > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-65.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-65.c > > new file mode 100644 > > index 00000000000..ec1707be9f5 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-65.c > > @@ -0,0 +1,15 @@ > > +/* { dg-do compile } */ > > +/* { dg-additional-options "-O3" } */ > > +/* { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } */ > > + > > +int *a; > > +int b, c, d, e; > > +void f() { > > + int g; > > + for (;;) > > + for (; b;) > > + if (d) > > + for (; c;) > > + if (g) > > + e += a[1] = a[2] = e; > > +} > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-66.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-66.c > > new file mode 100644 > > index 00000000000..b59a2cc7240 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-66.c > > @@ -0,0 +1,32 @@ > > +/* { dg-do compile } */ > > +/* { dg-additional-options "-O3" } */ > > + > > +typedef struct { > > + double a, b; > > +} c; > > +typedef struct { > > + c d; > > + long coordinates; > > 
+} e; > > +int f; > > +c g; > > +e h; > > +void k(int); > > +int n(); > > +void j() { int i; k(i); } > > +void k(int l) { > > + double a; > > + int b; > > + c m[4]; > > + long i; > > + for (; l;) > > + do { > > + g.a = b ?: a; > > + m[3] = g; > > + if (f) > > + m[0] = m[1] = m[3]; > > + i = 0; > > + for (; i < 4; i++) > > + (&h + i)->d = m[i]; > > + } while (n()); > > +} > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-1.c > > b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-1.c > > new file mode 100644 > > index 00000000000..62b18bd5764 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-1.c > > @@ -0,0 +1,31 @@ > > +/* { dg-do compile } */ > > +/* { dg-require-effective-target vect_int } */ > > + > > +int a[1024]; > > + > > +void foo (void) > > +{ > > + for (int i = 0; i < 1020; i += 4) > > + { > > + int suma = a[i]; > > + int sumb = a[i+1]; > > + int sumc = a[i+2]; > > + int sumd = a[i+3]; > > + for (unsigned j = 0; j < 77; ++j) > > + { > > + suma = (suma ^ i) + 1; > > + sumb = (sumb ^ i) + 2; > > + sumc = (sumc ^ i) + 3; > > + sumd = (sumd ^ i) + 4; > > + } > > + a[i] = suma; > > + a[i+1] = sumb; > > + a[i+2] = sumc; > > + a[i+3] = sumd; > > + } > > +} > > + > > +/* We should vectorize this outer loop with SLP. */ > > +/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" } } */ > > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */ > > +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "vect" } } */ > > diff --git a/gcc/testsuite/gfortran.dg/vect/O3-bb-slp-1.f > > b/gcc/testsuite/gfortran.dg/vect/O3-bb-slp-1.f > > new file mode 100644 > > index 00000000000..74b3b17b5a9 > > --- /dev/null > > +++ b/gcc/testsuite/gfortran.dg/vect/O3-bb-slp-1.f > > @@ -0,0 +1,28 @@ > > +! 
{ dg-do compile } > > + subroutine tranx3 (jbeg,jend,kbeg,kend,dlo,den,mflx,zro) > > + parameter(in = 128+5 > > + & , jn = 128+5 > > + & , kn = 128+5) > > + parameter(ijkn = 128+5) > > + real*8 zro, dqm, dqp, dx3bi (kn) > > + real*8 mflux (ijkn,4), dtwid (ijkn,4), dd (ijkn,4) > > + real*8 mflx (in,jn,kn) > > + real*8 dlo (in,jn,kn), den (in,jn,kn) > > + do 2100 j=jbeg-1,jend > > + dtwid (k,1) = ( 0.5 + q1 ) * ( dlo(i ,j,k-1) > > + 3 - ( dx3a(k ) + xi ) * dd (k ,1) ) > > + mflux (k,1) = dtwid (k,1) * ( v3(i ,j,k) - vg3(k) ) * dt > > + if (j.ge.jbeg) then > > + den(i ,j,k) = ( dlo(i ,j,k) * dvl3a(k) > > + 1 - etwid (k+1,1) + etwid (k,1) ) * dvl3a i(k) > > + if (kend .eq. ke) mflx(i ,j,ke+1) = mflux (ke+1,1) > > + endif > > + do 2030 k=max(kbeg-2,ks-1),kend+1 > > + dqm = (dlo(i ,j,k ) - dlo(i ,j,k-1)) * dx3bi(k ) > > + dqp = (dlo(i ,j,k+1) - dlo(i ,j,k )) * dx3bi(k+1) > > + dd(k,1) = max ( dqm * dqp, zro ) > > +2030 continue > > + dtwid (k,3) = ( 0.5 + q1 ) * ( dlo(i+2,j,k-1) > > + 3 - ( dx3a(k ) + xi ) * deod (k ,3) ) > > +2100 continue > > + end > > diff --git a/gcc/testsuite/gfortran.dg/vect/O3-bb-slp-2.f > > b/gcc/testsuite/gfortran.dg/vect/O3-bb-slp-2.f > > new file mode 100644 > > index 00000000000..34c44def093 > > --- /dev/null > > +++ b/gcc/testsuite/gfortran.dg/vect/O3-bb-slp-2.f > > @@ -0,0 +1,40 @@ > > +! { dg-do compile } > > +! 
{ dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } > > + subroutine tranx3 (ibeg,jbeg,jend,kbeg,kend > > + & ,dlo,den > > + & ,edn) > > + parameter(in = 128+5 > > + & , jn = 128+5 > > + & , kn = 128+5) > > + parameter(ijkn = 128+5) > > + real*8 e (in,jn,kn), dqm, dvl3a (kn), dvl3ai (kn) > > + & , dtwid (ijkn,4), dd (ijkn,4) > > + & , etwid (ijkn,4), deod (ijkn,4) > > + real*8 dlo (in,jn,kn), den (in,jn,kn) > > + & , edn (in,jn,kn) > > + do 2100 j=jbeg-1,jend > > + i = ibeg - 1 > > + do 1080 k=kbeg,kend > > + den(i ,j,k) = ( dlo(i ,j,k) * dvl3a(k) > > + 1 - etwid (k+1,1) + etwid (k,1) ) * dvl3a i(k) > > +1080 continue > > + do 2030 k=max(kbeg-2,ks-1),kend+1 > > + dqm = (dlo(i+2,j,k ) - dlo(i+2,j,k-1)) * dx3bi(k ) > > + dd(k,4) = max ( dqm * dqp, zro ) > > +2030 continue > > + dtwid (k,3) = ( 0.5 + q1 ) * ( dlo(i+2,j,k-1) > > + 1 + ( dx3a(k-1) - xi ) * dd (k-1,3) ) > > + 2 + ( 0.5 - q1 ) * ( dlo(i+2,j,k ) > > + 3 - ( dx3a(k ) + xi ) * deod (k ,3) ) > > + do 2080 k=kbeg,kend > > + den(i ,j,k) = ( dlo(i ,j,k) * dvl3a(k) > > + 1 - dtwid (k+1,3) + dtwid (k,3) ) * dvl3a i(k) > > + e (i+2,j,k) = ( e (i+2,j,k) * dvl3a(k) > > + 1 - etwid (k+1,3) + etwid (k,3) ) * dvl3a i(k) > > + edn(i+2,j,k) = e(i+2,j,k) / den(i+2,j,k) > > + e (i+3,j,k) = ( e (i+3,j,k) * dvl3a(k) > > + 1 - etwid (k+1,4) + etwid (k,4) ) * dvl3a i(k) > > + edn(i+3,j,k) = e(i+3,j,k) / den(i+3,j,k) > > +2080 continue > > +2100 continue > > + end > > diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c > > index 5d00b6fb956..36179188f6d 100644 > > --- a/gcc/tree-vect-loop-manip.c > > +++ b/gcc/tree-vect-loop-manip.c > > @@ -1084,6 +1084,33 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop > > *loop, > > exit = single_exit (loop); > > basic_block new_preheader = new_bbs[0]; > > > > + /* Before installing PHI arguments make sure that the edges > > + into them match that of the scalar loop we analyzed. 
This > > + makes sure the SLP tree matches up between the main vectorized > > + loop and the epilogue vectorized copies. */ > > + if (single_succ_edge (preheader)->dest_idx > > + != single_succ_edge (new_bbs[0])->dest_idx) > > + { > > + basic_block swap_bb = new_bbs[1]; > > + gcc_assert (EDGE_COUNT (swap_bb->preds) == 2); > > + std::swap (EDGE_PRED (swap_bb, 0), EDGE_PRED (swap_bb, 1)); > > + EDGE_PRED (swap_bb, 0)->dest_idx = 0; > > + EDGE_PRED (swap_bb, 1)->dest_idx = 1; > > + } > > + if (duplicate_outer_loop) > > + { > > + class loop *new_inner_loop = get_loop_copy (scalar_loop->inner); > > + if (loop_preheader_edge (scalar_loop)->dest_idx > > + != loop_preheader_edge (new_inner_loop)->dest_idx) > > + { > > + basic_block swap_bb = new_inner_loop->header; > > + gcc_assert (EDGE_COUNT (swap_bb->preds) == 2); > > + std::swap (EDGE_PRED (swap_bb, 0), EDGE_PRED (swap_bb, 1)); > > + EDGE_PRED (swap_bb, 0)->dest_idx = 0; > > + EDGE_PRED (swap_bb, 1)->dest_idx = 1; > > + } > > + } > > + > > add_phi_args_after_copy (new_bbs, scalar_loop->num_nodes + 1, NULL); > > > > /* Skip new preheader since it's deleted if copy loop is added at entry. 
> > */ > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > > index e42f3277ed5..75b731407ba 100644 > > --- a/gcc/tree-vect-loop.c > > +++ b/gcc/tree-vect-loop.c > > @@ -7377,15 +7377,24 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo, > > if (slp_node) > > { > > vec_initial_defs.reserve (vec_num); > > - gcc_assert (slp_node == slp_node_instance->reduc_phis); > > - stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info); > > - tree neutral_op > > - = neutral_op_for_slp_reduction (slp_node, vectype_out, > > - STMT_VINFO_REDUC_CODE (reduc_info), > > - first != NULL); > > - get_initial_defs_for_reduction (loop_vinfo, > > slp_node_instance->reduc_phis, > > - &vec_initial_defs, vec_num, > > - first != NULL, neutral_op); > > + if (nested_cycle) > > + { > > + unsigned phi_idx = loop_preheader_edge (loop)->dest_idx; > > + vect_get_slp_defs (SLP_TREE_CHILDREN (slp_node)[phi_idx], > > + &vec_initial_defs); > > + } > > + else > > + { > > + gcc_assert (slp_node == slp_node_instance->reduc_phis); > > + stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info); > > + tree neutral_op > > + = neutral_op_for_slp_reduction (slp_node, vectype_out, > > + STMT_VINFO_REDUC_CODE > > (reduc_info), > > + first != NULL); > > + get_initial_defs_for_reduction (loop_vinfo, > > slp_node_instance->reduc_phis, > > + &vec_initial_defs, vec_num, > > + first != NULL, neutral_op); > > + } > > } > > else > > { > > @@ -7520,6 +7529,79 @@ vectorizable_lc_phi (loop_vec_info loop_vinfo, > > return true; > > } > > > > +/* Vectorizes PHIs. */ > > + > > +bool > > +vectorizable_phi (vec_info *, > > + stmt_vec_info stmt_info, gimple **vec_stmt, > > + slp_tree slp_node) > > +{ > > + if (!is_a <gphi *> (stmt_info->stmt) || !slp_node) > > + return false; > > + > > + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def) > > + return false; > > + > > + tree vectype = SLP_TREE_VECTYPE (slp_node); > > + > > + if (!vec_stmt) /* transformation not required. 
*/ > > + { > > + slp_tree child; > > + unsigned i; > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (slp_node), i, child) > > + if (!child) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > + "PHI node with unvectorized backedge def\n"); > > + return false; > > + } > > + else if (!vect_maybe_update_slp_op_vectype (child, vectype)) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > + "incompatible vector types for > > invariants\n"); > > + return false; > > + } > > + STMT_VINFO_TYPE (stmt_info) = phi_info_type; > > + return true; > > + } > > + > > + tree scalar_dest = gimple_phi_result (stmt_info->stmt); > > + basic_block bb = gimple_bb (stmt_info->stmt); > > + tree vec_dest = vect_create_destination_var (scalar_dest, vectype); > > + auto_vec<tree> vec_oprnds; > > + auto_vec<gphi *> new_phis; > > + for (unsigned i = 0; i < gimple_phi_num_args (stmt_info->stmt); ++i) > > + { > > + slp_tree child = SLP_TREE_CHILDREN (slp_node)[i]; > > + > > + /* Skip not yet vectorized defs. */ > > + if (SLP_TREE_DEF_TYPE (child) == vect_internal_def > > + && SLP_TREE_VEC_STMTS (child).is_empty ()) > > + continue; > > + > > + vect_get_slp_defs (SLP_TREE_CHILDREN (slp_node)[i], &vec_oprnds); > > + if (!new_phis.exists ()) > > + { > > + new_phis.create (vec_oprnds.length ()); > > + for (unsigned j = 0; j < vec_oprnds.length (); j++) > > + { > > + /* Create the vectorized LC PHI node. */ > > + new_phis.quick_push (create_phi_node (vec_dest, bb)); > > + SLP_TREE_VEC_STMTS (slp_node).quick_push (new_phis[j]); > > + } > > + } > > + edge e = gimple_phi_arg_edge (as_a <gphi *> (stmt_info->stmt), i); > > + for (unsigned j = 0; j < vec_oprnds.length (); j++) > > + add_phi_arg (new_phis[j], vec_oprnds[j], e, UNKNOWN_LOCATION); > > + } > > + /* We should have at least one already vectorized child. 
*/ > > + gcc_assert (new_phis.exists ()); > > + > > + return true; > > +} > > + > > > > /* Function vect_min_worthwhile_factor. > > > > @@ -8376,8 +8458,16 @@ vectorizable_live_operation (vec_info *vinfo, > > gimple_seq stmts = NULL; > > new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree), > > &stmts, true, NULL_TREE); > > - > > - gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); > > + if (is_a <gphi *> (vec_stmt)) > > + { > > + gimple_stmt_iterator si = gsi_after_labels (gimple_bb (vec_stmt)); > > + gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT); > > + } > > + else > > + { > > + gimple_stmt_iterator si = gsi_for_stmt (vec_stmt); > > + gsi_insert_seq_after (&si, stmts, GSI_SAME_STMT); > > + } > > > > /* Replace use of lhs with newly computed result. If the use stmt > > is a > > single arg PHI, just replace all uses of PHI result. It's > > necessary > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > > index 85865dae1ab..f544b552a46 100644 > > --- a/gcc/tree-vect-slp.c > > +++ b/gcc/tree-vect-slp.c > > @@ -115,7 +115,8 @@ vect_free_slp_tree (slp_tree node) > > return; > > > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > - vect_free_slp_tree (child); > > + if (child) > > + vect_free_slp_tree (child); > > > > delete node; > > } > > @@ -148,9 +149,9 @@ vect_free_slp_instance (slp_instance instance) > > /* Create an SLP node for SCALAR_STMTS. */ > > > > static slp_tree > > -vect_create_new_slp_node (vec<stmt_vec_info> scalar_stmts, unsigned nops) > > +vect_create_new_slp_node (slp_tree node, > > + vec<stmt_vec_info> scalar_stmts, unsigned nops) > > { > > - slp_tree node = new _slp_tree; > > SLP_TREE_SCALAR_STMTS (node) = scalar_stmts; > > SLP_TREE_CHILDREN (node).create (nops); > > SLP_TREE_DEF_TYPE (node) = vect_internal_def; > > @@ -159,18 +160,33 @@ vect_create_new_slp_node (vec<stmt_vec_info> > > scalar_stmts, unsigned nops) > > return node; > > } > > > > +/* Create an SLP node for SCALAR_STMTS. 
*/ > > + > > +static slp_tree > > +vect_create_new_slp_node (vec<stmt_vec_info> scalar_stmts, unsigned nops) > > +{ > > + return vect_create_new_slp_node (new _slp_tree, scalar_stmts, nops); > > +} > > + > > /* Create an SLP node for OPS. */ > > > > static slp_tree > > -vect_create_new_slp_node (vec<tree> ops) > > +vect_create_new_slp_node (slp_tree node, vec<tree> ops) > > { > > - slp_tree node = new _slp_tree; > > SLP_TREE_SCALAR_OPS (node) = ops; > > SLP_TREE_DEF_TYPE (node) = vect_external_def; > > SLP_TREE_LANES (node) = ops.length (); > > return node; > > } > > > > +/* Create an SLP node for OPS. */ > > + > > +static slp_tree > > +vect_create_new_slp_node (vec<tree> ops) > > +{ > > + return vect_create_new_slp_node (new _slp_tree, ops); > > +} > > + > > > > /* This structure is used in creation of an SLP tree. Each instance > > corresponds to the same operand in a group of scalar stmts in an SLP > > @@ -443,10 +459,13 @@ vect_get_and_check_slp_defs (vec_info *vinfo, > > unsigned char swap, > > else > > commutative_op = commutative_tree_code (code) ? 0U : -1U; > > } > > + else if (gphi *stmt = dyn_cast <gphi *> (stmt_info->stmt)) > > + number_of_oprnds = gimple_phi_num_args (stmt); > > else > > return -1; > > > > bool swapped = (swap != 0); > > + bool backedge = false; > > gcc_assert (!swapped || first_op_cond); > > enum vect_def_type *dts = XALLOCAVEC (enum vect_def_type, > > number_of_oprnds); > > for (i = 0; i < number_of_oprnds; i++) > > @@ -463,6 +482,13 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned > > char swap, > > else > > oprnd = gimple_op (stmt_info->stmt, map[i]); > > } > > + else if (gphi *stmt = dyn_cast <gphi *> (stmt_info->stmt)) > > + { > > + oprnd = gimple_phi_arg_def (stmt, i); > > + backedge = dominated_by_p (CDI_DOMINATORS, > > + gimple_phi_arg_edge (stmt, i)->src, > > + gimple_bb (stmt_info->stmt)); > > + } > > else > > oprnd = gimple_op (stmt_info->stmt, first_op_idx + (swapped ? 
!i : > > i)); > > if (TREE_CODE (oprnd) == VIEW_CONVERT_EXPR) > > @@ -487,6 +513,26 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned > > char swap, > > oprnd_info->def_stmts.quick_push (def_stmt_info); > > oprnd_info->ops.quick_push (oprnd); > > > > + /* If there's an extern def on a backedge make sure we can > > + code-generate at the region start. > > + ??? This is another case that could be fixed by adjusting > > + how we split the function but at the moment we'd have conflicting > > + goals there. */ > > + if (backedge > > + && dts[i] == vect_external_def > > + && is_a <bb_vec_info> (vinfo) > > + && !SSA_NAME_IS_DEFAULT_DEF (oprnd) > > + && !dominated_by_p (CDI_DOMINATORS, > > + as_a <bb_vec_info> (vinfo)->bbs[0], > > + gimple_bb (SSA_NAME_DEF_STMT (oprnd)))) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > + "Build SLP failed: extern def %T only defined " > > + "on backedge\n", oprnd); > > + return -1; > > + } > > + > > if (first) > > { > > tree type = TREE_TYPE (oprnd); > > @@ -521,6 +567,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned > > char swap, > > case vect_internal_def: > > case vect_reduction_def: > > case vect_induction_def: > > + case vect_nested_cycle: > > break; > > > > default: > > @@ -815,6 +862,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char > > *swap, > > machine_mode vec_mode; > > stmt_vec_info first_load = NULL, prev_first_load = NULL; > > bool first_stmt_load_p = false, load_p = false; > > + bool first_stmt_phi_p = false, phi_p = false; > > > > /* For every stmt in NODE find its def stmt/s.
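[The `dominated_by_p` test in the hunk above classifies a PHI argument edge as a backedge exactly when the edge's source block is dominated by the PHI's own block. A minimal Python sketch of that criterion; the block names and the dominator-set representation are illustrative, not GCC's:

```python
def is_backedge(dominators, src, dest):
    """Classify the CFG edge src -> dest as a backedge.

    The edge feeds a PHI in `dest`; it is a backedge exactly when
    `dest` dominates `src`, i.e. every path from entry to `src`
    passes through `dest`.  `dominators` maps each block to the set
    of blocks dominating it (a block dominates itself).
    """
    return dest in dominators[src]

# Simple loop CFG: entry -> header -> latch -> header, header -> exit.
doms = {
    "entry":  {"entry"},
    "header": {"entry", "header"},
    "latch":  {"entry", "header", "latch"},
    "exit":   {"entry", "header", "exit"},
}
```

For the loop-header PHI, `latch -> header` is the backedge while `entry -> header` is not.]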
*/ > > stmt_vec_info stmt_info; > > @@ -904,6 +952,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char > > *swap, > > return false; > > } > > } > > + else if (gimple_code (stmt) == GIMPLE_PHI) > > + { > > + rhs_code = ERROR_MARK; > > + phi_p = true; > > + } > > else > > { > > rhs_code = gimple_assign_rhs_code (stmt); > > @@ -916,6 +969,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char > > *swap, > > *node_vectype = vectype; > > first_stmt_code = rhs_code; > > first_stmt_load_p = load_p; > > + first_stmt_phi_p = phi_p; > > > > /* Shift arguments should be equal in all the packed stmts for a > > vector shift with scalar shift operand. */ > > @@ -1021,7 +1075,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char > > *swap, > > || first_stmt_code == INDIRECT_REF > > || first_stmt_code == COMPONENT_REF > > || first_stmt_code == MEM_REF))) > > - || first_stmt_load_p != load_p) > > + || first_stmt_load_p != load_p > > + || first_stmt_phi_p != phi_p) > > { > > if (dump_enabled_p ()) > > { > > @@ -1077,6 +1132,18 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned > > char *swap, > > } > > } > > > > + if (phi_p > > + && (gimple_bb (first_stmt_info->stmt) > > + != gimple_bb (stmt_info->stmt))) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > + "Build SLP failed: different BB for PHI " > > + "in %G", stmt); > > + /* Mismatch. */ > > + continue; > > + } > > + > > if (!types_compatible_p (vectype, *node_vectype)) > > { > > if (dump_enabled_p ()) > > @@ -1138,7 +1205,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char > > *swap, > > } > > > > /* Not memory operation. 
*/ > > - if (TREE_CODE_CLASS (rhs_code) != tcc_binary > > + if (!phi_p > > + && TREE_CODE_CLASS (rhs_code) != tcc_binary > > && TREE_CODE_CLASS (rhs_code) != tcc_unary > > && TREE_CODE_CLASS (rhs_code) != tcc_expression > > && TREE_CODE_CLASS (rhs_code) != tcc_comparison > > @@ -1251,7 +1319,7 @@ typedef hash_map <vec <gimple *>, slp_tree, > > scalar_stmts_to_slp_tree_map_t; > > > > static slp_tree > > -vect_build_slp_tree_2 (vec_info *vinfo, > > +vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, > > vec<stmt_vec_info> stmts, unsigned int group_size, > > poly_uint64 *max_nunits, > > bool *matches, unsigned *npermutes, unsigned > > *tree_size, > > @@ -1276,19 +1344,37 @@ vect_build_slp_tree (vec_info *vinfo, > > } > > return *leader; > > } > > + > > + /* Seed the bst_map with a stub node to be filled by > > vect_build_slp_tree_2 > > + so we can pick up backedge destinations during discovery. */ > > + slp_tree res = new _slp_tree; > > + SLP_TREE_DEF_TYPE (res) = vect_internal_def; > > + SLP_TREE_SCALAR_STMTS (res) = stmts; > > + bst_map->put (stmts.copy (), res); > > + > > poly_uint64 this_max_nunits = 1; > > - slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, > > + slp_tree res_ = vect_build_slp_tree_2 (vinfo, res, stmts, group_size, > > &this_max_nunits, > > matches, npermutes, tree_size, > > bst_map); > > - if (res) > > + if (!res_) > > + { > > + bool existed_p = bst_map->put (stmts, NULL); > > + gcc_assert (existed_p); > > + /* Mark the node invalid so we can detect those when still in use > > + as backedge destinations. */ > > + SLP_TREE_SCALAR_STMTS (res) = vNULL; > > + SLP_TREE_DEF_TYPE (res) = vect_uninitialized_def; > > + vect_free_slp_tree (res); > > + } > > + else > > { > > + gcc_assert (res_ == res); > > res->max_nunits = this_max_nunits; > > vect_update_max_nunits (max_nunits, this_max_nunits); > > /* Keep a reference for the bst_map use. 
*/ > > SLP_TREE_REF_COUNT (res)++; > > } > > - bst_map->put (stmts.copy (), res); > > - return res; > > + return res_; > > } > > > > /* Recursively build an SLP tree starting from NODE. > > @@ -1299,7 +1385,7 @@ vect_build_slp_tree (vec_info *vinfo, > > was found. */ > > > > static slp_tree > > -vect_build_slp_tree_2 (vec_info *vinfo, > > +vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, > > vec<stmt_vec_info> stmts, unsigned int group_size, > > poly_uint64 *max_nunits, > > bool *matches, unsigned *npermutes, unsigned > > *tree_size, > > @@ -1307,7 +1393,6 @@ vect_build_slp_tree_2 (vec_info *vinfo, > > { > > unsigned nops, i, this_tree_size = 0; > > poly_uint64 this_max_nunits = *max_nunits; > > - slp_tree node; > > > > matches[0] = false; > > > > @@ -1327,41 +1412,60 @@ vect_build_slp_tree_2 (vec_info *vinfo, > > > > /* If the SLP node is a PHI (induction or reduction), terminate > > the recursion. */ > > - if (gphi *stmt = dyn_cast <gphi *> (stmt_info->stmt)) > > - { > > - tree scalar_type = TREE_TYPE (PHI_RESULT (stmt)); > > - tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, > > - group_size); > > - if (!vect_record_max_nunits (vinfo, stmt_info, group_size, vectype, > > - max_nunits)) > > - return NULL; > > + bool skip_args[2] = { false, false }; > > + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo)) > > + if (gphi *stmt = dyn_cast <gphi *> (stmt_info->stmt)) > > + { > > + tree scalar_type = TREE_TYPE (PHI_RESULT (stmt)); > > + tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, > > + group_size); > > + if (!vect_record_max_nunits (vinfo, stmt_info, group_size, vectype, > > + max_nunits)) > > + return NULL; > > > > - vect_def_type def_type = STMT_VINFO_DEF_TYPE (stmt_info); > > - /* Induction from different IVs is not supported. 
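[The rewritten `vect_build_slp_tree` above seeds `bst_map` with a stub node *before* recursing, so a recursive call that reaches the same stmt set again, i.e. follows a backedge, finds the in-progress node instead of recursing forever. A small Python sketch of the same memoization pattern; the names are made up for illustration:

```python
def build(graph, key, memo):
    """Cycle-tolerant recursive discovery.

    The stub node is entered into `memo` *before* the children are
    built, so a cycle in `graph` resolves to the in-progress node
    rather than causing unbounded recursion.
    """
    if key in memo:
        return memo[key]            # finished node or in-progress stub
    node = {"key": key, "children": []}
    memo[key] = node                # seed the map with the stub
    for succ in graph.get(key, ()):
        node["children"].append(build(graph, succ, memo))
    return node

# Two-node cycle a -> b -> a, as created by a PHI and its backedge def.
memo = {}
root = build({"a": ["b"], "b": ["a"]}, "a", memo)
```

On failure the real code additionally replaces the map entry with NULL and marks the stub `vect_uninitialized_def` so stray backedge references to it can be detected later.]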
*/ > > - if (def_type == vect_induction_def) > > - { > > - stmt_vec_info other_info; > > - FOR_EACH_VEC_ELT (stmts, i, other_info) > > - if (stmt_info != other_info) > > - return NULL; > > - } > > - else if (def_type == vect_reduction_def > > - || def_type == vect_double_reduction_def > > - || def_type == vect_nested_cycle) > > - { > > - /* Else def types have to match. */ > > - stmt_vec_info other_info; > > - FOR_EACH_VEC_ELT (stmts, i, other_info) > > - if (STMT_VINFO_DEF_TYPE (other_info) != def_type) > > - return NULL; > > - } > > - else > > - return NULL; > > - (*tree_size)++; > > - node = vect_create_new_slp_node (stmts, nops); > > - SLP_TREE_VECTYPE (node) = vectype; > > - return node; > > - } > > + vect_def_type def_type = STMT_VINFO_DEF_TYPE (stmt_info); > > + /* Induction from different IVs is not supported. */ > > + if (def_type == vect_induction_def) > > + { > > + stmt_vec_info other_info; > > + FOR_EACH_VEC_ELT (stmts, i, other_info) > > + if (stmt_info != other_info) > > + return NULL; > > + > > + /* Induction PHIs are leaves. */ > > + (*tree_size)++; > > + node = vect_create_new_slp_node (node, stmts, nops); > > + SLP_TREE_VECTYPE (node) = vectype; > > + SLP_TREE_CHILDREN (node).quick_grow_cleared (nops); > > + return node; > > + } > > + else if (def_type == vect_reduction_def > > + || def_type == vect_double_reduction_def > > + || def_type == vect_nested_cycle) > > + { > > + /* Else def types have to match. */ > > + stmt_vec_info other_info; > > + bool all_same = true; > > + FOR_EACH_VEC_ELT (stmts, i, other_info) > > + { > > + if (STMT_VINFO_DEF_TYPE (other_info) != def_type) > > + return NULL; > > + if (other_info != stmt_info) > > + all_same = false; > > + } > > + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > > + /* Reduction initial values are not explicitly represented.
*/ > > + if (!nested_in_vect_loop_p (loop, stmt_info)) > > + skip_args[loop_preheader_edge (loop)->dest_idx] = true; > > + /* Reduction chain backedge defs are filled manually. > > + ??? Need a better way to identify a SLP reduction chain PHI. > > + Or a better overall way to SLP match those. */ > > + if (all_same && def_type == vect_reduction_def) > > + skip_args[loop_latch_edge (loop)->dest_idx] = true; > > + } > > + else if (def_type != vect_internal_def) > > + return NULL; > > + } > > > > > > bool two_operators = false; > > @@ -1386,7 +1490,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, > > { > > *max_nunits = this_max_nunits; > > (*tree_size)++; > > - node = vect_create_new_slp_node (stmts, 0); > > + node = vect_create_new_slp_node (node, stmts, 0); > > SLP_TREE_VECTYPE (node) = vectype; > > /* And compute the load permutation. Whether it is actually > > a permutation depends on the unrolling factor which is > > @@ -1440,7 +1544,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, > > representing an actual vector without any scalar ops. > > ??? We could hide it completely with making the permute node > > external? */ > > - node = vect_create_new_slp_node (stmts, 1); > > + node = vect_create_new_slp_node (node, stmts, 1); > > SLP_TREE_CODE (node) = VEC_PERM_EXPR; > > SLP_TREE_LANE_PERMUTATION (node) = lperm; > > SLP_TREE_VECTYPE (node) = vectype; > > @@ -1486,6 +1590,14 @@ vect_build_slp_tree_2 (vec_info *vinfo, > > continue; > > } > > > > + /* We're skipping certain operands from processing, for example > > + outer loop reduction initial defs. 
*/ > > + if (i <= 1 && skip_args[i]) > > + { > > + children.safe_push (NULL); > > + continue; > > + } > > + > > if (is_a <bb_vec_info> (vinfo) > > && oprnd_info->first_dt == vect_internal_def) > > { > > @@ -1508,9 +1620,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, > > } > > } > > > > - if (oprnd_info->first_dt != vect_internal_def > > - && oprnd_info->first_dt != vect_reduction_def > > - && oprnd_info->first_dt != vect_induction_def) > > + if (oprnd_info->first_dt == vect_external_def > > + || oprnd_info->first_dt == vect_constant_def) > > { > > slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops); > > SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt; > > @@ -1639,14 +1750,14 @@ fail: > > child = vect_create_new_slp_node (oprnd_info->ops); > > children.safe_push (child); > > oprnd_info->ops = vNULL; > > - oprnd_info->def_stmts = vNULL; > > continue; > > } > > } > > > > gcc_assert (child == NULL); > > FOR_EACH_VEC_ELT (children, j, child) > > - vect_free_slp_tree (child); > > + if (child) > > + vect_free_slp_tree (child); > > vect_free_oprnd_info (oprnds_info); > > return NULL; > > } > > @@ -1670,7 +1781,9 @@ fail: > > unsigned n_vector_builds = 0; > > FOR_EACH_VEC_ELT (children, j, child) > > { > > - if (SLP_TREE_DEF_TYPE (child) == vect_internal_def) > > + if (!child) > > + ; > > + else if (SLP_TREE_DEF_TYPE (child) == vect_internal_def) > > all_uniform_p = false; > > else if (!vect_slp_tree_uniform_p (child)) > > { > > @@ -1684,7 +1797,8 @@ fail: > > /* Roll back. */ > > matches[0] = false; > > FOR_EACH_VEC_ELT (children, j, child) > > - vect_free_slp_tree (child); > > + if (child) > > + vect_free_slp_tree (child); > > > > if (dump_enabled_p ()) > > dump_printf_loc (MSG_NOTE, vect_location, > > @@ -1718,7 +1832,7 @@ fail: > > > > /* Here we record the original defs since this > > node represents the final lane configuration. 
*/ > > - node = vect_create_new_slp_node (stmts, 2); > > + node = vect_create_new_slp_node (node, stmts, 2); > > SLP_TREE_VECTYPE (node) = vectype; > > SLP_TREE_CODE (node) = VEC_PERM_EXPR; > > SLP_TREE_CHILDREN (node).quick_push (one); > > @@ -1749,7 +1863,7 @@ fail: > > return node; > > } > > > > - node = vect_create_new_slp_node (stmts, nops); > > + node = vect_create_new_slp_node (node, stmts, nops); > > SLP_TREE_VECTYPE (node) = vectype; > > SLP_TREE_CHILDREN (node).splice (children); > > return node; > > @@ -1835,7 +1949,8 @@ vect_print_slp_graph (dump_flags_t dump_kind, > > dump_location_t loc, > > vect_print_slp_tree (dump_kind, loc, node); > > > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > - vect_print_slp_graph (dump_kind, loc, child, visited); > > + if (child) > > + vect_print_slp_graph (dump_kind, loc, child, visited); > > } > > > > static void > > @@ -1865,7 +1980,8 @@ vect_mark_slp_stmts (slp_tree node, > > hash_set<slp_tree> &visited) > > STMT_SLP_TYPE (stmt_info) = pure_slp; > > > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > - vect_mark_slp_stmts (child, visited); > > + if (child) > > + vect_mark_slp_stmts (child, visited); > > } > > > > static void > > @@ -1898,7 +2014,8 @@ vect_mark_slp_stmts_relevant (slp_tree node, > > hash_set<slp_tree> &visited) > > } > > > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > - vect_mark_slp_stmts_relevant (child, visited); > > + if (child) > > + vect_mark_slp_stmts_relevant (child, visited); > > } > > > > static void > > @@ -1915,7 +2032,7 @@ static void > > vect_gather_slp_loads (vec<slp_tree> &loads, slp_tree node, > > hash_set<slp_tree> &visited) > > { > > - if (visited.add (node)) > > + if (!node || visited.add (node)) > > return; > > > > if (SLP_TREE_CHILDREN (node).length () == 0) > > @@ -2176,34 +2293,62 @@ vect_build_slp_instance (vec_info *vinfo, > > } > > } > > > > - /* If this is a reduction chain with a conversion in front > > - amend the SLP tree with a node 
for that. */ > > - if (kind == slp_inst_kind_reduc_chain > > - && STMT_VINFO_DEF_TYPE (scalar_stmts[0]) != > > vect_reduction_def) > > + /* Fixup SLP reduction chains. */ > > + if (kind == slp_inst_kind_reduc_chain) > > { > > - /* Get at the conversion stmt - we know it's the single use > > - of the last stmt of the reduction chain. */ > > - gimple *tem = vect_orig_stmt (scalar_stmts[group_size - > > 1])->stmt; > > + /* If this is a reduction chain with a conversion in front > > + amend the SLP tree with a node for that. */ > > + gimple *scalar_def > > + = vect_orig_stmt (scalar_stmts[group_size - 1])->stmt; > > + if (STMT_VINFO_DEF_TYPE (scalar_stmts[0]) != > > vect_reduction_def) > > + { > > + /* Get at the conversion stmt - we know it's the single > > use > > + of the last stmt of the reduction chain. */ > > + use_operand_p use_p; > > + bool r = single_imm_use (gimple_assign_lhs (scalar_def), > > + &use_p, &scalar_def); > > + gcc_assert (r); > > + stmt_vec_info next_info = vinfo->lookup_stmt (scalar_def); > > + next_info = vect_stmt_to_vectorize (next_info); > > + scalar_stmts = vNULL; > > + scalar_stmts.create (group_size); > > + for (unsigned i = 0; i < group_size; ++i) > > + scalar_stmts.quick_push (next_info); > > + slp_tree conv = vect_create_new_slp_node (scalar_stmts, > > 1); > > + SLP_TREE_VECTYPE (conv) = STMT_VINFO_VECTYPE (next_info); > > + SLP_TREE_CHILDREN (conv).quick_push (node); > > + SLP_INSTANCE_TREE (new_instance) = conv; > > + /* We also have to fake this conversion stmt as SLP > > reduction > > + group so we don't have to mess with too much code > > + elsewhere. */ > > + REDUC_GROUP_FIRST_ELEMENT (next_info) = next_info; > > + REDUC_GROUP_NEXT_ELEMENT (next_info) = NULL; > > + } > > + /* Fill the backedge child of the PHI SLP node. The > > + general matching code cannot find it because the > > + scalar code does not reflect how we vectorize the > > + reduction. 
*/ > > use_operand_p use_p; > > - gimple *use_stmt; > > - bool r = single_imm_use (gimple_assign_lhs (tem), > > - &use_p, &use_stmt); > > - gcc_assert (r); > > - stmt_vec_info next_info = vinfo->lookup_stmt (use_stmt); > > - next_info = vect_stmt_to_vectorize (next_info); > > - scalar_stmts = vNULL; > > - scalar_stmts.create (group_size); > > - for (unsigned i = 0; i < group_size; ++i) > > - scalar_stmts.quick_push (next_info); > > - slp_tree conv = vect_create_new_slp_node (scalar_stmts, 1); > > - SLP_TREE_VECTYPE (conv) = STMT_VINFO_VECTYPE (next_info); > > - SLP_TREE_CHILDREN (conv).quick_push (node); > > - SLP_INSTANCE_TREE (new_instance) = conv; > > - /* We also have to fake this conversion stmt as SLP reduction > > - group so we don't have to mess with too much code > > - elsewhere. */ > > - REDUC_GROUP_FIRST_ELEMENT (next_info) = next_info; > > - REDUC_GROUP_NEXT_ELEMENT (next_info) = NULL; > > + imm_use_iterator imm_iter; > > + class loop *loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> > > (vinfo)); > > + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, > > + gimple_get_lhs (scalar_def)) > > + /* There are exactly two non-debug uses, the reduction > > + PHI and the loop-closed PHI node. */ > > + if (!is_gimple_debug (USE_STMT (use_p)) > > + && gimple_bb (USE_STMT (use_p)) == loop->header) > > + { > > + auto_vec<stmt_vec_info, 64> phis (group_size); > > + stmt_vec_info phi_info > > + = vinfo->lookup_stmt (USE_STMT (use_p)); > > + for (unsigned i = 0; i < group_size; ++i) > > + phis.quick_push (phi_info); > > + slp_tree *phi_node = bst_map->get (phis); > > + unsigned dest_idx = loop_latch_edge (loop)->dest_idx; > > + SLP_TREE_CHILDREN (*phi_node)[dest_idx] > > + = SLP_INSTANCE_TREE (new_instance); > > + SLP_INSTANCE_TREE (new_instance)->refcnt++; > > + } > > } > > > > vinfo->slp_instances.safe_push (new_instance); > > @@ -2437,60 +2582,6 @@ vect_analyze_slp_instance (vec_info *vinfo, > > return res; > > } > > > > -/* Fill in backedge SLP children in the SLP graph. 
*/ > > - > > -static void > > -vect_analyze_slp_backedges (vec_info *vinfo, slp_tree node, > > - scalar_stmts_to_slp_tree_map_t *bst_map, > > - hash_set<slp_tree> &visited) > > -{ > > - if (SLP_TREE_DEF_TYPE (node) != vect_internal_def > > - || visited.add (node)) > > - return; > > - > > - slp_tree child; > > - unsigned i; > > - FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > - if (child) > > - vect_analyze_slp_backedges (vinfo, child, bst_map, visited); > > - > > - /* Inductions are not vectorized by vectorizing their defining cycle > > - but by materializing the values from SCEV data. */ > > - if (STMT_VINFO_DEF_TYPE (SLP_TREE_REPRESENTATIVE (node)) > > - == vect_induction_def) > > - return; > > - > > - if (gphi *phi = dyn_cast <gphi *> (SLP_TREE_REPRESENTATIVE (node)->stmt)) > > - for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i) > > - { > > - auto_vec<stmt_vec_info, 64> stmts; > > - unsigned j; > > - stmt_vec_info phi_info; > > - FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, phi_info) > > - { > > - tree def = gimple_phi_arg_def (as_a <gphi *>(phi_info->stmt), > > i); > > - stmt_vec_info def_info = vinfo->lookup_def (def); > > - if (!def_info) > > - break; > > - stmts.safe_push (vect_stmt_to_vectorize (def_info)); > > - } > > - if (j != SLP_TREE_LANES (node)) > > - continue; > > - slp_tree *edge_def = bst_map->get (stmts); > > - if (edge_def) > > - { > > - /* ??? We're currently not recording non-backedge children > > - of PHIs like external reduction initial values. Avoid > > - NULL entries in SLP_TREE_CHILDREN for those and thus > > - for now simply only record backedge defs at a > > - SLP_TREE_CHILDREN index possibly not matching that of > > - the corresponding PHI argument index. */ > > - SLP_TREE_CHILDREN (node).quick_push (*edge_def); > > - (*edge_def)->refcnt++; > > - } > > - } > > -} > > - > > /* Check if there are stmts in the loop can be vectorized using SLP. > > Build SLP > > trees of packed scalar stmts if SLP is possible. 
*/ > > > > @@ -2541,13 +2632,6 @@ vect_analyze_slp (vec_info *vinfo, unsigned > > max_tree_size) > > max_tree_size); > > } > > > > - /* Fill in backedges. */ > > - slp_instance instance; > > - hash_set<slp_tree> visited; > > - FOR_EACH_VEC_ELT (vinfo->slp_instances, i, instance) > > - vect_analyze_slp_backedges (vinfo, SLP_INSTANCE_TREE (instance), > > - bst_map, visited); > > - > > /* The map keeps a reference on SLP nodes built, release that. */ > > for (scalar_stmts_to_slp_tree_map_t::iterator it = bst_map->begin (); > > it != bst_map->end (); ++it) > > @@ -2572,11 +2656,16 @@ vect_slp_build_vertices (hash_set<slp_tree> > > &visited, slp_tree node, > > > > node->vertex = vertices.length (); > > vertices.safe_push (node); > > - if (SLP_TREE_CHILDREN (node).is_empty ()) > > + > > + bool leaf = true; > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > + if (child) > > + { > > + leaf = false; > > + vect_slp_build_vertices (visited, child, vertices, leafs); > > + } > > + if (leaf) > > leafs.safe_push (node->vertex); > > - else > > - FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > - vect_slp_build_vertices (visited, child, vertices, leafs); > > } > > > > /* Fill the vertices and leafs vector with all nodes in the SLP graph. */ > > @@ -2654,7 +2743,8 @@ vect_optimize_slp (vec_info *vinfo) > > unsigned j; > > slp_tree child; > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), j, child) > > - add_edge (slpg, i, child->vertex); > > + if (child) > > + add_edge (slpg, i, child->vertex); > > } > > > > /* Compute (reverse) postorder on the inverted graph. */ > > @@ -2853,7 +2943,7 @@ vect_optimize_slp (vec_info *vinfo) > > slp_tree child; > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), j, child) > > { > > - if (SLP_TREE_DEF_TYPE (child) == vect_internal_def) > > + if (!child || SLP_TREE_DEF_TYPE (child) == vect_internal_def) > > continue; > > > > /* If the vector is uniform there's nothing to do. 
*/ > > @@ -3291,6 +3381,7 @@ vect_slp_convert_to_external (vec_info *vinfo, > > slp_tree node, > > unsigned int group_size = SLP_TREE_LANES (node); > > SLP_TREE_DEF_TYPE (node) = vect_external_def; > > SLP_TREE_SCALAR_OPS (node).safe_grow (group_size, true); > > + SLP_TREE_LOAD_PERMUTATION (node).release (); > > FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info) > > { > > tree lhs = gimple_get_lhs (vect_orig_stmt (stmt_info)->stmt); > > @@ -3373,9 +3464,20 @@ vect_slp_analyze_node_operations (vec_info *vinfo, > > slp_tree node, > > slp_tree child; > > > > /* Assume we can code-generate all invariants. */ > > - if (SLP_TREE_DEF_TYPE (node) != vect_internal_def) > > + if (!node > > + || SLP_TREE_DEF_TYPE (node) == vect_constant_def > > + || SLP_TREE_DEF_TYPE (node) == vect_external_def) > > return true; > > > > + if (SLP_TREE_DEF_TYPE (node) == vect_uninitialized_def) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "Failed cyclic SLP reference in %p", node); > > + return false; > > + } > > + gcc_assert (SLP_TREE_DEF_TYPE (node) == vect_internal_def); > > + > > /* If we already analyzed the exact same set of scalar stmts we're done. > > We share the generated vector stmts for those. */ > > if (visited.contains (node) > > @@ -3404,8 +3506,9 @@ vect_slp_analyze_node_operations (vec_info *vinfo, > > slp_tree node, > > other referrers. */ > > if (res) > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), j, child) > > - if ((SLP_TREE_DEF_TYPE (child) == vect_constant_def > > - || SLP_TREE_DEF_TYPE (child) == vect_external_def) > > + if (child > > + && (SLP_TREE_DEF_TYPE (child) == vect_constant_def > > + || SLP_TREE_DEF_TYPE (child) == vect_external_def) > > /* Perform usual caching, note code-generation still > > code-gens these nodes multiple times but we expect > > to CSE them later. 
*/ > > @@ -3459,8 +3562,12 @@ static void > > vect_bb_slp_mark_live_stmts (bb_vec_info bb_vinfo, slp_tree node, > > slp_instance instance, > > stmt_vector_for_cost *cost_vec, > > - hash_set<stmt_vec_info> &svisited) > > + hash_set<stmt_vec_info> &svisited, > > + hash_set<slp_tree> &visited) > > { > > + if (visited.add (node)) > > + return; > > + > > unsigned i; > > stmt_vec_info stmt_info; > > stmt_vec_info last_stmt = vect_find_last_scalar_stmt_in_slp (node); > > @@ -3479,7 +3586,7 @@ vect_bb_slp_mark_live_stmts (bb_vec_info bb_vinfo, > > slp_tree node, > > gimple *orig_stmt = orig_stmt_info->stmt; > > ssa_op_iter op_iter; > > def_operand_p def_p; > > - FOR_EACH_SSA_DEF_OPERAND (def_p, orig_stmt, op_iter, SSA_OP_DEF) > > + FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF) > > { > > imm_use_iterator use_iter; > > gimple *use_stmt; > > @@ -3548,9 +3655,9 @@ vect_bb_slp_mark_live_stmts (bb_vec_info bb_vinfo, > > slp_tree node, > > > > slp_tree child; > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > - if (SLP_TREE_DEF_TYPE (child) == vect_internal_def) > > + if (child && SLP_TREE_DEF_TYPE (child) == vect_internal_def) > > vect_bb_slp_mark_live_stmts (bb_vinfo, child, instance, > > - cost_vec, svisited); > > + cost_vec, svisited, visited); > > } > > > > /* Analyze statements in SLP instances of VINFO. 
Return true if the > > @@ -3615,11 +3722,13 @@ vect_slp_analyze_operations (vec_info *vinfo) > > if (bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo)) > > { > > hash_set<stmt_vec_info> svisited; > > + hash_set<slp_tree> visited; > > for (i = 0; vinfo->slp_instances.iterate (i, &instance); ++i) > > { > > vect_location = instance->location (); > > vect_bb_slp_mark_live_stmts (bb_vinfo, SLP_INSTANCE_TREE > > (instance), > > - instance, &instance->cost_vec, > > svisited); > > + instance, &instance->cost_vec, > > svisited, > > + visited); > > } > > } > > > > @@ -3683,7 +3792,7 @@ vect_bb_partition_graph_r (bb_vec_info bb_vinfo, > > > > slp_tree child; > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > - if (SLP_TREE_DEF_TYPE (child) == vect_internal_def) > > + if (child && SLP_TREE_DEF_TYPE (child) == vect_internal_def) > > vect_bb_partition_graph_r (bb_vinfo, instance, child, > > stmt_to_instance, > > instance_leader, visited); > > } > > @@ -3759,7 +3868,7 @@ vect_bb_slp_scalar_cost (vec_info *vinfo, > > the scalar cost. */ > > if (!STMT_VINFO_LIVE_P (stmt_info)) > > { > > - FOR_EACH_SSA_DEF_OPERAND (def_p, orig_stmt, op_iter, SSA_OP_DEF) > > + FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF) > > { > > imm_use_iterator use_iter; > > gimple *use_stmt; > > @@ -3803,7 +3912,7 @@ vect_bb_slp_scalar_cost (vec_info *vinfo, > > auto_vec<bool, 20> subtree_life; > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > { > > - if (SLP_TREE_DEF_TYPE (child) == vect_internal_def) > > + if (child && SLP_TREE_DEF_TYPE (child) == vect_internal_def) > > { > > /* Do not directly pass LIFE to the recursive call, copy it to > > confine changes in the callee to the current child/subtree. 
*/ > > @@ -4272,19 +4381,18 @@ vect_slp_function (function *fun) > > for (unsigned i = 0; i < n; i++) > > { > > basic_block bb = BASIC_BLOCK_FOR_FN (fun, rpo[i]); > > - > > - /* Split when a basic block has multiple predecessors or when the > > - edge into it exits a loop (because of implementation issues with > > - respect to placement of CTORs for externals). */ > > bool split = false; > > - edge e; > > - if (!single_pred_p (bb) > > - || ((e = single_pred_edge (bb)), > > - loop_exit_edge_p (e->src->loop_father, e))) > > - split = true; > > + > > /* Split when a BB is not dominated by the first block. */ > > + if (!bbs.is_empty () > > + && !dominated_by_p (CDI_DOMINATORS, bb, bbs[0])) > > + split = true; > > + /* Split when the loop determined by the first block > > + is exited. This is because we eventually insert > > + invariants at region begin. */ > > else if (!bbs.is_empty () > > - && !dominated_by_p (CDI_DOMINATORS, bb, bbs[0])) > > + && bbs[0]->loop_father != bb->loop_father > > + && !flow_loop_nested_p (bbs[0]->loop_father, > > bb->loop_father)) > > split = true; > > > > if (split && !bbs.is_empty ()) > > @@ -5087,24 +5195,22 @@ vectorizable_slp_permutation (vec_info *vinfo, > > gimple_stmt_iterator *gsi, > > return true; > > } > > > > -/* Vectorize SLP instance tree in postorder. */ > > +/* Vectorize SLP NODE. */ > > > > static void > > -vect_schedule_slp_instance (vec_info *vinfo, > > - slp_tree node, slp_instance instance, > > - hash_set<slp_tree> &visited) > > +vect_schedule_slp_node (vec_info *vinfo, > > + slp_tree node, slp_instance instance) > > { > > gimple_stmt_iterator si; > > int i; > > slp_tree child; > > > > - /* See if we have already vectorized the node in the graph of the > > - SLP instance. */ > > - if ((SLP_TREE_DEF_TYPE (node) == vect_internal_def > > - && SLP_TREE_VEC_STMTS (node).exists ()) > > - || SLP_TREE_VEC_DEFS (node).exists ()) > > + /* For existing vectors there's nothing to do. 
> >   */
> > +  if (SLP_TREE_VEC_DEFS (node).exists ())
> >      return;
> > 
> > +  gcc_assert (SLP_TREE_VEC_STMTS (node).is_empty ());
> > +
> >    /* Vectorize externals and constants.  */
> >    if (SLP_TREE_DEF_TYPE (node) == vect_constant_def
> >        || SLP_TREE_DEF_TYPE (node) == vect_external_def)
> > @@ -5119,17 +5225,11 @@ vect_schedule_slp_instance (vec_info *vinfo,
> >        return;
> >      }
> > 
> > -  /* ??? If we'd have a way to mark backedges that would be cheaper.  */
> > -  if (visited.add (node))
> > -    return;
> > -
> > -  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > -    vect_schedule_slp_instance (vinfo, child, instance, visited);
> > +  stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
> > 
> >    gcc_assert (SLP_TREE_NUMBER_OF_VEC_STMTS (node) != 0);
> >    SLP_TREE_VEC_STMTS (node).create (SLP_TREE_NUMBER_OF_VEC_STMTS (node));
> > 
> > -  stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
> >    if (dump_enabled_p ())
> >      dump_printf_loc (MSG_NOTE, vect_location,
> > 		     "------>vectorizing SLP node starting from: %G",
> > @@ -5148,13 +5248,12 @@ vect_schedule_slp_instance (vec_info *vinfo,
> >        last_stmt_info = vect_find_last_scalar_stmt_in_slp (node);
> >        si = gsi_for_stmt (last_stmt_info->stmt);
> >      }
> > -  else if ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node))
> > -	    == cycle_phi_info_type)
> > -	   || (STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node))
> > -	       == induc_vec_info_type))
> > +  else if ((STMT_VINFO_TYPE (stmt_info) == cycle_phi_info_type
> > +	    || STMT_VINFO_TYPE (stmt_info) == induc_vec_info_type
> > +	    || STMT_VINFO_TYPE (stmt_info) == phi_info_type)
> > +	   && SLP_TREE_CODE (node) != VEC_PERM_EXPR)
> >      {
> > -      /* For reduction and induction PHIs we do not use the
> > -	 insertion iterator.  */
> > +      /* For PHI node vectorization we do not use the insertion iterator.  */
> >        si = gsi_none ();
> >      }
> >    else
> > @@ -5277,7 +5376,7 @@ vect_remove_slp_scalar_calls (vec_info *vinfo,
> >    tree lhs;
> >    stmt_vec_info stmt_info;
> > 
> > -  if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
> > +  if (!node || SLP_TREE_DEF_TYPE (node) != vect_internal_def)
> >      return;
> > 
> >    if (visited.add (node))
> > @@ -5359,6 +5458,142 @@ vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
> >    gsi_replace (&rgsi, rstmt, true);
> >  }
> > 
> > +struct slp_scc_info
> > +{
> > +  bool on_stack;
> > +  int dfs;
> > +  int lowlink;
> > +};
> > +
> > +/* Schedule the SLP INSTANCE doing a DFS walk and collecting SCCs.  */
> > +
> > +static void
> > +vect_schedule_scc (vec_info *vinfo, slp_tree node, slp_instance instance,
> > +		   hash_map<slp_tree, slp_scc_info> &scc_info,
> > +		   int &maxdfs, vec<slp_tree> &stack)
> > +{
> > +  bool existed_p;
> > +  slp_scc_info *info = &scc_info.get_or_insert (node, &existed_p);
> > +  gcc_assert (!existed_p);
> > +  info->dfs = maxdfs;
> > +  info->lowlink = maxdfs;
> > +  info->on_stack = true;
> > +  maxdfs++;
> > +  stack.safe_push (node);
> > +  unsigned i;
> > +  slp_tree child;
> > +
> > +  /* ??? We're keeping SLP_TREE_CHILDREN of externalized nodes.  */
> > +  if (SLP_TREE_DEF_TYPE (node) == vect_internal_def)
> > +    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > +      {
> > +	if (!child)
> > +	  continue;
> > +	slp_scc_info *child_info = scc_info.get (child);
> > +	if (!child_info)
> > +	  {
> > +	    vect_schedule_scc (vinfo, child, instance, scc_info, maxdfs, stack);
> > +	    /* Recursion might have re-allocated the node.  */
> > +	    info = scc_info.get (node);
> > +	    child_info = scc_info.get (child);
> > +	    info->lowlink = MIN (info->lowlink, child_info->lowlink);
> > +	  }
> > +	else if (child_info->on_stack)
> > +	  info->lowlink = MIN (info->lowlink, child_info->dfs);
> > +      }
> > +  if (info->lowlink != info->dfs)
> > +    return;
> > +
> > +  /* Singleton.  */
> > +  if (stack.last () == node)
> > +    {
> > +      stack.pop ();
> > +      info->on_stack = false;
> > +      vect_schedule_slp_node (vinfo, node, instance);
> > +      return;
> > +    }
> > +  /* SCC.  */
> > +  int last_idx = stack.length () - 1;
> > +  while (stack[last_idx] != node)
> > +    last_idx--;
> > +  /* We can break the cycle at PHIs who have at least one child
> > +     code generated.  Then we could re-start the DFS walk until
> > +     all nodes in the SCC are covered (we might have new entries
> > +     for only back-reachable nodes).  But it's simpler to just
> > +     iterate and schedule those that are ready.  */
> > +  auto_vec<slp_tree, 4> phis_to_fixup;
> > +  unsigned todo = stack.length () - last_idx;
> > +  do
> > +    {
> > +      for (int idx = stack.length () - 1; idx >= last_idx; --idx)
> > +	{
> > +	  slp_tree entry = stack[idx];
> > +	  if (!entry)
> > +	    continue;
> > +	  bool phi = (SLP_TREE_CODE (entry) != VEC_PERM_EXPR
> > +		      && is_a <gphi *> (SLP_TREE_REPRESENTATIVE (entry)->stmt));
> > +	  bool ready = !phi;
> > +	  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (entry), i, child)
> > +	    if (!child)
> > +	      {
> > +		gcc_assert (phi);
> > +		ready = true;
> > +		break;
> > +	      }
> > +	    else if (scc_info.get (child)->on_stack)
> > +	      {
> > +		if (!phi)
> > +		  {
> > +		    ready = false;
> > +		    break;
> > +		  }
> > +	      }
> > +	    else
> > +	      {
> > +		if (phi)
> > +		  {
> > +		    ready = true;
> > +		    break;
> > +		  }
> > +	      }
> > +	  if (ready)
> > +	    {
> > +	      vect_schedule_slp_node (vinfo, entry, instance);
> > +	      scc_info.get (entry)->on_stack = false;
> > +	      stack[idx] = NULL;
> > +	      todo--;
> > +	      if (phi)
> > +		phis_to_fixup.safe_push (entry);
> > +	    }
> > +	}
> > +    }
> > +  while (todo != 0);
> > +
> > +  /* Now fixup the backedge def of the vectorized PHIs in this SCC.  */
> > +  slp_tree phi_node;
> > +  FOR_EACH_VEC_ELT (phis_to_fixup, i, phi_node)
> > +    {
> > +      gphi *phi = as_a <gphi *> (SLP_TREE_REPRESENTATIVE (phi_node)->stmt);
> > +      edge_iterator ei;
> > +      edge e;
> > +      FOR_EACH_EDGE (e, ei, gimple_bb (phi)->preds)
> > +	{
> > +	  unsigned dest_idx = e->dest_idx;
> > +	  child = SLP_TREE_CHILDREN (phi_node)[dest_idx];
> > +	  if (!child || SLP_TREE_DEF_TYPE (child) != vect_internal_def)
> > +	    continue;
> > +	  /* Simply fill all args.  */
> > +	  for (unsigned i = 0; i < SLP_TREE_VEC_STMTS (phi_node).length (); ++i)
> > +	    add_phi_arg (as_a <gphi *> (SLP_TREE_VEC_STMTS (phi_node)[i]),
> > +			 vect_get_slp_vect_def (child, i),
> > +			 e, gimple_phi_arg_location (phi, dest_idx));
> > +	}
> > +    }
> > +
> > +  /* Pop the SCC.  */
> > +  stack.truncate (last_idx);
> > +}
> > +
> >  /* Generate vector code for SLP_INSTANCES in the loop/basic block.  */
> > 
> >  void
> > @@ -5367,7 +5602,8 @@ vect_schedule_slp (vec_info *vinfo, vec<slp_instance> slp_instances)
> >    slp_instance instance;
> >    unsigned int i;
> > 
> > -  hash_set<slp_tree> visited;
> > +  hash_map<slp_tree, slp_scc_info> scc_info;
> > +  int maxdfs = 0;
> >    FOR_EACH_VEC_ELT (slp_instances, i, instance)
> >      {
> >        slp_tree node = SLP_INSTANCE_TREE (instance);
> > @@ -5381,8 +5617,11 @@ vect_schedule_slp (vec_info *vinfo, vec<slp_instance> slp_instances)
> >  	  vect_print_slp_graph (MSG_NOTE, vect_location,
> >  				SLP_INSTANCE_TREE (instance));
> >  	}
> > -      /* Schedule the tree of INSTANCE.  */
> > -      vect_schedule_slp_instance (vinfo, node, instance, visited);
> > +      /* Schedule the tree of INSTANCE, scheduling SCCs in a way to
> > +	 have a PHI be the node breaking the cycle.  */
> > +      auto_vec<slp_tree> stack;
> > +      if (!scc_info.get (node))
> > +	vect_schedule_scc (vinfo, node, instance, scc_info, maxdfs, stack);
> > 
> >        if (SLP_INSTANCE_ROOT_STMT (instance))
> >  	vectorize_slp_instance_root_stmt (node, instance);
> > @@ -5398,25 +5637,6 @@ vect_schedule_slp (vec_info *vinfo, vec<slp_instance> slp_instances)
> >        stmt_vec_info store_info;
> >        unsigned int j;
> > 
> > -      /* For reductions set the latch values of the vectorized PHIs.  */
> > -      if (instance->reduc_phis
> > -	  && STMT_VINFO_REDUC_TYPE (SLP_TREE_REPRESENTATIVE
> > -				    (instance->reduc_phis)) != FOLD_LEFT_REDUCTION
> > -	  && STMT_VINFO_REDUC_TYPE (SLP_TREE_REPRESENTATIVE
> > -				    (instance->reduc_phis)) != EXTRACT_LAST_REDUCTION)
> > -	{
> > -	  slp_tree slp_node = root;
> > -	  slp_tree phi_node = instance->reduc_phis;
> > -	  gphi *phi = as_a <gphi *> (SLP_TREE_SCALAR_STMTS (phi_node)[0]->stmt);
> > -	  edge e = loop_latch_edge (gimple_bb (phi)->loop_father);
> > -	  gcc_assert (SLP_TREE_VEC_STMTS (phi_node).length ()
> > -		      == SLP_TREE_VEC_STMTS (slp_node).length ());
> > -	  for (unsigned j = 0; j < SLP_TREE_VEC_STMTS (phi_node).length (); ++j)
> > -	    add_phi_arg (as_a <gphi *> (SLP_TREE_VEC_STMTS (phi_node)[j]),
> > -			 vect_get_slp_vect_def (slp_node, j),
> > -			 e, gimple_phi_arg_location (phi, e->dest_idx));
> > -	}
> > -
> >        /* Remove scalar call stmts.  Do not do this for basic-block
> >  	 vectorization as not all uses may be vectorized.
> >  	 ??? Why should this be necessary?  DCE should be able to
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index 3575f25241f..7f0763f15c4 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -10745,7 +10745,8 @@ vect_analyze_stmt (vec_info *vinfo,
> >  	  || vectorizable_condition (vinfo, stmt_info, NULL, NULL, node, cost_vec)
> >  	  || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> > -				      cost_vec));
> > +				      cost_vec)
> > +	  || vectorizable_phi (vinfo, stmt_info, NULL, node));
> >      }
> > 
> >    if (!ok)
> > @@ -10885,6 +10886,11 @@ vect_transform_stmt (vec_info *vinfo,
> >        gcc_assert (done);
> >        break;
> > 
> > +    case phi_info_type:
> > +      done = vectorizable_phi (vinfo, stmt_info, &vec_stmt, slp_node);
> > +      gcc_assert (done);
> > +      break;
> > +
> >      default:
> >        if (!STMT_VINFO_LIVE_P (stmt_info))
> >  	{
> > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> > index 0e08652ed10..b63dda31a08 100644
> > --- a/gcc/tree-vectorizer.c
> > +++ b/gcc/tree-vectorizer.c
> > @@ -684,7 +684,8 @@ vec_info::new_stmt_vec_info (gimple *stmt)
> >    STMT_VINFO_SLP_VECT_ONLY (res) = false;
> >    STMT_VINFO_VEC_STMTS (res) = vNULL;
> > 
> > -  if (gimple_code (stmt) == GIMPLE_PHI
> > +  if (is_a <loop_vec_info> (this)
> > +      && gimple_code (stmt) == GIMPLE_PHI
> >        && is_loop_header_bb_p (gimple_bb (stmt)))
> >      STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
> >    else
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 9c55383a3ee..13a02cd0d0c 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -881,6 +881,7 @@ enum stmt_vec_info_type {
> >    type_conversion_vec_info_type,
> >    cycle_phi_info_type,
> >    lc_phi_info_type,
> > +  phi_info_type,
> >    loop_exit_ctrl_vec_info_type
> >  };
> > 
> > @@ -1939,6 +1940,7 @@ extern bool vect_transform_cycle_phi (loop_vec_info, stmt_vec_info,
> >  				      slp_tree, slp_instance);
> >  extern bool vectorizable_lc_phi (loop_vec_info, stmt_vec_info,
> >  				 gimple **, slp_tree);
> > +extern bool vectorizable_phi (vec_info *, stmt_vec_info, gimple **, slp_tree);
> >  extern bool vect_worthwhile_without_simd_p (vec_info *, tree_code);
> >  extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
> >  					stmt_vector_for_cost *,
> > -- 
> > 2.26.2