On Fri, Aug 30, 2024 at 2:24 PM Richard Biener <rguent...@suse.de> wrote: > > On Fri, 30 Aug 2024, Manolis Tsamis wrote: > > > For more context please see > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352#c11. > > > > I've sent this as an RFC because it's hard to be sure that this was > > just uncovered by r15-2820-gab18785840d7b8 rather than being > > introduced by it. > > Still, to the best of my knowledge after reading through the code, > > this looked like a general issue to me and hence I'm proposing this > > fix. > > It's a general issue indeed. Note we do want to (re-)perform CSE > even between SLP instances - we're CSEing only internal defs, > not vect_{constant,external}_def at the moment. > Ok.
> The reason is that we want to put two instances accessing the > same stmts into the same partition (I think that would happen > also without the CSE, they'd just not share vectors). > > Now - in this case we're indirectly CSEing vect_{constant,external}_def > given we have > > _119 = {j_160, _75, i_159, _73}; > _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>; > > and > > _21 = {j_132, _75, i_131, _73}; > = VEC_PERM_EXPR <_21, _21, { 1, 1, 3, 3 }>; > > CSE doesn't honor scheduling constraints - it relies on SSA defs > being available (for the scalar lanes) but here, while the scalar > lanes are available the def they are picked from are not. > > So in general I don't agree with the patch - it's a workaround only. > Agreed with that, based on the context you provided it is indeed a workaround at best. > Note we can't reject this at the point we want to schedule, and we > do currently have no separate check of whether there is a valid > schedule - instead we rely on this by construction which means you > have a point in CSE violating this. > > One way to avoid this in CSE would be to not allow CSEing of > non-bijective permutes like { 1, 1, 3, 3 } that do not select all > of the input lanes (not selecting a lane that has a NULL > scalar def would be OK I think). > > The other obvious thing in this particular case is that the > permutes could all have been vector constructors, so instead of > > _119 = {j_160, _75, i_159, _73}; > _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>; > _150 = VEC_PERM_EXPR <_119, _119, { 0, 0, 2, 2 }>; > > we'd have > > _80 = { _75, _75, _73, _73 }; > _150 = { j_160, j_160, i_159, i_159 }; > > I think that would be a profitable simplification but of course > it doesn't address the general issue. I'd have expected that > the SLP permute optimization would eventually do this but I > think it only ever considers bijective permutes. > Ok thanks for all the details, We'll have to look into a different solution based on these. Manolis > Richard. > > > > Thanks, > > Manolis > > > > On Fri, Aug 30, 2024 at 1:46 PM Manolis Tsamis <manolis.tsa...@vrull.eu> > > wrote: > > > > > > Currently we do SLP CSE after permute optimization using a single map > > > across > > > all SLP instances. These SLP instances can affect many different basic > > > blocks > > > and the cache may replace a statement in one block with one from another > > > block. > > > Because there are no further limitations and the blocks may be visited in > > > an > > > arbitrary order, this may lead to a statement being used in paths that > > > may not > > > be defined. This patch creates one map per SLP instance to address that. > > > > > > PR tree-optimization/116352 > > > > > > gcc/ChangeLog: > > > > > > * tree-vect-slp.cc (vect_optimize_slp): Use one scalar stmts to > > > tree > > > map per SLP instance. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/pr116352.c: New test. > > > > > > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu> > > > --- > > > > > > gcc/testsuite/gcc.dg/pr116352.c | 18 ++++++++++++++++++ > > > gcc/tree-vect-slp.cc | 12 +++++++----- > > > 2 files changed, 25 insertions(+), 5 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.dg/pr116352.c > > > > > > diff --git a/gcc/testsuite/gcc.dg/pr116352.c > > > b/gcc/testsuite/gcc.dg/pr116352.c > > > new file mode 100644 > > > index 00000000000..c427eff3c08 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/pr116352.c > > > @@ -0,0 +1,18 @@ > > > +/* PR tree-optimization/116352 */ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O3 -fchecking" } */ > > > + > > > +int a; > > > +float b, c; > > > +void l(int h, int f, int g, float *e) > > > +{ > > > + for (int m = 0; m < h; m++) > > > + { > > > + float i = 2 * b, j = 2 * c; > > > + if (a) { > > > + e[m*4 + 0] = e[m*4 + 1] = (j - g * 0.5f); > > > + e[m*4 + 2] = e[m*4 + 3] = (i + f * 0.5f); > > > + } else > > > + e[m*4 + 0] = f * 0.5f + g * 0.5f; > > > + } > > > +} > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > > > index cfdf59ad386..a8836c2fb04 100644 > > > --- a/gcc/tree-vect-slp.cc > > > +++ b/gcc/tree-vect-slp.cc > > > @@ -6356,13 +6356,15 @@ vect_optimize_slp (vec_info *vinfo) > > > vect_optimize_slp_pass (vinfo).run (); > > > > > > /* Apply CSE again to nodes after permute optimization. */ > > > - scalar_stmts_to_slp_tree_map_t *bst_map > > > - = new scalar_stmts_to_slp_tree_map_t (); > > > - > > > for (auto inst : vinfo->slp_instances) > > > - vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > > > + { > > > + scalar_stmts_to_slp_tree_map_t *bst_map > > > + = new scalar_stmts_to_slp_tree_map_t (); > > > > > > - release_scalar_stmts_to_slp_tree_map (bst_map); > > > + vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > > > + > > > + release_scalar_stmts_to_slp_tree_map (bst_map); > > > + } > > > } > > > > > > /* Gather loads reachable from the individual SLP graph entries. */ > > > -- > > > 2.34.1 > > > > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)