On Fri, 30 Aug 2024, Manolis Tsamis wrote: > On Fri, Aug 30, 2024 at 2:24 PM Richard Biener <rguent...@suse.de> wrote: > > > > On Fri, 30 Aug 2024, Manolis Tsamis wrote: > > > > > For more context please see > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352#c11. > > > > > > I've sent this as an RFC because it's hard to be sure that this was > > > just uncovered by r15-2820-gab18785840d7b8 rather than being > > > introduced by it. > > > Still, to the best of my knowledge after reading through the code, > > > this looked like a general issue to me and hence I'm proposing this > > > fix. > > > > It's a general issue indeed. Note we do want to (re-)perform CSE > > even between SLP instances - we're CSEing only internal defs, > > not vect_{constant,external}_def at the moment. > > > Ok. > > > The reason is that we want to put two instances accessing the > > same stmts into the same partition (I think that would happen > > also without the CSE, they'd just not share vectors). > > > > Now - in this case we're indirectly CSEing vect_{constant,external}_def > > given we have > > > > _119 = {j_160, _75, i_159, _73}; > > _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>; > > > > and > > > > _21 = {j_132, _75, i_131, _73}; > > = VEC_PERM_EXPR <_21, _21, { 1, 1, 3, 3 }>; > > > > CSE doesn't honor scheduling constraints - it relies on SSA defs > > being available (for the scalar lanes) but here, while the scalar > > lanes are available the def they are picked from are not. > > > > So in general I don't agree with the patch - it's a workaround only. > > > Agreed with that, based on the context you provided it is indeed a > workaround at best. > > > Note we can't reject this at the point we want to schedule, and we > > do currently have no separate check of whether there is a valid > > schedule - instead we rely on this by construction which means you > > have a point in CSE violating this. > > > > One way to avoid this in CSE would be to not allow CSEing of > > non-bijective permutes like { 1, 1, 3, 3 } that do not select all > > of the input lanes (not selecting a lane that has a NULL > > scalar def would be OK I think). > > > > The other obvious thing in this particular case is that the > > permutes could all have been vector constructors, so instead of > > > > _119 = {j_160, _75, i_159, _73}; > > _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>; > > _150 = VEC_PERM_EXPR <_119, _119, { 0, 0, 2, 2 }>; > > > > we'd have > > > > _80 = { _75, _75, _73, _73 }; > > _150 = { j_160, j_160, i_159, i_159 }; > > > > I think that would be a profitable simplification but of course > > it doesn't address the general issue. I'd have expected that > > the SLP permute optimization would eventually do this but I > > think it only ever considers bijective permutes. > > > Ok thanks for all the details, We'll have to look into a different > solution based on these.
Btw, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115764 which is I think the very same (or similar) issue. Richard. > Manolis > > > Richard. > > > > > > > Thanks, > > > Manolis > > > > > > On Fri, Aug 30, 2024 at 1:46 PM Manolis Tsamis <manolis.tsa...@vrull.eu> > > > wrote: > > > > > > > > Currently we do SLP CSE after permute optimization using a single map > > > > across > > > > all SLP instances. These SLP instances can affect many different basic > > > > blocks > > > > and the cache may replace a statement in one block with one from > > > > another block. > > > > Because there are no further limitations and the blocks may be visited > > > > in an > > > > arbitrary order, this may lead to a statement being used in paths that > > > > may not > > > > be defined. This patch creates one map per SLP instance to address that. > > > > > > > > PR tree-optimization/116352 > > > > > > > > gcc/ChangeLog: > > > > > > > > * tree-vect-slp.cc (vect_optimize_slp): Use one scalar stmts to > > > > tree > > > > map per SLP instance. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * gcc.dg/pr116352.c: New test. > > > > > > > > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu> > > > > --- > > > > > > > > gcc/testsuite/gcc.dg/pr116352.c | 18 ++++++++++++++++++ > > > > gcc/tree-vect-slp.cc | 12 +++++++----- > > > > 2 files changed, 25 insertions(+), 5 deletions(-) > > > > create mode 100644 gcc/testsuite/gcc.dg/pr116352.c > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/pr116352.c > > > > b/gcc/testsuite/gcc.dg/pr116352.c > > > > new file mode 100644 > > > > index 00000000000..c427eff3c08 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/pr116352.c > > > > @@ -0,0 +1,18 @@ > > > > +/* PR tree-optimization/116352 */ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-O3 -fchecking" } */ > > > > + > > > > +int a; > > > > +float b, c; > > > > +void l(int h, int f, int g, float *e) > > > > +{ > > > > + for (int m = 0; m < h; m++) > > > > + { > > > > + float i = 2 * b, j = 2 * c; > > > > + if (a) { > > > > + e[m*4 + 0] = e[m*4 + 1] = (j - g * 0.5f); > > > > + e[m*4 + 2] = e[m*4 + 3] = (i + f * 0.5f); > > > > + } else > > > > + e[m*4 + 0] = f * 0.5f + g * 0.5f; > > > > + } > > > > +} > > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > > > > index cfdf59ad386..a8836c2fb04 100644 > > > > --- a/gcc/tree-vect-slp.cc > > > > +++ b/gcc/tree-vect-slp.cc > > > > @@ -6356,13 +6356,15 @@ vect_optimize_slp (vec_info *vinfo) > > > > vect_optimize_slp_pass (vinfo).run (); > > > > > > > > /* Apply CSE again to nodes after permute optimization. */ > > > > - scalar_stmts_to_slp_tree_map_t *bst_map > > > > - = new scalar_stmts_to_slp_tree_map_t (); > > > > - > > > > for (auto inst : vinfo->slp_instances) > > > > - vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > > > > + { > > > > + scalar_stmts_to_slp_tree_map_t *bst_map > > > > + = new scalar_stmts_to_slp_tree_map_t (); > > > > > > > > - release_scalar_stmts_to_slp_tree_map (bst_map); > > > > + vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > > > > + > > > > + release_scalar_stmts_to_slp_tree_map (bst_map); > > > > + } > > > > } > > > > > > > > /* Gather loads reachable from the individual SLP graph entries. */ > > > > -- > > > > 2.34.1 > > > > > > > > > > > -- > > Richard Biener <rguent...@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)