On Fri, 30 Aug 2024, Manolis Tsamis wrote:

> On Fri, Aug 30, 2024 at 2:24 PM Richard Biener <rguent...@suse.de> wrote:
> >
> > On Fri, 30 Aug 2024, Manolis Tsamis wrote:
> >
> > > For more context please see
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352#c11.
> > >
> > > I've sent this as an RFC because it's hard to be sure that this was
> > > just uncovered by r15-2820-gab18785840d7b8 rather than being
> > > introduced by it.
> > > Still, to the best of my knowledge after reading through the code,
> > > this looked like a general issue to me and hence I'm proposing this
> > > fix.
> >
> > It's a general issue indeed.  Note we do want to (re-)perform CSE
> > even between SLP instances - we're CSEing only internal defs,
> > not vect_{constant,external}_def at the moment.
> >
> Ok.
> 
> > The reason is that we want to put two instances accessing the
> > same stmts into the same partition (I think that would happen
> > also without the CSE, they'd just not share vectors).
> >
> > Now - in this case we're indirectly CSEing vect_{constant,external}_def
> > given we have
> >
> >   _119 = {j_160, _75, i_159, _73};
> >   _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>;
> >
> > and
> >
> >  _21 = {j_132, _75, i_131, _73};
> >      = VEC_PERM_EXPR <_21, _21, { 1, 1, 3, 3 }>;
> >
> > CSE doesn't honor scheduling constraints - it relies on SSA defs
> > being available (for the scalar lanes) but here, while the scalar
> > lanes are available the def they are picked from are not.
> >
> > So in general I don't agree with the patch - it's a workaround only.
> >
> Agreed with that, based on the context you provided it is indeed a
> workaround at best.
> 
> > Note we can't reject this at the point we want to schedule, and we
> > do currently have no separate check of whether there is a valid
> > schedule - instead we rely on this by construction which means you
> > have a point in CSE violating this.
> >
> > One way to avoid this in CSE would be to not allow CSEing of
> > non-bijective permutes like { 1, 1, 3, 3 } that do not select all
> > of the input lanes (not selecting a lane that has a NULL
> > scalar def would be OK I think).
> >
> > The other obvious thing in this particular case is that the
> > permutes could all have been vector constructors, so instead of
> >
> >   _119 = {j_160, _75, i_159, _73};
> >   _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>;
> >   _150 = VEC_PERM_EXPR <_119, _119, { 0, 0, 2, 2 }>;
> >
> > we'd have
> >
> >   _80 = { _75, _75, _73, _73 };
> >   _150 = { j_160, j_160, i_159, i_159 };
> >
> > I think that would be a profitable simplification but of course
> > it doesn't address the general issue.  I'd have expected that
> > the SLP permute optimization would eventually do this but I
> > think it only ever considers bijective permutes.
> >
> Ok thanks for all the details, We'll have to look into a different
> solution based  on these.

Btw, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115764 which is
I think the very same (or similar) issue.

Richard.

> Manolis
> 
> > Richard.
> >
> >
> > > Thanks,
> > > Manolis
> > >
> > > On Fri, Aug 30, 2024 at 1:46 PM Manolis Tsamis <manolis.tsa...@vrull.eu> 
> > > wrote:
> > > >
> > > > Currently we do SLP CSE after permute optimization using a single map 
> > > > across
> > > > all SLP instances. These SLP instances can affect many different basic 
> > > > blocks
> > > > and the cache may replace a statement in one block with one from 
> > > > another block.
> > > > Because there are no further limitations and the blocks may be visited 
> > > > in an
> > > > arbitrary order, this may lead to a statement being used in paths that 
> > > > may not
> > > > be defined. This patch creates one map per SLP instance to address that.
> > > >
> > > >         PR tree-optimization/116352
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * tree-vect-slp.cc (vect_optimize_slp): Use one scalar stmts to 
> > > > tree
> > > >         map per SLP instance.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.dg/pr116352.c: New test.
> > > >
> > > > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>
> > > > ---
> > > >
> > > >  gcc/testsuite/gcc.dg/pr116352.c | 18 ++++++++++++++++++
> > > >  gcc/tree-vect-slp.cc            | 12 +++++++-----
> > > >  2 files changed, 25 insertions(+), 5 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/pr116352.c
> > > >
> > > > diff --git a/gcc/testsuite/gcc.dg/pr116352.c 
> > > > b/gcc/testsuite/gcc.dg/pr116352.c
> > > > new file mode 100644
> > > > index 00000000000..c427eff3c08
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr116352.c
> > > > @@ -0,0 +1,18 @@
> > > > +/* PR tree-optimization/116352 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3 -fchecking" } */
> > > > +
> > > > +int a;
> > > > +float b, c;
> > > > +void l(int h, int f, int g, float *e)
> > > > +{
> > > > +  for (int m = 0; m < h; m++)
> > > > +  {
> > > > +    float i = 2 * b, j = 2 * c;
> > > > +    if (a) {
> > > > +      e[m*4 + 0] = e[m*4 + 1] = (j - g * 0.5f);
> > > > +      e[m*4 + 2] = e[m*4 + 3] = (i + f * 0.5f);
> > > > +    } else
> > > > +      e[m*4 + 0] = f * 0.5f + g * 0.5f;
> > > > +  }
> > > > +}
> > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > > index cfdf59ad386..a8836c2fb04 100644
> > > > --- a/gcc/tree-vect-slp.cc
> > > > +++ b/gcc/tree-vect-slp.cc
> > > > @@ -6356,13 +6356,15 @@ vect_optimize_slp (vec_info *vinfo)
> > > >    vect_optimize_slp_pass (vinfo).run ();
> > > >
> > > >    /* Apply CSE again to nodes after permute optimization.  */
> > > > -  scalar_stmts_to_slp_tree_map_t *bst_map
> > > > -    = new scalar_stmts_to_slp_tree_map_t ();
> > > > -
> > > >    for (auto inst : vinfo->slp_instances)
> > > > -    vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst));
> > > > +    {
> > > > +      scalar_stmts_to_slp_tree_map_t *bst_map
> > > > +       = new scalar_stmts_to_slp_tree_map_t ();
> > > >
> > > > -  release_scalar_stmts_to_slp_tree_map (bst_map);
> > > > +      vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst));
> > > > +
> > > > +      release_scalar_stmts_to_slp_tree_map (bst_map);
> > > > +    }
> > > >  }
> > > >
> > > >  /* Gather loads reachable from the individual SLP graph entries.  */
> > > > --
> > > > 2.34.1
> > > >
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to