On Fri, Aug 30, 2024 at 2:24 PM Richard Biener <rguent...@suse.de> wrote:
>
> On Fri, 30 Aug 2024, Manolis Tsamis wrote:
>
> > For more context please see
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352#c11.
> >
> > I've sent this as an RFC because it's hard to be sure that this was
> > just uncovered by r15-2820-gab18785840d7b8 rather than being
> > introduced by it.
> > Still, to the best of my knowledge after reading through the code,
> > this looked like a general issue to me and hence I'm proposing this
> > fix.
>
> It's a general issue indeed.  Note we do want to (re-)perform CSE
> even between SLP instances - we're CSEing only internal defs,
> not vect_{constant,external}_def at the moment.
>
Ok.

> The reason is that we want to put two instances accessing the
> same stmts into the same partition (I think that would happen
> also without the CSE, they'd just not share vectors).
>
> Now - in this case we're indirectly CSEing vect_{constant,external}_def
> given we have
>
>   _119 = {j_160, _75, i_159, _73};
>   _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>;
>
> and
>
>  _21 = {j_132, _75, i_131, _73};
>      = VEC_PERM_EXPR <_21, _21, { 1, 1, 3, 3 }>;
>
> CSE doesn't honor scheduling constraints - it relies on SSA defs
> being available (for the scalar lanes) but here, while the scalar
> lanes are available the def they are picked from are not.
>
> So in general I don't agree with the patch - it's a workaround only.
>
Agreed with that, based on the context you provided it is indeed a
workaround at best.

> Note we can't reject this at the point we want to schedule, and we
> do currently have no separate check of whether there is a valid
> schedule - instead we rely on this by construction which means you
> have a point in CSE violating this.
>
> One way to avoid this in CSE would be to not allow CSEing of
> non-bijective permutes like { 1, 1, 3, 3 } that do not select all
> of the input lanes (not selecting a lane that has a NULL
> scalar def would be OK I think).
>
> The other obvious thing in this particular case is that the
> permutes could all have been vector constructors, so instead of
>
>   _119 = {j_160, _75, i_159, _73};
>   _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>;
>   _150 = VEC_PERM_EXPR <_119, _119, { 0, 0, 2, 2 }>;
>
> we'd have
>
>   _80 = { _75, _75, _73, _73 };
>   _150 = { j_160, j_160, i_159, i_159 };
>
> I think that would be a profitable simplification but of course
> it doesn't address the general issue.  I'd have expected that
> the SLP permute optimization would eventually do this but I
> think it only ever considers bijective permutes.
>
Ok thanks for all the details, We'll have to look into a different
solution based  on these.

Manolis

> Richard.
>
>
> > Thanks,
> > Manolis
> >
> > On Fri, Aug 30, 2024 at 1:46 PM Manolis Tsamis <manolis.tsa...@vrull.eu> 
> > wrote:
> > >
> > > Currently we do SLP CSE after permute optimization using a single map 
> > > across
> > > all SLP instances. These SLP instances can affect many different basic 
> > > blocks
> > > and the cache may replace a statement in one block with one from another 
> > > block.
> > > Because there are no further limitations and the blocks may be visited in 
> > > an
> > > arbitrary order, this may lead to a statement being used in paths that 
> > > may not
> > > be defined. This patch creates one map per SLP instance to address that.
> > >
> > >         PR tree-optimization/116352
> > >
> > > gcc/ChangeLog:
> > >
> > >         * tree-vect-slp.cc (vect_optimize_slp): Use one scalar stmts to 
> > > tree
> > >         map per SLP instance.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.dg/pr116352.c: New test.
> > >
> > > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>
> > > ---
> > >
> > >  gcc/testsuite/gcc.dg/pr116352.c | 18 ++++++++++++++++++
> > >  gcc/tree-vect-slp.cc            | 12 +++++++-----
> > >  2 files changed, 25 insertions(+), 5 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/pr116352.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/pr116352.c 
> > > b/gcc/testsuite/gcc.dg/pr116352.c
> > > new file mode 100644
> > > index 00000000000..c427eff3c08
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr116352.c
> > > @@ -0,0 +1,18 @@
> > > +/* PR tree-optimization/116352 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -fchecking" } */
> > > +
> > > +int a;
> > > +float b, c;
> > > +void l(int h, int f, int g, float *e)
> > > +{
> > > +  for (int m = 0; m < h; m++)
> > > +  {
> > > +    float i = 2 * b, j = 2 * c;
> > > +    if (a) {
> > > +      e[m*4 + 0] = e[m*4 + 1] = (j - g * 0.5f);
> > > +      e[m*4 + 2] = e[m*4 + 3] = (i + f * 0.5f);
> > > +    } else
> > > +      e[m*4 + 0] = f * 0.5f + g * 0.5f;
> > > +  }
> > > +}
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index cfdf59ad386..a8836c2fb04 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -6356,13 +6356,15 @@ vect_optimize_slp (vec_info *vinfo)
> > >    vect_optimize_slp_pass (vinfo).run ();
> > >
> > >    /* Apply CSE again to nodes after permute optimization.  */
> > > -  scalar_stmts_to_slp_tree_map_t *bst_map
> > > -    = new scalar_stmts_to_slp_tree_map_t ();
> > > -
> > >    for (auto inst : vinfo->slp_instances)
> > > -    vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst));
> > > +    {
> > > +      scalar_stmts_to_slp_tree_map_t *bst_map
> > > +       = new scalar_stmts_to_slp_tree_map_t ();
> > >
> > > -  release_scalar_stmts_to_slp_tree_map (bst_map);
> > > +      vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst));
> > > +
> > > +      release_scalar_stmts_to_slp_tree_map (bst_map);
> > > +    }
> > >  }
> > >
> > >  /* Gather loads reachable from the individual SLP graph entries.  */
> > > --
> > > 2.34.1
> > >
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to