Re: [PATCH] Rearrange SLP nodes with duplicate statements. [PR98138]

Richard Biener Mon, 10 Jun 2024 05:55:02 -0700

On Mon, 10 Jun 2024, Manolis Tsamis wrote:

> On Wed, Jun 5, 2024 at 11:07 AM Richard Biener <rguent...@suse.de> wrote:
> >
> > On Tue, 4 Jun 2024, Manolis Tsamis wrote:
> >
> > > This change adds a function that checks for SLP nodes with multiple 
> > > occurrences
> > > of the same statement (e.g. {A, B, A, B, ...}) and tries to rearrange the 
> > > node
> > > so that there are no duplicates. A vec_perm is then introduced to 
> > > recreate the
> > > original ordering. These duplicates can appear due to how two_operators 
> > > nodes
> > > are handled, and they prevent vectorization in some cases.
> >
> > So the trick is that when we have two operands we elide duplicate lanes
> > so we can do discovery for a single combined operand instead which we
> > then decompose into the required two again.  That's a nice one.
> >
> > But as implemented this will fail SLP discovery if the combined operand
> > fails discovery possibly because of divergence in downstream defs.  That
> > is, it doesn't fall back to separate discovery.  I suspect the situation
> > of duplicate lanes isn't common but then I would also suspect that
> > divergence _is_ common.
> >
> That's a good point; I checked out and at least for the x264 testcase
> provided SLP discovery succeeds in both cases but in one case
> vectorization fails later on due to the unsupported load permutations
> among others.
> I think that's what Tamar also mentioned and it makes it hard to
> decide whether to apply the pattern based on if discovery fails.
> 
> > The discovery code is already quite complex with the way it possibly
> > swaps operands of lanes, fitting in this as another variant to try (first)
> > is likely going to be a bit awkward.  A way out might be to split the
> > function or to make the re-try in the caller which could indicate whether
> > to apply this pattern trick or not.  That said - can you try to get
> > data on how often the trick applies and discovery succeeds and how
> > often discovery fails but discovery would suceed without applying the
> > pattern (say, on SPEC)?
> >
> I checked out SPEC and this pattern only triggers on x264 and in that
> case discovery succeeds. So we don't have any data on the pattern
> applying but discovery failing.
> 
> > I also suppose instead of hardcoding three patterns for a fixed
> > size it should be possible to see there's
> > only (at most) half unique lanes in both operands (and one less in one
> > operand if the number of lanes is odd) and compute the un-swizzling lane
> > permutes during this discovery, removing the need of the explicit enum
> > and open-coding each case?
> >
> Yes, that's a fair point. I will change that in the next iteration.
> 
> > Another general note is that trying (and then undo on fail) such ticks
> > eats at the discovery limit we have in place to avoid exponential run-off
> > in exactly this degenerate cases.
> >
> 
> So, most importantly, the points you and Tamar mentioned got me
> thinking about the transformation again, why it is useful and when it
> applies.
> In this initial implementation I tried to make this independant from
> the two_operators logic and apply it when possible, which brings up
> all these issues about discovery and usefulness of the pattern in
> general.
> E.g. If we had just [a, b, a, b] + [c, d, c, d] without two_operators
> I sort of doubt it would be worth it to apply the transformation in
> most cases (except of course if it enables vectorization, but as I
> understand it it is hard to tell when that happens).
> On the other hand, if we know that we're dealing with two_operators
> nodes then the argument changes, as we know that we'll duplicate these
> nodes.
> 
> In turn, it may be best to try to see this as a 'two_operators
> lowering strategy' improvement instead of a generic rearrangement
> pattern.
> Specifically for x264, we're given code like
> 
> int t0 = s0 + s1;
> int t1 = s0 - s1;
> int t2 = s2 + s3;
> int t3 = s2 - s3;
> 
> and currently we lower that to VEC_PERM<(A + B), (A - B)>(...) with A
> = [s0, s0, s2, s2], B = [s1, s1, s3, s3] which doesn't work very well
> (due to element duplication).
> With this patch we do VEC_PERM<(A + B), (A - B)>(...) with A =
> VEC_PERM<C, C>(...), B = VEC_PERM<C, C>(...),  C = [s0, s1, s2, s3]
> instead which works good.


So I'm not sure I buy the argument that [a, b, a, b] + [c, d, c, d]
is much different from the two-operators version.
[a, b, a, b] + [c, d, c, d] is one of the variants (the plus) we
discover.

Isn't this really about us stupidly trying to force the use of
a larger vector rather than doing the two-operator handling by
doing

VEC_PERM <{s0, s2} + {s1, s3}, {s0, s2} - {s1, s3}, { 0, 2, 1, 3 }>

and thus recursing with smaller group size and then "interleaving"
the result?  Which might only work well in exactly the case where
we have the same number of + and -.

Of course in both forms 'A' and 'B' (or the smaller vectors) should
be SLP nodes re-used.

> But it is obvious that there are other strategies to lower this too
> and they may be even better (by taking advantage of the fact that we
> know we're dealing with a two_operators node *and* have duplicate
> elements).
> For example doing VEC_PERM<(A + B), (A - B)>(...) with A = [s0, s1,
> s2, s3] and B = VEC_PERM<A, A>(1, 0, 3, 2) looks interesting too and
> is only possible because we combine two_operators and rearrangement.
> 
> Do you believe that narrowing this to a "two_operators lowering
> improvement" makes more sense and addresses at least some of the
> issues mentioned?
> I'm currently testing to see the code that we generate with other
> strategies and will reach out once I have new results.

I think doing this in the if (two_operators) code in vect_build_slp_tree_2
only where we can take advantage of knowing that SLP build for the
original A and B children succeeded would be good.  It should side-step
the discovery cost issue.  If you do your inter-mixing scheme you'll
still need to re-discover but FAILing should be possible easily
by falling back to the already built children nodes?

Richard.

> Thanks,
> Manolis
> 
> > Thanks,
> > Richard.
> >
> > > This targets the vectorization of the SPEC2017 x264 pixel_satd functions.
> > > In some processors a larger than 10% improvement on x264 has been 
> > > observed.
> > >
> > > See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
> > >
> > > gcc/ChangeLog:
> > >
> > >       * tree-vect-slp.cc (enum slp_oprnd_pattern): new enum for 
> > > rearrangement
> > >       patterns.
> > >       (try_rearrange_oprnd_info): Detect if a node corresponds to one of 
> > > the
> > >       patterns.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       * gcc.target/aarch64/vect-slp-two-operator.c: New test.
> > >
> > > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>
> > > ---
> > >
> > >  .../aarch64/vect-slp-two-operator.c           |  42 ++++
> > >  gcc/tree-vect-slp.cc                          | 234 ++++++++++++++++++
> > >  2 files changed, 276 insertions(+)
> > >  create mode 100644 
> > > gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c 
> > > b/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> > > new file mode 100644
> > > index 00000000000..2db066a0b6e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> > > @@ -0,0 +1,42 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect 
> > > -fdump-tree-vect-details" } */
> > > +
> > > +typedef unsigned char uint8_t;
> > > +typedef unsigned int uint32_t;
> > > +
> > > +#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
> > > +    int t0 = s0 + s1;\
> > > +    int t1 = s0 - s1;\
> > > +    int t2 = s2 + s3;\
> > > +    int t3 = s2 - s3;\
> > > +    d0 = t0 + t2;\
> > > +    d1 = t1 + t3;\
> > > +    d2 = t0 - t2;\
> > > +    d3 = t1 - t3;\
> > > +}
> > > +
> > > +static uint32_t abs2( uint32_t a )
> > > +{
> > > +    uint32_t s = ((a>>15)&0x10001)*0xffff;
> > > +    return (a+s)^s;
> > > +}
> > > +
> > > +void sink(uint32_t tmp[4][4]);
> > > +
> > > +int x264_pixel_satd_8x4( uint8_t *pix1, int i_pix1, uint8_t *pix2, int 
> > > i_pix2 )
> > > +{
> > > +    uint32_t tmp[4][4];
> > > +    int sum = 0;
> > > +    for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
> > > +    {
> > > +        uint32_t a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
> > > +        uint32_t a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
> > > +        uint32_t a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
> > > +        uint32_t a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
> > > +        HADAMARD4( tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], 
> > > a0,a1,a2,a3 );
> > > +    }
> > > +    sink(tmp);
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } 
> > > */
> > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index bf1f467f53f..e395db0e185 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "tree-vectorizer.h"
> > >  #include "langhooks.h"
> > >  #include "gimple-walk.h"
> > > +#include "gimple-pretty-print.h"
> > >  #include "dbgcnt.h"
> > >  #include "tree-vector-builder.h"
> > >  #include "vec-perm-indices.h"
> > > @@ -1829,6 +1830,141 @@ vect_slp_build_two_operator_nodes (slp_tree perm, 
> > > tree vectype,
> > >    SLP_TREE_CHILDREN (perm).quick_push (child2);
> > >  }
> > >
> > > +enum slp_oprnd_pattern
> > > +{
> > > +  SLP_OPRND_PATTERN_NONE,
> > > +  SLP_OPRND_PATTERN_ABAB,
> > > +  SLP_OPRND_PATTERN_AABB,
> > > +  SLP_OPRND_PATTERN_ABBA
> > > +};
> > > +
> > > +/* Check if OPRNDS_INFO has duplicated nodes that correspond to a 
> > > predefined
> > > +   pattern described by SLP_OPRND_PATTERN and return it.  */
> > > +
> > > +static int
> > > +try_rearrange_oprnd_info (vec<slp_oprnd_info> &oprnds_info, unsigned 
> > > group_size)
> > > +{
> > > +  unsigned i;
> > > +  slp_oprnd_info info;
> > > +
> > > +  if (oprnds_info.length () != 2 || group_size % 4 != 0)
> > > +    return SLP_OPRND_PATTERN_NONE;
> > > +
> > > +  if (!oprnds_info[0]->def_stmts[0]
> > > +      || !is_a<gassign *> (oprnds_info[0]->def_stmts[0]->stmt))
> > > +    return SLP_OPRND_PATTERN_NONE;
> > > +
> > > +  enum tree_code code
> > > +    = gimple_assign_rhs_code (oprnds_info[0]->def_stmts[0]->stmt);
> > > +  FOR_EACH_VEC_ELT (oprnds_info, i, info)
> > > +    for (unsigned int j = 0; j < group_size; j += 1)
> > > +      {
> > > +     if (!info->def_stmts[j]
> > > +         || !is_a<gassign *> (info->def_stmts[j]->stmt)
> > > +         || STMT_VINFO_DATA_REF (info->def_stmts[j]))
> > > +       return SLP_OPRND_PATTERN_NONE;
> > > +     /* Don't mix different operations.  */
> > > +     if (gimple_assign_rhs_code (info->def_stmts[j]->stmt) != code)
> > > +       return SLP_OPRND_PATTERN_NONE;
> > > +      }
> > > +
> > > +  if (gimple_assign_rhs_code (oprnds_info[0]->def_stmts[0]->stmt)
> > > +      != gimple_assign_rhs_code (oprnds_info[1]->def_stmts[0]->stmt))
> > > +    return SLP_OPRND_PATTERN_NONE;
> > > +
> > > +  int pattern = SLP_OPRND_PATTERN_NONE;
> > > +  FOR_EACH_VEC_ELT (oprnds_info, i, info)
> > > +    for (unsigned int j = 0; j < group_size; j += 4)
> > > +      {
> > > +     int cur_pattern = SLP_OPRND_PATTERN_NONE;
> > > +     /* Check for an ABAB... pattern.  */
> > > +     if ((info->def_stmts[j] == info->def_stmts[j + 2])
> > > +         && (info->def_stmts[j + 1] == info->def_stmts[j + 3])
> > > +         && (info->def_stmts[j] != info->def_stmts[j + 1]))
> > > +       cur_pattern = SLP_OPRND_PATTERN_ABAB;
> > > +     /* Check for an AABB... pattern.  */
> > > +     else if ((info->def_stmts[j] == info->def_stmts[j + 1])
> > > +              && (info->def_stmts[j + 2] == info->def_stmts[j + 3])
> > > +              && (info->def_stmts[j] != info->def_stmts[j + 2]))
> > > +       cur_pattern = SLP_OPRND_PATTERN_AABB;
> > > +     /* Check for an ABBA... pattern.  */
> > > +     else if ((info->def_stmts[j] == info->def_stmts[j + 3])
> > > +              && (info->def_stmts[j + 1] == info->def_stmts[j + 2])
> > > +              && (info->def_stmts[j] != info->def_stmts[j + 1]))
> > > +       cur_pattern = SLP_OPRND_PATTERN_ABBA;
> > > +     /* Unrecognised pattern.  */
> > > +     else
> > > +       return SLP_OPRND_PATTERN_NONE;
> > > +
> > > +     if (pattern == SLP_OPRND_PATTERN_NONE)
> > > +       pattern = cur_pattern;
> > > +     /* Multiple patterns detected.  */
> > > +     else if (cur_pattern != pattern)
> > > +       return SLP_OPRND_PATTERN_NONE;
> > > +      }
> > > +
> > > +  gcc_checking_assert (pattern != SLP_OPRND_PATTERN_NONE);
> > > +
> > > +  if (dump_enabled_p ())
> > > +    {
> > > +      if (pattern == SLP_OPRND_PATTERN_ABAB)
> > > +     dump_printf (MSG_NOTE, "ABAB");
> > > +      else if (pattern == SLP_OPRND_PATTERN_AABB)
> > > +     dump_printf (MSG_NOTE, "AABB");
> > > +      else if (pattern == SLP_OPRND_PATTERN_ABBA)
> > > +     dump_printf (MSG_NOTE, "ABBA");
> > > +      dump_printf (MSG_NOTE, " pattern detected.\n");
> > > +    }
> > > +
> > > +  if (pattern == SLP_OPRND_PATTERN_ABAB || pattern == 
> > > SLP_OPRND_PATTERN_ABBA)
> > > +    for (unsigned int j = 0; j < group_size; j += 4)
> > > +      {
> > > +     /* Given oprnd[0] -> A1, B1, A1, B1, A2, B2, A2, B2, ...
> > > +        Given oprnd[1] -> C1, D1, C1, D1, C2, D2, C2, D2, ...
> > > +        Create a single node -> A1, B1, C1, D1, A2, B2, C2, D2, ...  */
> > > +     oprnds_info[0]->def_stmts[j+2] = oprnds_info[1]->def_stmts[j];
> > > +     oprnds_info[0]->ops[j+2] = oprnds_info[1]->ops[j];
> > > +     oprnds_info[0]->def_stmts[j+3] = oprnds_info[1]->def_stmts[j+1];
> > > +     oprnds_info[0]->ops[j+3] = oprnds_info[1]->ops[j+1];
> > > +      }
> > > +  else if (pattern == SLP_OPRND_PATTERN_AABB)
> > > +    for (unsigned int j = 0; j < group_size; j += 4)
> > > +      {
> > > +     /* Given oprnd[0] -> A1, A1, B1, B1, A2, A2, B2, B2, ...
> > > +        Given oprnd[1] -> C1, C1, D1, D1, C2, C2, D2, D2, ...
> > > +        Create a single node -> A1, C1, B1, D1, A2, C2, B2, D2, ...  */
> > > +
> > > +     /* The ordering here is at least to some extent arbitrary.
> > > +        A generilized version needs to use some explicit ordering.  */
> > > +     oprnds_info[0]->def_stmts[j+1] = oprnds_info[1]->def_stmts[j];
> > > +     oprnds_info[0]->ops[j+1] = oprnds_info[1]->ops[j];
> > > +     oprnds_info[0]->def_stmts[j+2] = oprnds_info[0]->def_stmts[j+2];
> > > +     oprnds_info[0]->ops[j+2] = oprnds_info[0]->ops[j+2];
> > > +     oprnds_info[0]->def_stmts[j+3] = oprnds_info[1]->def_stmts[j+2];
> > > +     oprnds_info[0]->ops[j+3] = oprnds_info[1]->ops[j+2];
> > > +      }
> > > +
> > > +  if (dump_enabled_p ())
> > > +    {
> > > +      dump_printf (MSG_NOTE, "Recurse with:\n");
> > > +      for (unsigned int j = 0; j < group_size; j++)
> > > +     {
> > > +       dump_printf (MSG_NOTE, "  ");
> > > +       print_gimple_stmt (dump_file, oprnds_info[0]->def_stmts[j]->stmt, 
> > > 0);
> > > +     }
> > > +    }
> > > +
> > > +  /* Since we've merged the two nodes in one, make the second one a copy 
> > > of
> > > +     the first.  */
> > > +  for (unsigned int j = 0; j < group_size; j++)
> > > +    {
> > > +      oprnds_info[1]->def_stmts[j] = oprnds_info[0]->def_stmts[j];
> > > +      oprnds_info[1]->ops[j] = oprnds_info[0]->ops[j];
> > > +    }
> > > +
> > > +  return pattern;
> > > +}
> > > +
> > >  /* Recursively build an SLP tree starting from NODE.
> > >     Fail (and return a value not equal to zero) if def-stmts are not
> > >     isomorphic, require data permutation or are of unsupported types of
> > > @@ -2409,6 +2545,10 @@ out:
> > >
> > >    stmt_info = stmts[0];
> > >
> > > +  int rearrange_pattern = SLP_OPRND_PATTERN_NONE;
> > > +  if (is_a<gassign *> (stmt_info->stmt))
> > > +    rearrange_pattern = try_rearrange_oprnd_info (oprnds_info, 
> > > group_size);
> > > +
> > >    /* Create SLP_TREE nodes for the definition node/s.  */
> > >    FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
> > >      {
> > > @@ -2669,6 +2809,100 @@ fail:
> > >    *tree_size += this_tree_size + 1;
> > >    *max_nunits = this_max_nunits;
> > >
> > > +  /* If we applied any rearrangmenets then we need to reconstruct the 
> > > original
> > > +     elements with an additional permutation layer.  */
> > > +  if (rearrange_pattern != SLP_OPRND_PATTERN_NONE)
> > > +    {
> > > +      slp_tree one =  new _slp_tree;
> > > +      slp_tree two = new _slp_tree;
> > > +      SLP_TREE_DEF_TYPE (one) = vect_internal_def;
> > > +      SLP_TREE_DEF_TYPE (two) = vect_internal_def;
> > > +      SLP_TREE_VECTYPE (one) = vectype;
> > > +      SLP_TREE_VECTYPE (two) = vectype;
> > > +      SLP_TREE_CHILDREN (one).safe_splice (children);
> > > +      SLP_TREE_CHILDREN (two).safe_splice (children);
> > > +
> > > +      SLP_TREE_CODE (one) = VEC_PERM_EXPR;
> > > +      SLP_TREE_CODE (two) = VEC_PERM_EXPR;
> > > +      SLP_TREE_REPRESENTATIVE (one) = stmts[0];
> > > +      SLP_TREE_REPRESENTATIVE (two) = stmts[2];
> > > +      lane_permutation_t &perm_one = SLP_TREE_LANE_PERMUTATION (one);
> > > +      lane_permutation_t &perm_two = SLP_TREE_LANE_PERMUTATION (two);
> > > +
> > > +      if (rearrange_pattern == SLP_OPRND_PATTERN_ABAB)
> > > +     {
> > > +        /* Given a single node -> A1, B1, C1, D1, A2, B2, C2, D2, ...
> > > +           Create node "one" -> A1, B1, A1, B1, A2, B2, A2, B2, ...
> > > +           Create node "two" -> C1, D1, C1, D1, C2, D2, C2, D2, ...  */
> > > +
> > > +       for (unsigned int j = 0; j < group_size; j += 4)
> > > +         {
> > > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > > +           perm_one.safe_push (std::make_pair (0, j + 1));
> > > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > > +           perm_one.safe_push (std::make_pair (0, j + 1));
> > > +
> > > +           perm_two.safe_push (std::make_pair (0, j + 2));
> > > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > > +           perm_two.safe_push (std::make_pair (0, j + 2));
> > > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > > +         }
> > > +     }
> > > +      else if (rearrange_pattern == SLP_OPRND_PATTERN_AABB)
> > > +     {
> > > +        /* Given a single node -> A1, C1, B1, D1, A2, C2, B2, D2, ...
> > > +           Create node "one" -> A1, A1, B1, B1, A2, A2, B2, B2, ...
> > > +           Create node "two" -> C1, C1, D1, D1, C2, C2, D2, D2, ...  */
> > > +
> > > +       for (unsigned int j = 0; j < group_size; j += 4)
> > > +         {
> > > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > > +           perm_one.safe_push (std::make_pair (0, j + 2));
> > > +           perm_one.safe_push (std::make_pair (0, j + 2));
> > > +
> > > +           perm_two.safe_push (std::make_pair (0, j + 1));
> > > +           perm_two.safe_push (std::make_pair (0, j + 1));
> > > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > > +         }
> > > +     }
> > > +      else if (rearrange_pattern == SLP_OPRND_PATTERN_ABBA)
> > > +     {
> > > +        /* Given a single node -> A1, B1, C1, D1, A2, B2, C2, D2, ...
> > > +           Create node "one" -> A1, B1, B1, A1, A2, B2, B2, A2, ...
> > > +           Create node "two" -> C1, D1, D1, C1, C2, D2, D2, C2, ...  */
> > > +
> > > +       for (unsigned int j = 0; j < group_size; j += 4)
> > > +         {
> > > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > > +           perm_one.safe_push (std::make_pair (0, j + 1));
> > > +           perm_one.safe_push (std::make_pair (0, j + 1));
> > > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > > +
> > > +           perm_two.safe_push (std::make_pair (0, j + 2));
> > > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > > +           perm_two.safe_push (std::make_pair (0, j + 2));
> > > +         }
> > > +     }
> > > +
> > > +      slp_tree child;
> > > +      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (two), i, child)
> > > +       SLP_TREE_REF_COUNT (child)++;
> > > +
> > > +      node = vect_create_new_slp_node (node, stmts, 2);
> > > +      SLP_TREE_VECTYPE (node) = vectype;
> > > +      SLP_TREE_CHILDREN (node).quick_push (one);
> > > +      SLP_TREE_CHILDREN (node).quick_push (two);
> > > +
> > > +      SLP_TREE_LANES (one) = stmts.length ();
> > > +      SLP_TREE_LANES (two) = stmts.length ();
> > > +
> > > +      children.truncate (0);
> > > +      children.safe_splice (SLP_TREE_CHILDREN (node));
> > > +    }
> > > +
> > >    if (two_operators)
> > >      {
> > >        /* ???  We'd likely want to either cache in bst_map sth like
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] Rearrange SLP nodes with duplicate statements. [PR98138]

Reply via email to