On Fri, 1 Nov 2019, Joel Hutton wrote:

> On 30/10/2019 13:49, Richard Biener wrote:
>  >>
>  >>          * expr.c (store_constructor): Add case for constructor of 
> vectors.
>  > Why do you need this?  The vectorizer already creates such CTORs.  Any
>  > testcase that you can show?
>  >
>  > typedef long v2di __attribute__((vector_size(16)));
>  > v2di v;
>  > void
>  > foo()
>  > {
>  >    v = (v2di){v[1], v[0]};
>  > }
> There is a special case for single element vectors, where the vector 
> mode and
> the element mode are the same.
> I've changed this slightly, as your solution of using VECTOR_MODE_P didn't
> work for my case where:
>    emode = E_DImode
>    eltmode = E_DImode
> On aarch64. As E_DImode is not a vector mode.
> Based on some checks I found in verify_gimple, I set the type of the 
> replacement
> constructor to the same as the original constructor, as verify_gimple 
> seems to
> expect flattened types. i.e. a vector of vectors of ints should have 
> type vector
> of ints.

Huh.  On aarch64 for the above testcase I see mode V2DImode and
emode = eltmode = DImode.  That's just a regular CTOR with
non-vector elements so not sure why you need any adjustment at all
here?

It looks like your vectorization of the CTOR introduces a
V2DImode CTOR of vector(1) long elements which incidentially
have DImode.  That's because somehow V2DI vectorization isn't
profitable but V1DI one is.  In the end it's a noop transform
but the testcase shows that for V2DI vectorization we fail
to cost the CTOR in the scalar cost computation (you have to
include the pattern root there I guess).

Technically we feed valid GIMPLE to the expander so we indeed
shouldn't ICE.  So I'd like to have the earlier keying on
vec_vec_init_p match the later assert we run into, thus
sth like

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 277765)
+++ gcc/expr.c  (working copy)
@@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target,
            && n_elts.is_constant (&const_n_elts))
          {
            machine_mode emode = eltmode;
+           bool vector_typed_elts_p = false;
 
            if (CONSTRUCTOR_NELTS (exp)
                && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, 
0)->value))
@@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target,
                                      * TYPE_VECTOR_SUBPARTS (etype),
                                      n_elts));
                emode = TYPE_MODE (etype);
+               vector_typed_elts_p = true;
              }
            icode = convert_optab_handler (vec_init_optab, mode, emode);
            if (icode != CODE_FOR_nothing)
              {
                unsigned int n = const_n_elts;
 
-               if (emode != eltmode)
+               if (vector_typed_elts_p)
                  {
                    n = CONSTRUCTOR_NELTS (exp);
                    vec_vec_init_p = true;

> 
>  > What could be done is "compact" the
>  > operands to vectorize to only contain SSA names, try to SLP
>  > match and vectorize them and then combine the vectorized result
>  > with the constant elements.
>  >
>  > Not worth doing I guess unless it's sth regular like
>  >
>  >  { a, b, c, d, 0, 0, 0, 0 }
>  >
>  > or so.  But this can be handled as followup if necessary.
>  >
> I agree, it should be possible to support this in future patches.
> 
> 
>  > +         SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ?
>  > stmt_info->stmt\
>  > +                                                 : NULL;
>  >
>  > SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess...
>  >
>  > @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
>  >                 stmt_vec_info use_stmt_info = vinfo->lookup_stmt
>  > (use_stmt);
>  >                 if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
>  >                   {
>  > +                   /* Check this is not a constructor that will be
>  > vectorized
>  > +                      away.  */
>  > +                   if (BB_VINFO_GROUPED_STORES (vinfo).contains
>  > (use_stmt_info))
>  > +                       continue;
>  >                     (*life)[i] = true;
>  >
>  > ... which you then simply mark as PURE_SLP_STMT where we call
>  > vect_mark_slp_stmts in vect_slp_analyze_bb_1.
> Done.
> 
> 
>  > I see you have the TYPE_VECTOR_SUBPARTS check still at transform
>  > stage in vectorize_slp_instance_root_stmt, please simply move
>  > it to vect_slp_check_for_constructors or to vect_analyze_slp_instance
>  > where you have the other rejects (non-SSA names in the ctor).
>  >
> Done.
> 
> 
>  >
>  > +bool
>  > +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
>  > +{
>  > +
>  >
>  > extra newline
> Fixed.
> 
> 
>  > +  /* For vector constructors, the constructor stmt that the SLP tree is
>  > built
>  > +     from, NULL otherwise.  */
>  > +  gimple *root_stmt;
>  >
>  > as said, make this a stmt_vec_info
> Done.
> 
> 
>  > +  if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1)
>  >
>  > the stmt replacing can be commonized between == 1 and > 1 cases.
> Done.
> 
> 
>  > +
>  > +      /* Vectorize the instance root.  */
>  > +      if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
>  > +         && SLP_TREE_VEC_STMTS (node).exists ())
>  > +       if (!vectorize_slp_instance_root_stmt (node, instance))
>  > +         {
>  >
>  > instance->root == node is always true.  Likewise
>  > SLP_TREE_VEC_STMTS (node).exists ().
> Done.
> 
> 
>  > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
>  >        if (is_a <loop_vec_info> (vinfo))
>  >
>  > the ChangeLog doesn't mention this.  I guess this isn't necessary
>  > unless you fail vectorizing late which you shouldn't.
>  >
> It's necessary to avoid:
> 
>          removing stmts twice for constructors of the form {_1,_1,_1,_1}
>          removing stmts that define ssa names used elsewhere (which 
> previously wasn't a consideration because the scalar_stmts were stores 
> to memory, not assignments to ssa names)

OK, but the code you are patching is just supposed to remove stores
from the final scalar stmt seeds - it doesn't expect any loads there
which is probably what happens.  I'd instead add a

   /* Do not CSE the stmts feeding a CTOR, they might have uses
      outside of the vectorized stmts.  */
   if (SLP_INSTANCE_ROOT_STMT (instance))
     continue;

before the loop over SLP_TREE_SCALAR_STMTS.

> Updated changelog (and patch)

+         if (!subparts.is_constant () || !(subparts.to_constant ()
+                                           == CONSTRUCTOR_NELTS (rhs)))
+           continue;

can be better written as if (maybe_ne (subparts, CONSTRUCTOR_NELTS (rhs)))

+/* Vectorize the instance root.  Return success or failure.  */
+
+void

since it's now void the comment has to be adjusted.

+      vect_schedule_slp_instance (node,
                                  instance, bst_map);

this now fits in one line.

OK with those changes.

Thanks,
Richard.

> 2019-10-31  Joel Hutton  <joel.hut...@arm.com>
> 
>          * expr.c (store_constructor): Modify to handle single element 
> vectors.
>          * tree-vect-slp.c (vect_analyze_slp_instance): Add case for 
> vector constructors.
>          (vect_slp_check_for_constructors): New function.
>          (vect_slp_analyze_bb_1): Call new function to check for vector 
> constructors.
>          (vectorize_slp_instance_root_stmt): New function.
>          (vect_schedule_slp): Call new function to vectorize root stmt 
> of vector constructors.
>          * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-10-31  Joel Hutton  <joel.hut...@arm.com>
> 
>          * gcc.dg/vect/bb-slp-40.c: New test.
>          * gcc.dg/vect/bb-slp-41.c: New test.
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to