RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-30 Thread Joel Hutton via Gcc-patches
> We can go with a private vect_gimple_build function until we sort out the API
> issue to unblock Tamar (I'll reply to Richard's reply with further thoughts on
> this)
> 

Done.
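
For reference, a minimal sketch of what such a private wrapper could look
like (the name and exact signature are placeholders, not the committed API;
the body mirrors the pattern-statement construction quoted later in this
digest):

  /* Sketch only: always create a new statement, unlike the folding
     gimple_build interface.  */
  static gimple *
  vect_gimple_build (tree lhs, code_helper ch, tree op0, tree op1)
  {
    if (ch.is_tree_code ())
      return gimple_build_assign (lhs, (tree_code) ch, op0, op1);
    gcc_assert (ch.is_internal_fn ());
    gimple *stmt
      = gimple_build_call_internal (as_internal_fn ((combined_fn) ch),
				    2, op0, op1);
    gimple_call_set_lhs (stmt, lhs);
    return stmt;
  }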

> > Similarly are you ok with the use of gimple_extract_op? I would lean
> towards using it as it is cleaner, but I don't have strong feelings.
> 
> I don't like using gimple_extract_op here; I think I outlined a variant
> that is even shorter.
> 

Done.
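
(The shorter variant in question is the open-coded dispatch richi sketches
later in this digest, roughly:

  if (is_gimple_assign (stmt))
    code = gimple_assign_rhs_code (stmt);
  else if (gimple_call_internal_p (stmt))
    code = gimple_call_internal_fn (stmt);
  else
    return false;

which keeps the common assign case cheap.)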

Updated patches attached, bootstrapped and regression tested on aarch64.

Tomorrow is my last working day at Arm, so it will likely be Andre who commits
this/addresses any further comments.



0001-Refactor-to-allow-internal_fn-s.patch
Description: 0001-Refactor-to-allow-internal_fn-s.patch


0002-Refactor-widen_plus-as-internal_fn.patch
Description: 0002-Refactor-widen_plus-as-internal_fn.patch


0003-Remove-widen_plus-minus_expr-tree-codes.patch
Description: 0003-Remove-widen_plus-minus_expr-tree-codes.patch


[ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-09 Thread Joel Hutton via Gcc-patches
> Before I make any changes, I'd like to check we're all on the same page.
> 
> richi, are you ok with the gimple_build function, perhaps with a different
> name if you are concerned with overloading? we could use gimple_ch_build
> or gimple_code_helper_build?
> 
> Similarly are you ok with the use of gimple_extract_op? I would lean towards
> using it as it is cleaner, but I don't have strong feelings.
> 
> Joel

Ping. Just looking for some confirmation before I rework this patch. It would 
be good to get some agreement on this as Tamar is blocked on this patch.

Joel



> -Original Message-
> From: Joel Hutton
> Sent: 07 June 2022 10:02
> To: Richard Sandiford 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as
> internal_fns
> 
> Thanks Richard,
> 
> > I thought the potential problem with the above is that gimple_build is
> > a folding interface, so in principle it's allowed to return an
> > existing SSA_NAME set by an existing statement (or even a constant).
> > I think in this context we do need to force a new statement to be created.
> 
> Before I make any changes, I'd like to check we're all on the same page.
> 
> richi, are you ok with the gimple_build function, perhaps with a different
> name if you are concerned with overloading? we could use gimple_ch_build
> or gimple_code_helper_build?
> 
> Similarly are you ok with the use of gimple_extract_op? I would lean towards
> using it as it is cleaner, but I don't have strong feelings.
> 
> Joel
> 
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: 07 June 2022 09:18
> > To: Joel Hutton 
> > Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> > Subject: Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as
> > internal_fns
> >
> > Joel Hutton  writes:
> > >> > Patches attached. They already incorporated the .cc rename, now
> > >> > rebased to be after the change to tree.h
> > >>
> > >> @@ -1412,8 +1412,7 @@ vect_recog_widen_op_pattern (vec_info
> *vinfo,
> > >>2, oprnd, half_type, unprom, vectype);
> > >>
> > >>tree var = vect_recog_temp_ssa_var (itype, NULL);
> > >> -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> > >> - oprnd[0], oprnd[1]);
> > >> +  gimple *pattern_stmt = gimple_build (var, wide_code, oprnd[0],
> > >> oprnd[1]);
> > >>
> > >>
> > >> you should be able to do without the new gimple_build overload by
> > >> using
> > >>
> > >>gimple_seq stmts = NULL;
> > >>gimple_build (&stmts, wide_code, itype, oprnd[0], oprnd[1]);
> > >>gimple *pattern_stmt = gimple_seq_last_stmt (stmts);
> > >>
> > >> because 'gimple_build' is an existing API.
> > >
> > > Done.
> > >
> > > The gimple_build overload was at the request of Richard Sandiford, I
> > assume removing it is ok with you Richard S?
> > > From Richard Sandiford:
> > >> For example, I think we should hide this inside a new:
> > >>
> > >>   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> > >>
> > >> that works directly on code_helper, similarly to the new
> > >> code_helper gimple_build interfaces.
> >
> > I thought the potential problem with the above is that gimple_build is
> > a folding interface, so in principle it's allowed to return an
> > existing SSA_NAME set by an existing statement (or even a constant).
> > I think in this context we do need to force a new statement to be created.
> >
> > Of course, the hope is that there wouldn't still be such folding
> > opportunities at this stage, but I don't think it's guaranteed
> > (especially with options fuzzing).
> >
> > Since I was mentioned :-) ...
> >
> > Could you run the patch through contrib/check_GNU_style.py?
> > There seem to be a few long lines.
> >
> > > +  if (res_op.code.is_tree_code ())
> >
> > Do you need this is_tree_code ()?  These comparisons…
> >
> > > +  {
> > > +  widen_arith = (code == WIDEN_PLUS_EXPR
> > > +  || code == WIDEN_MINUS_EXPR
> > > +  || code == WIDEN_MULT_EXPR
> > > +  || code == WIDEN_LSHIFT_EXPR);
> >
> > …ought to be safe unconditionally.
> >
> > > + }
> > > +  else
> > > +  widen_arith = false;
> > > +
> > > +  if (!widen_arith
> > > +  && !CONVERT_EXPR_CODE_P (code)
> > > +  && code != FIX_TRUNC_EXPR
> > > +  && code != FLOAT_EXPR)
> > > +return false;
> > >
> > >/* Check types of lhs and rhs.  */
> > > -  scalar_dest = gimple_assign_lhs (stmt);
> > > +  scalar_dest = gimple_get_lhs (stmt);
> > >lhs_type = TREE_TYPE (scalar_dest);
> > >vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> > >
> > > @@ -4938,10 +4951,14 @@ vectorizable_conversion (vec_info *vinfo,
> > >
> > >if (op_type == binary_op)
> > >  {
> > > -  gcc_assert (code == WIDEN_MULT_EXPR || code ==
> > WIDEN_LSHIFT_EXPR
> > > -   || code == WIDEN_PLUS_EXPR || code ==
> > WIDEN_MINUS_EXPR);
> > > +  gcc_assert (code == WIDEN_MULT_EXPR
> > > +   || 

RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-07 Thread Joel Hutton via Gcc-patches
Thanks Richard,

> I thought the potential problem with the above is that gimple_build is a
> folding interface, so in principle it's allowed to return an existing SSA_NAME
> set by an existing statement (or even a constant).
> I think in this context we do need to force a new statement to be created.
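
For illustration, a hypothetical case of that hazard with made-up constant
operands; the folding interface can simplify immediately and append nothing:

  gimple_seq stmts = NULL;
  /* May fold straight to the constant 3 and leave STMTS empty, so
     gimple_seq_last_stmt (stmts) would not yield a usable pattern stmt.  */
  tree res = gimple_build (&stmts, PLUS_EXPR, itype,
			   build_int_cst (itype, 1),
			   build_int_cst (itype, 2));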

Before I make any changes, I'd like to check we're all on the same page.

richi, are you ok with the gimple_build function, perhaps with a different name 
if you are concerned with overloading? we could use gimple_ch_build or 
gimple_code_helper_build?

Similarly are you ok with the use of gimple_extract_op? I would lean towards 
using it as it is cleaner, but I don't have strong feelings.

Joel

> -Original Message-
> From: Richard Sandiford 
> Sent: 07 June 2022 09:18
> To: Joel Hutton 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as
> internal_fns
> 
> Joel Hutton  writes:
> >> > Patches attached. They already incorporated the .cc rename, now
> >> > rebased to be after the change to tree.h
> >>
> >> @@ -1412,8 +1412,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
> >>2, oprnd, half_type, unprom, vectype);
> >>
> >>tree var = vect_recog_temp_ssa_var (itype, NULL);
> >> -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> >> - oprnd[0], oprnd[1]);
> >> +  gimple *pattern_stmt = gimple_build (var, wide_code, oprnd[0],
> >> oprnd[1]);
> >>
> >>
> >> you should be able to do without the new gimple_build overload by
> >> using
> >>
> >>gimple_seq stmts = NULL;
> >>gimple_build (&stmts, wide_code, itype, oprnd[0], oprnd[1]);
> >>gimple *pattern_stmt = gimple_seq_last_stmt (stmts);
> >>
> >> because 'gimple_build' is an existing API.
> >
> > Done.
> >
> > The gimple_build overload was at the request of Richard Sandiford, I
> assume removing it is ok with you Richard S?
> > From Richard Sandiford:
> >> For example, I think we should hide this inside a new:
> >>
> >>   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> >>
> >> that works directly on code_helper, similarly to the new code_helper
> >> gimple_build interfaces.
> 
> I thought the potential problem with the above is that gimple_build is a
> folding interface, so in principle it's allowed to return an existing SSA_NAME
> set by an existing statement (or even a constant).
> I think in this context we do need to force a new statement to be created.
> 
> Of course, the hope is that there wouldn't still be such folding opportunities
> at this stage, but I don't think it's guaranteed (especially with options
> fuzzing).
> 
> Since I was mentioned :-) ...
> 
> Could you run the patch through contrib/check_GNU_style.py?
> There seem to be a few long lines.
> 
> > +  if (res_op.code.is_tree_code ())
> 
> Do you need this is_tree_code ()?  These comparisons…
> 
> > +  {
> > +  widen_arith = (code == WIDEN_PLUS_EXPR
> > +|| code == WIDEN_MINUS_EXPR
> > +|| code == WIDEN_MULT_EXPR
> > +|| code == WIDEN_LSHIFT_EXPR);
> 
> …ought to be safe unconditionally.
> 
> > + }
> > +  else
> > +  widen_arith = false;
> > +
> > +  if (!widen_arith
> > +  && !CONVERT_EXPR_CODE_P (code)
> > +  && code != FIX_TRUNC_EXPR
> > +  && code != FLOAT_EXPR)
> > +return false;
> >
> >/* Check types of lhs and rhs.  */
> > -  scalar_dest = gimple_assign_lhs (stmt);
> > +  scalar_dest = gimple_get_lhs (stmt);
> >lhs_type = TREE_TYPE (scalar_dest);
> >vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> >
> > @@ -4938,10 +4951,14 @@ vectorizable_conversion (vec_info *vinfo,
> >
> >if (op_type == binary_op)
> >  {
> > -  gcc_assert (code == WIDEN_MULT_EXPR || code ==
> WIDEN_LSHIFT_EXPR
> > - || code == WIDEN_PLUS_EXPR || code ==
> WIDEN_MINUS_EXPR);
> > +  gcc_assert (code == WIDEN_MULT_EXPR
> > + || code == WIDEN_LSHIFT_EXPR
> > + || code == WIDEN_PLUS_EXPR
> > + || code == WIDEN_MINUS_EXPR);
> >
> > -  op1 = gimple_assign_rhs2 (stmt);
> > +
> > +  op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) :
> > +gimple_call_arg (stmt, 0);
> >tree vectype1_in;
> >if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 1,
> > 			   &op1, &slp_op1, &dt[1], &vectype1_in))
> > […]
> > @@ -12181,7 +12235,6 @@ supportable_widening_operation (vec_info *vinfo,
> >return false;
> >  }
> >
> > -
> >  /* Function supportable_narrowing_operation
> >
> > Check whether an operation represented by the code CODE is a
> 
> Seems like a spurious change.
> 
> > @@ -12205,7 +12258,7 @@ supportable_widening_operation (vec_info *vinfo,
> >  bool
> >  supportable_narrowing_operation (enum tree_code code,
> > 				 tree vectype_out, tree vectype_in,
> > -				 enum tree_code *code1, int *multi_step_cvt,
> 

RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-06 Thread Joel Hutton via Gcc-patches
> > Patches attached. They already incorporated the .cc rename, now
> > rebased to be after the change to tree.h
> 
> @@ -1412,8 +1412,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
>2, oprnd, half_type, unprom, vectype);
> 
>tree var = vect_recog_temp_ssa_var (itype, NULL);
> -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> - oprnd[0], oprnd[1]);
> +  gimple *pattern_stmt = gimple_build (var, wide_code, oprnd[0],
> oprnd[1]);
> 
> 
> you should be able to do without the new gimple_build overload
> by using
> 
>gimple_seq stmts = NULL;
>gimple_build (&stmts, wide_code, itype, oprnd[0], oprnd[1]);
>gimple *pattern_stmt = gimple_seq_last_stmt (stmts);
> 
> because 'gimple_build' is an existing API.

Done.

The gimple_build overload was at the request of Richard Sandiford, I assume 
removing it is ok with you Richard S?
From Richard Sandiford:
> For example, I think we should hide this inside a new:
> 
>   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> 
> that works directly on code_helper, similarly to the new code_helper 
> gimple_build interfaces.




> 
> 
> -  if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
> +  if (gimple_get_lhs (stmt) == NULL_TREE ||
> +  TREE_CODE(gimple_get_lhs (stmt)) != SSA_NAME)
>  return false;
> 
> || go to the next line, space after TREE_CODE
> 

Done.

> +  bool widen_arith = false;
> +  gimple_match_op res_op;
> +  if (!gimple_extract_op (stmt, &res_op))
> +return false;
> +  code = res_op.code;
> +  op_type = res_op.num_ops;
> +
> +  if (is_gimple_assign (stmt))
> +  {
> +  widen_arith = (code == WIDEN_PLUS_EXPR
> +|| code == WIDEN_MINUS_EXPR
> +|| code == WIDEN_MULT_EXPR
> +|| code == WIDEN_LSHIFT_EXPR);
> + }
> +  else
> +  widen_arith = gimple_call_flags (stmt) & ECF_WIDEN;
> 
> there seem to be formatting issues.  Also shouldn't you check
> if (res_op.code.is_tree_code ()) instead if is_gimple_assign?
> I also don't like the ECF_WIDEN "trick", just do as with the
> tree codes and explicitely enumerate widening ifns here.
> 

Done. I've set widen_arith to false for the first patch, as the second patch
introduces the widening ifns.
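
With the widening ifns from the second patch, the explicit enumeration
amounts to roughly the following (IFN_VEC_WIDEN_MINUS is assumed by analogy
with the IFN_VEC_WIDEN_PLUS name used later in this digest):

  else if (gimple_call_internal_p (stmt))
    {
      internal_fn ifn = gimple_call_internal_fn (stmt);
      widen_arith = (ifn == IFN_VEC_WIDEN_PLUS
		     || ifn == IFN_VEC_WIDEN_MINUS);
    }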

> gimple_extract_op is a bit heavy-weight as well, so maybe
> instead simply do
> 
>   if (is_gimple_assign (stmt))
> {
>   code = gimple_assign_rhs_code (stmt);
> ...
> }
>   else if (gimple_call_internal_p (stmt))
> {
>   code = gimple_call_internal_fn (stmt);
> ...
> }
>   else
> return false;

The patch was originally written as above; it was changed to use
gimple_extract_op at the request of Richard Sandiford. I prefer
gimple_extract_op as it's more compact, but I don't have strong feelings. If
the Richards can agree on either version, I'm happy.


From Richard Sandiford:
> > +  if (is_gimple_assign (stmt))
> > +  {
> > +code_or_ifn = gimple_assign_rhs_code (stmt);
> > +  }
> > +  else
> > +code_or_ifn = gimple_call_combined_fn (stmt);
> 
> It might be possible to use gimple_extract_op here (only recently added).
> This would also provide the number of operands directly, instead of 
> needing "op_type".  It would also provide an array of operands.


> 
> +  code_helper c1=MAX_TREE_CODES, c2=MAX_TREE_CODES;
> 
> spaces before/after '='
> 

Done.

> @@ -12061,13 +12105,16 @@ supportable_widening_operation (vec_info
> *vinfo,
>if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
>  std::swap (c1, c2);
> 
> +
>if (code == FIX_TRUNC_EXPR)
>  {
> 
> unnecessary whitespace change.
> 
Fixed.

> diff --git a/gcc/tree.h b/gcc/tree.h
> index
> f84958933d51144bb6ce7cc41eca5f7f06814550..e51e34c051d9b91d1c02a4b2
> fefdb2b15606a36f
> 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -92,6 +92,10 @@ public:
>bool is_fn_code () const { return rep < 0; }
>bool is_internal_fn () const;
>bool is_builtin_fn () const;
> +  enum tree_code as_tree_code () const { return is_tree_code () ?
> +(tree_code)* this : MAX_TREE_CODES; }
> +  combined_fn as_fn_code () const { return is_fn_code () ? (combined_fn)
> *this
> +: CFN_LAST;}
> 
> hmm, the other as_* functions we have are not member functions.
> Also this semantically differs from the tree_code () conversion
> operator (that one was supposed to be "cheap").  The existing
> as_internal_fn for example is documented as being valid only if
> the code is actually an internal fn.  I see you are introducing
> the new function as convenience to get a "safe" not-a-X value,
> so maybe they should be called safe_as_tree_code () instead?
> 
SGTM. Done
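
That is, roughly:

  enum tree_code safe_as_tree_code () const
  { return is_tree_code () ? (tree_code) *this : MAX_TREE_CODES; }
  combined_fn safe_as_fn_code () const
  { return is_fn_code () ? (combined_fn) *this : CFN_LAST; }

(same bodies as the quoted as_* variants above, under the safer names.)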

> 
>int get_rep () const { return rep; }
>bool operator== (const code_helper &other) { return rep == other.rep; }
>bool operator!= (const code_helper &other) { return rep != other.rep; }
> @@ -6657,6 +6661,54 @@ extern unsigned fndecl_dealloc_argno (tree);
> if nonnull, set the second argument to the referenced enclosing
> 

RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-05-31 Thread Joel Hutton via Gcc-patches
> Can you post an updated patch (after the .cc renaming, and code_helper
> now already moved to tree.h).
> 
> Thanks,
> Richard.

Patches attached. They already incorporated the .cc rename, now rebased to be 
after the change to tree.h

Joel


0001-Refactor-to-allow-internal_fn-s.patch
Description: 0001-Refactor-to-allow-internal_fn-s.patch


0002-Refactor-widen_plus-as-internal_fn.patch
Description: 0002-Refactor-widen_plus-as-internal_fn.patch


0003-Remove-widen_plus-minus_expr-tree-codes.patch
Description: 0003-Remove-widen_plus-minus_expr-tree-codes.patch


[ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-05-25 Thread Joel Hutton via Gcc-patches
Ping!

Just checking there is still interest in this. I'm assuming you've been busy 
with release.

Joel

> -Original Message-
> From: Joel Hutton
> Sent: 13 April 2022 16:53
> To: Richard Sandiford 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: [vect-patterns] Refactor widen_plus/widen_minus as internal_fns
> 
> Hi all,
> 
> These patches refactor the widening patterns in vect-patterns to use
> internal_fn instead of tree_codes.
> 
> Sorry about the delay, some changes to master made it a bit messier.
> 
> Bootstrapped and regression tested on aarch64.
> 
> Joel
> 
> > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > index 854cbcff390..4a8ea67e62f 100644
> > > --- a/gcc/tree-vect-patterns.c
> > > +++ b/gcc/tree-vect-patterns.c
> > > @@ -1245,7 +1245,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
> > > static gimple *  vect_recog_widen_op_pattern (vec_info *vinfo,
> > >stmt_vec_info last_stmt_info, tree *type_out,
> > > -  tree_code orig_code, tree_code wide_code,
> > > +  tree_code orig_code, code_helper
> > wide_code_or_ifn,
> >
> > I think it'd be better to keep the original “wide_code” name and try
> > to remove as many places as possible in which switching based on
> > tree_code or internal_fn is necessary.  The recent gimple-match.h
> > patches should help with that, but more routines might be needed.
> 
> Done.
> 
> > > @@ -1309,8 +1310,16 @@ vect_recog_widen_op_pattern (vec_info
> *vinfo,
> > >  2, oprnd, half_type, unprom, vectype);
> > >
> > >tree var = vect_recog_temp_ssa_var (itype, NULL);
> > > -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> > > -   oprnd[0], oprnd[1]);
> > > +  gimple *pattern_stmt;
> > > +  if (wide_code_or_ifn.is_tree_code ())
> > > +pattern_stmt = gimple_build_assign (var, wide_code_or_ifn,
> > > + oprnd[0], oprnd[1]);
> > > +  else
> > > +{
> > > +  internal_fn fn = as_internal_fn ((combined_fn) wide_code_or_ifn);
> > > +  pattern_stmt = gimple_build_call_internal (fn, 2, oprnd[0], 
> > > oprnd[1]);
> > > +  gimple_call_set_lhs (pattern_stmt, var);
> > > +}
> >
> > For example, I think we should hide this inside a new:
> >
> >   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> >
> > that works directly on code_helper, similarly to the new code_helper
> > gimple_build interfaces.
> 
> Done.
> 
> > > @@ -4513,14 +4513,16 @@ vect_gen_widened_results_half (vec_info
> > *vinfo, enum tree_code code,
> > >tree new_temp;
> > >
> > >/* Generate half of the widened result:  */
> > > -  gcc_assert (op_type == TREE_CODE_LENGTH (code));
> > >if (op_type != binary_op)
> > >  vec_oprnd1 = NULL;
> > > -  new_stmt = gimple_build_assign (vec_dest, code, vec_oprnd0,
> > vec_oprnd1);
> > > +  if (ch.is_tree_code ())
> > > +new_stmt = gimple_build_assign (vec_dest, ch, vec_oprnd0,
> > vec_oprnd1);
> > > +  else
> > > +new_stmt = gimple_build_call_internal (as_internal_fn
> > > + ((combined_fn)
> > ch),
> > > +2, vec_oprnd0, vec_oprnd1);
> >
> > Similarly here.  I guess the combined_fn/internal_fn path will also
> > need to cope with null trailing operands, for consistency with the tree_code
> one.
> >
> 
> Done.
> 
> > > @@ -4744,31 +4747,28 @@ vectorizable_conversion (vec_info *vinfo,
> > >&& ! vec_stmt)
> > >  return false;
> > >
> > > -  gassign *stmt = dyn_cast  (stmt_info->stmt);
> > > -  if (!stmt)
> > > +  gimple* stmt = stmt_info->stmt;
> > > +  if (!(is_gimple_assign (stmt) || is_gimple_call (stmt)))
> > >  return false;
> > >
> > > -  if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
> > > -return false;
> > > +  if (is_gimple_assign (stmt))
> > > +  {
> > > +code_or_ifn = gimple_assign_rhs_code (stmt);
> > > +  }
> > > +  else
> > > +code_or_ifn = gimple_call_combined_fn (stmt);
> >
> > It might be possible to use gimple_extract_op here (only recently added).
> > This would also provide the number of operands directly, instead of
> > needing “op_type”.  It would also provide an array of operands.
> >
> 
> Done.
> 
> > > -  code = gimple_assign_rhs_code (stmt);
> > > -  if (!CONVERT_EXPR_CODE_P (code)
> > > -  && code != FIX_TRUNC_EXPR
> > > -  && code != FLOAT_EXPR
> > > -  && code != WIDEN_PLUS_EXPR
> > > -  && code != WIDEN_MINUS_EXPR
> > > -  && code != WIDEN_MULT_EXPR
> > > -  && code != WIDEN_LSHIFT_EXPR)
> >
> > Is it safe to drop this check independently of parts 2 and 3?
> > (Genuine question, haven't checked in detail.)
> 
> It requires the parts 2 and 3. I've moved that change into this first patch.
> 
> > > @@ -4784,7 +4784,8 @@ vectorizable_conversion (vec_info *vinfo,
> > >  }
> > >
> > >rhs_type = TREE_TYPE (op0);
> > > -  if ((code != FIX_TRUNC_EXPR && code != FLOAT_EXPR)
> > > +  if ((code_or_ifn.is_tree_code () && 

[vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-04-13 Thread Joel Hutton via Gcc-patches
Hi all,

These patches refactor the widening patterns in vect-patterns to use 
internal_fn instead of tree_codes.

Sorry about the delay, some changes to master made it a bit messier.

Bootstrapped and regression tested on aarch64.

Joel

> > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index
> > 854cbcff390..4a8ea67e62f 100644
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -1245,7 +1245,7 @@ vect_recog_sad_pattern (vec_info *vinfo,  static
> > gimple *  vect_recog_widen_op_pattern (vec_info *vinfo,
> >  stmt_vec_info last_stmt_info, tree *type_out,
> > -tree_code orig_code, tree_code wide_code,
> > +tree_code orig_code, code_helper
> wide_code_or_ifn,
> 
> I think it'd be better to keep the original “wide_code” name and try to
> remove as many places as possible in which switching based on tree_code
> or internal_fn is necessary.  The recent gimple-match.h patches should
> help with that, but more routines might be needed.

Done.

> > @@ -1309,8 +1310,16 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
> >2, oprnd, half_type, unprom, vectype);
> >
> >tree var = vect_recog_temp_ssa_var (itype, NULL);
> > -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> > - oprnd[0], oprnd[1]);
> > +  gimple *pattern_stmt;
> > +  if (wide_code_or_ifn.is_tree_code ())
> > +pattern_stmt = gimple_build_assign (var, wide_code_or_ifn,
> > +   oprnd[0], oprnd[1]);
> > +  else
> > +{
> > +  internal_fn fn = as_internal_fn ((combined_fn) wide_code_or_ifn);
> > +  pattern_stmt = gimple_build_call_internal (fn, 2, oprnd[0], 
> > oprnd[1]);
> > +  gimple_call_set_lhs (pattern_stmt, var);
> > +}
> 
> For example, I think we should hide this inside a new:
> 
>   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> 
> that works directly on code_helper, similarly to the new code_helper
> gimple_build interfaces.

Done.

> > @@ -4513,14 +4513,16 @@ vect_gen_widened_results_half (vec_info
> *vinfo, enum tree_code code,
> >tree new_temp;
> >
> >/* Generate half of the widened result:  */
> > -  gcc_assert (op_type == TREE_CODE_LENGTH (code));
> >if (op_type != binary_op)
> >  vec_oprnd1 = NULL;
> > -  new_stmt = gimple_build_assign (vec_dest, code, vec_oprnd0,
> vec_oprnd1);
> > +  if (ch.is_tree_code ())
> > +new_stmt = gimple_build_assign (vec_dest, ch, vec_oprnd0,
> vec_oprnd1);
> > +  else
> > +new_stmt = gimple_build_call_internal (as_internal_fn ((combined_fn)
> ch),
> > +  2, vec_oprnd0, vec_oprnd1);
> 
> Similarly here.  I guess the combined_fn/internal_fn path will also need
> to cope with null trailing operands, for consistency with the tree_code one.
> 

Done.

> > @@ -4744,31 +4747,28 @@ vectorizable_conversion (vec_info *vinfo,
> >&& ! vec_stmt)
> >  return false;
> >
> > -  gassign *stmt = dyn_cast  (stmt_info->stmt);
> > -  if (!stmt)
> > +  gimple* stmt = stmt_info->stmt;
> > +  if (!(is_gimple_assign (stmt) || is_gimple_call (stmt)))
> >  return false;
> >
> > -  if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
> > -return false;
> > +  if (is_gimple_assign (stmt))
> > +  {
> > +code_or_ifn = gimple_assign_rhs_code (stmt);
> > +  }
> > +  else
> > +code_or_ifn = gimple_call_combined_fn (stmt);
> 
> It might be possible to use gimple_extract_op here (only recently added).
> This would also provide the number of operands directly, instead of
> needing “op_type”.  It would also provide an array of operands.
> 

Done.

> > -  code = gimple_assign_rhs_code (stmt);
> > -  if (!CONVERT_EXPR_CODE_P (code)
> > -  && code != FIX_TRUNC_EXPR
> > -  && code != FLOAT_EXPR
> > -  && code != WIDEN_PLUS_EXPR
> > -  && code != WIDEN_MINUS_EXPR
> > -  && code != WIDEN_MULT_EXPR
> > -  && code != WIDEN_LSHIFT_EXPR)
> 
> Is it safe to drop this check independently of parts 2 and 3?  (Genuine
> question, haven't checked in detail.)

It requires the parts 2 and 3. I've moved that change into this first patch.

> > @@ -4784,7 +4784,8 @@ vectorizable_conversion (vec_info *vinfo,
> >  }
> >
> >rhs_type = TREE_TYPE (op0);
> > -  if ((code != FIX_TRUNC_EXPR && code != FLOAT_EXPR)
> > +  if ((code_or_ifn.is_tree_code () && code_or_ifn != FIX_TRUNC_EXPR
> > +   && code_or_ifn != FLOAT_EXPR)
> 
> I don't think we want the is_tree_code condition here.  The existing
> != should work.
> 

Done.

> > @@ -11856,13 +11888,13 @@ supportable_widening_operation (vec_info
> *vinfo,
> >if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
> >  std::swap (c1, c2);
> >
> > -  if (code == FIX_TRUNC_EXPR)
> > +  if (code_or_ifn == FIX_TRUNC_EXPR)
> >  {
> >/* The signedness is determined from output operand.  */
> >optab1 = optab_for_tree_code (c1, 

RE: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

2021-12-14 Thread Joel Hutton via Gcc-patches
> +  if (ot_plus == unknown_optab
> +  || ot_minus == unknown_optab
> +  || optab_handler (ot_minus, TYPE_MODE (step_vectype)) ==
> CODE_FOR_nothing
> +  || optab_handler (ot_plus, TYPE_MODE (step_vectype)) ==
> + CODE_FOR_nothing)
>  return false;
> 
> Won't optab_handler just return CODE_FOR_nothing for unknown_optab?

I was taking the check used in directly_supported_p:

return (optab != unknown_optab
	&& optab_handler (optab, TYPE_MODE (type)) != CODE_FOR_nothing);

> Anyway, I think best would be to write it as:
>   if (!target_supports_op_p (step_vectype, PLUS_EXPR, optab_default)
>   || !target_supports_op_p (step_vectype, MINUS_EXPR, optab_default))
> return false;
Looks good to me.

Patch attached.

Tests running on gcc-11 on aarch64. 

Ok for 11 once tests come back?


0001-vect-loop-fix-build.patch
Description: 0001-vect-loop-fix-build.patch


RE: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

2021-12-14 Thread Joel Hutton via Gcc-patches
Bootstrapped and regression tested on releases/gcc-11 on aarch64.

Ok for 11?

Previous commit broke build as it relied on directly_supported_p which
is not in 11. This reworks to avoid using directly_supported_p.

gcc/ChangeLog:

	PR bootstrap/103688
	* tree-vect-loop.c (vectorizable_induction): Rework to avoid
	directly_supported_p.

From: Joel Hutton 
Sent: 13 December 2021 15:02
To: Richard Biener ; gcc-patches@gcc.gnu.org; 
Tobias Burnus ; Richard Sandiford 

Cc: Richard Biener 
Subject: Re: GCC 11 backport does not build (no "directly_supported_p") - was: 
Re: pr103523: Check for PLUS/MINUS support

My mistake, reworked patch. Tests are still running.

From: Richard Biener 
Sent: 13 December 2021 14:47
To: gcc-patches@gcc.gnu.org; Tobias Burnus; Joel Hutton; Richard Sandiford
Cc: GCC Patches; Richard Biener
Subject: Re: GCC 11 backport does not build (no "directly_supported_p") - was: 
Re: pr103523: Check for PLUS/MINUS support

On December 13, 2021 3:27:50 PM GMT+01:00, Tobias Burnus wrote:
>Hi Joel,
>
>your patch fails here with:
>
>../../repos/gcc-11-commit/gcc/tree-vect-loop.c:8000:8: error:
>'directly_supported_p' was not declared in this scope
>  8000 |   if (!directly_supported_p (PLUS_EXPR, step_vectype)
>   |^~~~
>
>And "git grep" shows that this is only present in:
>
>gcc/tree-vect-loop.c:  if (!directly_supported_p (PLUS_EXPR, step_vectype)
>gcc/tree-vect-loop.c:  || !directly_supported_p (MINUS_EXPR,
>step_vectype))
>
>That's different on mainline, which offers that function.

Just as a reminder, backports need regular bootstrap and regtest validation on 
the respective branches.

Richard.

>Tobias
>
>On 10.12.21 14:24, Joel Hutton via Gcc-patches wrote:
>> ok for backport to 11?
>> 
>> From: Richard Sandiford 
>> Sent: 10 December 2021 10:22
>> To: Joel Hutton 
>> Cc: GCC Patches ; Richard Biener 
>> Subject: Re: pr103523: Check for PLUS/MINUS support
>>
>> Joel Hutton  writes:
>>> Hi all,
>>>
>>> This is to address pr103523.
>>>
>>> bootstrapped and regression tested on aarch64.
>>>
>>> Check for PLUS_EXPR/MINUS_EXPR support in vectorizable_induction.
>>> PR103523 is an ICE on valid code:
>>>
>>> void d(float *a, float b, int c) {
>>>  float e;
>>>  for (; c; c--, e += b)
>>>a[c] = e;
>>> }
>>>
>>> This is due to not checking for PLUS_EXPR support, which is missing in
>>> VNx2sf mode. This causes an ICE at expand time. This patch adds a check
>>> for support in vectorizable_induction.
>>>
>>> gcc/ChangeLog:
>>>
>>>  PR tree-optimization/PR103523
>> The bugzilla hook expects: PR tree-optimization/103523
>>
>>>  * tree-vect-loop.c (vectorizable_induction): Check for
>>>  PLUS_EXPR/MINUS_EXPR support.
>> OK, thanks.
>>
>> Richard


0001-vect-loop-fix-build.patch
Description: 0001-vect-loop-fix-build.patch


Re: GCC 11 backport does not build (no "directly_supported_p") - was: Re: pr103523: Check for PLUS/MINUS support

2021-12-13 Thread Joel Hutton via Gcc-patches
My mistake, reworked patch. Tests are still running.

From: Richard Biener 
Sent: 13 December 2021 14:47
To: gcc-patches@gcc.gnu.org ; Tobias Burnus 
; Joel Hutton ; Richard Sandiford 

Cc: GCC Patches ; Richard Biener 
Subject: Re: GCC 11 backport does not build (no "directly_supported_p") - was: 
Re: pr103523: Check for PLUS/MINUS support

On December 13, 2021 3:27:50 PM GMT+01:00, Tobias Burnus 
 wrote:
>Hi Joel,
>
>your patch fails here with:
>
>../../repos/gcc-11-commit/gcc/tree-vect-loop.c:8000:8: error:
>‘directly_supported_p’ was not declared in this scope
>  8000 |   if (!directly_supported_p (PLUS_EXPR, step_vectype)
>   |^~~~
>
>And "git grep" shows that this is only present in:
>
>gcc/tree-vect-loop.c:  if (!directly_supported_p (PLUS_EXPR, step_vectype)
>gcc/tree-vect-loop.c:  || !directly_supported_p (MINUS_EXPR,
>step_vectype))
>
>That's different on mainline, which offers that function.

Just as a reminder, backports need regular bootstrap and regtest validation on 
the respective branches.

Richard.

>Tobias
>
>On 10.12.21 14:24, Joel Hutton via Gcc-patches wrote:
>> ok for backport to 11?
>> 
>> From: Richard Sandiford 
>> Sent: 10 December 2021 10:22
>> To: Joel Hutton 
>> Cc: GCC Patches ; Richard Biener 
>> Subject: Re: pr103523: Check for PLUS/MINUS support
>>
>> Joel Hutton  writes:
>>> Hi all,
>>>
>>> This is to address pr103523.
>>>
>>> bootstrapped and regression tested on aarch64.
>>>
>>> Check for PLUS_EXPR/MINUS_EXPR support in vectorizable_induction.
>>> PR103523 is an ICE on valid code:
>>>
>>> void d(float *a, float b, int c) {
>>>  float e;
>>>  for (; c; c--, e += b)
>>>a[c] = e;
>>> }
>>>
>>> This is due to not checking for PLUS_EXPR support, which is missing in
>>> VNx2sf mode. This causes an ICE at expand time. This patch adds a check
>>> for support in vectorizable_induction.
>>>
>>> gcc/ChangeLog:
>>>
>>>  PR tree-optimization/PR103523
>> The bugzilla hook expects: PR tree-optimization/103523
>>
>>>  * tree-vect-loop.c (vectorizable_induction): Check for
>>>  PLUS_EXPR/MINUS_EXPR support.
>> OK, thanks.
>>
>> Richard

From d6ac8751dfdf3520f90b2ad8d09ebc452cc42831 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 13 Dec 2021 14:49:55 +
Subject: [PATCH] vect-loop: fix build

Previous commit broke build as it relied on directly_supported_p which
is not in 11. This reworks to avoid using directly_supported_p.

gcc/ChangeLog:

	* tree-vect-loop.c (vectorizable_induction): Rework to avoid
directly_supported_p
---
 gcc/tree-vect-loop.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 54320681119d880ec2cbe641fe16ee73e552f57d..8f464e61780dbb2df5836022d77307c5c90a4b6f 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7997,8 +7997,14 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), vectype);
 
   /* Check for backend support of PLUS/MINUS_EXPR. */
-  if (!directly_supported_p (PLUS_EXPR, step_vectype)
-  || !directly_supported_p (MINUS_EXPR, step_vectype))
+  direct_optab ot_plus = optab_for_tree_code (tree_code (PLUS_EXPR),
+		 step_vectype, optab_default);
+  direct_optab ot_minus = optab_for_tree_code (tree_code (MINUS_EXPR),
+		 step_vectype, optab_default);
+  if (ot_plus == unknown_optab
+  || ot_minus == unknown_optab
+  || optab_handler (ot_minus, TYPE_MODE (step_vectype)) == CODE_FOR_nothing
+  || optab_handler (ot_plus, TYPE_MODE (step_vectype)) == CODE_FOR_nothing)
 return false;
 
   if (!vec_stmt) /* transformation not required.  */
-- 
2.17.1



Re: pr103523: Check for PLUS/MINUS support

2021-12-10 Thread Joel Hutton via Gcc-patches
ok for backport to 11?

From: Richard Sandiford 
Sent: 10 December 2021 10:22
To: Joel Hutton 
Cc: GCC Patches ; Richard Biener 
Subject: Re: pr103523: Check for PLUS/MINUS support

Joel Hutton  writes:
> Hi all,
>
> This is to address pr103523.
>
> bootstrapped and regression tested on aarch64.
>
> Check for PLUS_EXPR/MINUS_EXPR support in vectorizable_induction.
> PR103523 is an ICE on valid code:
>
> void d(float *a, float b, int c) {
> float e;
> for (; c; c--, e += b)
>   a[c] = e;
> }
>
> This is due to not checking for PLUS_EXPR support, which is missing in
> VNx2sf mode. This causes an ICE at expand time. This patch adds a check
> for support in vectorizable_induction.
>
> gcc/ChangeLog:
>
> PR tree-optimization/PR103523

The bugzilla hook expects: PR tree-optimization/103523

> * tree-vect-loop.c (vectorizable_induction): Check for
> PLUS_EXPR/MINUS_EXPR support.

OK, thanks.

Richard


pr103523: Check for PLUS/MINUS support

2021-12-10 Thread Joel Hutton via Gcc-patches
Hi all,

This is to address pr103523.

bootstrapped and regression tested on aarch64.

Check for PLUS_EXPR/MINUS_EXPR support in vectorizable_induction.
PR103523 is an ICE on valid code:

void d(float *a, float b, int c) {
    float e;
    for (; c; c--, e += b)
      a[c] = e;
}

This is due to not checking for PLUS_EXPR support, which is missing in
VNx2sf mode. This causes an ICE at expand time. This patch adds a check
for support in vectorizable_induction.

gcc/ChangeLog:
        
	PR tree-optimization/PR103523
	* tree-vect-loop.c (vectorizable_induction): Check for
	PLUS_EXPR/MINUS_EXPR support.
diff --git a/gcc/testsuite/gcc.target/aarch64/pr103523.c b/gcc/testsuite/gcc.target/aarch64/pr103523.c
new file mode 100644
index ..736e8936c5f6768bdf098ddc37b2c21ab74ee0df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr103523.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+sve -mtune=neoverse-v1 -Ofast" } */
+
+void d(float *a, float b, int c) {
+float e;
+for (; c; c--, e += b)
+  a[c] = e;
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 7f544ba1fd5198dd32cda05e62382ab2e1e9bb50..f700d5e7ac2c05402407a46113320f79359906fa 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8065,6 +8065,15 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return false;
 }
 
+  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
+  gcc_assert (step_expr != NULL_TREE);
+  tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), vectype);
+
+  /* Check for backend support of PLUS/MINUS_EXPR. */
+  if (!directly_supported_p (PLUS_EXPR, step_vectype)
+  || !directly_supported_p (MINUS_EXPR, step_vectype))
+return false;
+
   if (!vec_stmt) /* transformation not required.  */
 {
   unsigned inside_cost = 0, prologue_cost = 0;
@@ -8124,10 +8133,6 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "transform induction phi.\n");
 
-  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
-  gcc_assert (step_expr != NULL_TREE);
-  tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), vectype);
-
   pe = loop_preheader_edge (iv_loop);
   /* Find the first insertion point in the BB.  */
   basic_block bb = gimple_bb (phi);


[ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2021-11-25 Thread Joel Hutton via Gcc-patches
Just a quick ping to check this hasn't been forgotten.

> -Original Message-
> From: Joel Hutton
> Sent: 12 November 2021 11:42
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> 
> Subject: RE: [vect-patterns] Refactor widen_plus/widen_minus as
> internal_fns
> 
> > please use #define INCLUDE_MAP before the system.h include instead.
> > Is it really necessary to build a new std::map for each optab lookup?!
> > That looks quite ugly and inefficient.  We'd usually - if necessary at
> > all - build a auto_vec<std::pair<...> > and .sort () and .bsearch () it.
> Ok, I'll rework this part. In the meantime, to address your other comment.
> 
> > I'm not sure I understand DEF_INTERNAL_OPTAB_MULTI_FN, neither this
> > cover letter nor the patch ChangeLog explains anything.
> 
> I'll attempt to clarify, if this makes things clearer I can include this in 
> the
> commit message of the respun patch:
> 
> DEF_INTERNAL_OPTAB_MULTI_FN is like DEF_INTERNAL_OPTAB_FN except it
> provides convenience wrappers for defining conversions that require a hi/lo
> split, like widening and narrowing operations.  Each definition for <name>
> will require an optab named <name> and two other optabs that you specify
> for signed and unsigned. The hi/lo pair is necessary because the widening
> operations take n narrow elements as inputs and return n/2 wide elements
> as outputs. The 'lo' operation operates on the first n/2 elements of input.
> The 'hi' operation operates on the second n/2 elements of input. Defining an
> internal_fn along with hi/lo variations allows a single internal function to 
> be
> returned from a vect_recog function that will later be expanded to hi/lo.
> 
> DEF_INTERNAL_OPTAB_MULTI_FN is used in internal-fn.def to register a
> widening internal_fn. It is defined differently in different places and 
> internal-
> fn.def is sourced from those places so the parameters given can be reused.
>   internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later
> defined to generate the 'expand_<name>' functions for the hi/lo versions of the fn.
>   internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original
> and hi/lo variants of the internal_fn
> 
>  For example:
>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_addl_hi_<mode> -> (u/s)addl2
>              IFN_VEC_WIDEN_PLUS_LO -> vec_widen_addl_lo_<mode> -> (u/s)addl
> 
> This gives the same functionality as the previous
> WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into
> VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> 
> Let me know if I'm not expressing this clearly.
> 
> Thanks,
> Joel


RE: [vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2021-11-16 Thread Joel Hutton via Gcc-patches
Updated patch 2 with explanation included in commit message and changes 
requested.

Bootstrapped and regression tested on aarch64
> -Original Message-
> From: Joel Hutton
> Sent: 12 November 2021 11:42
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> 
> Subject: RE: [vect-patterns] Refactor widen_plus/widen_minus as
> internal_fns
> 
> > please use #define INCLUDE_MAP before the system.h include instead.
Done.

> > Is it really necessary to build a new std::map for each optab lookup?!
> > That looks quite ugly and inefficient.  We'd usually - if necessary at
> > all - build a auto_vec<std::pair<...> > and .sort () and .bsearch () it.
> Ok, I'll rework this part. In the meantime, to address your other comment.
Done.

> > I'm not sure I understand DEF_INTERNAL_OPTAB_MULTI_FN, neither this
> > cover letter nor the patch ChangeLog explains anything.
> 
> I'll attempt to clarify, if this makes things clearer I can include this in 
> the
> commit message of the respun patch:
> 
> DEF_INTERNAL_OPTAB_MULTI_FN is like DEF_INTERNAL_OPTAB_FN except it
> provides convenience wrappers for defining conversions that require a hi/lo
> split, like widening and narrowing operations.  Each definition for <name>
> will require an optab named <name> and two other optabs that you specify
> for signed and unsigned. The hi/lo pair is necessary because the widening
> operations take n narrow elements as inputs and return n/2 wide elements
> as outputs. The 'lo' operation operates on the first n/2 elements of input.
> The 'hi' operation operates on the second n/2 elements of input. Defining an
> internal_fn along with hi/lo variations allows a single internal function to 
> be
> returned from a vect_recog function that will later be expanded to hi/lo.
> 
> DEF_INTERNAL_OPTAB_MULTI_FN is used in internal-fn.def to register a
> widening internal_fn. It is defined differently in different places and 
> internal-
> fn.def is sourced from those places so the parameters given can be reused.
>   internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later
> defined to generate the 'expand_<name>' functions for the hi/lo versions of the fn.
>   internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original
> and hi/lo variants of the internal_fn
> 
>  For example:
>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_addl_hi_<mode> -> (u/s)addl2
>              IFN_VEC_WIDEN_PLUS_LO -> vec_widen_addl_lo_<mode> -> (u/s)addl
> 
> This gives the same functionality as the previous
> WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into
> VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> 
> Let me know if I'm not expressing this clearly.
> 
> Thanks,
> Joel


0001-vect-patterns-Refactor-to-allow-internal_fn-s.patch
Description: 0001-vect-patterns-Refactor-to-allow-internal_fn-s.patch


0002-vect-patterns-Refactor-widen_plus-as-internal_fn.patch
Description: 0002-vect-patterns-Refactor-widen_plus-as-internal_fn.patch


0003-Remove-widen_plus-minus_expr-tree-codes.patch
Description: 0003-Remove-widen_plus-minus_expr-tree-codes.patch


RE: [vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2021-11-12 Thread Joel Hutton via Gcc-patches
> please use #define INCLUDE_MAP before the system.h include instead.
> Is it really necessary to build a new std::map for each optab lookup?!
> That looks quite ugly and inefficient.  We'd usually - if necessary at all - 
> build a auto_vec<std::pair<...> > and .sort () and .bsearch () it.
Ok, I'll rework this part. In the meantime, to address your other comment.

> I'm not sure I understand DEF_INTERNAL_OPTAB_MULTI_FN, neither this
> cover letter nor the patch ChangeLog explains anything.

I'll attempt to clarify, if this makes things clearer I can include this in the 
commit message of the respun patch:

DEF_INTERNAL_OPTAB_MULTI_FN is like DEF_INTERNAL_OPTAB_FN except it provides 
convenience wrappers for defining conversions that require a hi/lo split, like 
widening and narrowing operations.  Each definition
for <name> will require an optab named <name> and two other optabs that you
specify for signed and unsigned. The hi/lo pair is necessary because the 
widening operations take n narrow elements as inputs and return n/2 wide 
elements as outputs. The 'lo' operation operates on the first n/2 elements of 
input. The 'hi' operation operates on the second n/2 elements of input. 
Defining an internal_fn along with hi/lo variations allows a single internal 
function to be returned from a vect_recog function that will later be expanded 
to hi/lo.

DEF_INTERNAL_OPTAB_MULTI_FN is used in internal-fn.def to register a widening 
internal_fn. It is defined differently in different places and internal-fn.def 
is sourced from those places so the parameters given can be reused.
  internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later 
defined to generate the 'expand_<name>' functions for the hi/lo versions of the fn.
  internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original and 
hi/lo variants of the internal_fn

 For example:
 IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_addl_hi_<mode> -> (u/s)addl2
             IFN_VEC_WIDEN_PLUS_LO -> vec_widen_addl_lo_<mode> -> (u/s)addl

This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree 
codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
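
As a concrete illustration of the hi/lo semantics (v16qi inputs and v8hi
outputs assumed), for input vectors a and b:

  lo[i] = (short) a[i]     + (short) b[i]        for i in 0..7
  hi[i] = (short) a[i + 8] + (short) b[i + 8]    for i in 0..7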

Let me know if I'm not expressing this clearly.

Thanks,
Joel


[vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2021-11-11 Thread Joel Hutton via Gcc-patches
Hi all,

This refactor allows widening vect patterns (such as widen_plus/widen_minus) to 
be represented as
either internal_fns or tree_codes and replaces the current 
widen_plus/widen_minus with internal_fn versions. This refactor is split into 3 
patches.

Bootstrapped and regression tested on aarch64.

Ok for stage 3?




0001-vect-patterns-Refactor-to-allow-internal_fn-s.patch
Description: 0001-vect-patterns-Refactor-to-allow-internal_fn-s.patch


0002-vect-patterns-Refactor-widen_plus-as-internal_fn.patch
Description: 0002-vect-patterns-Refactor-widen_plus-as-internal_fn.patch


0003-Remove-widen_plus-minus_expr-tree-codes.patch
Description: 0003-Remove-widen_plus-minus_expr-tree-codes.patch


[ping][vect-patterns][RFC] Refactor widening patterns to allow internal_fn's

2021-08-17 Thread Joel Hutton via Gcc-patches
Ping. Is there still interest in refactoring vect-patterns to internal_fn's? 

> -Original Message-
> From: Joel Hutton
> Sent: 07 June 2021 14:30
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Biener ; Richard Sandiford
> 
> Subject: [vect-patterns][RFC] Refactor widening patterns to allow
> internal_fn's
> 
> Hi all,
> 
> This refactor allows widening patterns (such as widen_plus/widen_minus) to
> be represented as either internal_fns or tree_codes. The widening patterns
> were originally added as tree codes with the expectation that they would be
> refactored later.
> 
> [vect-patterns] Refactor as internal_fn's
> 
> Refactor vect-patterns to allow patterns to be internal_fns starting with
> widening_plus/minus patterns.
> 
> 
> gcc/ChangeLog:
> 
> * gimple-match.h (class code_helper): Move code_helper class to more
> visible header.
> * internal-fn.h (internal_fn_name): Add internal_fn range check.
> * optabs-tree.h (supportable_convert_operation): Change function
> prototypes to use code_helper.
> * tree-vect-patterns.c (vect_recog_widen_op_pattern): Refactor to use
> code_helper.
> * tree-vect-stmts.c (vect_gen_widened_results_half): Refactor to use
> code_helper, build internal_fns.
> (vect_create_vectorized_promotion_stmts): Refactor to use
> code_helper.
> (vectorizable_conversion): Refactor to use code_helper.
> (supportable_widening_operation): Refactor to use code_helper.
> (supportable_narrowing_operation): Refactor to use code_helper.
> * tree-vectorizer.h (supportable_widening_operation): Refactor to use
> code_helper.
> (supportable_narrowing_operation): Refactor to use code_helper.
> * tree.h (class code_helper): Refactor to use code_helper.


[vect-patterns][RFC] Refactor widening patterns to allow internal_fn's

2021-06-07 Thread Joel Hutton via Gcc-patches
Hi all,

This refactor allows widening patterns (such as widen_plus/widen_minus) to be 
represented as
either internal_fns or tree_codes. The widening patterns were originally added 
as tree codes with the expectation that they would be refactored later.

[vect-patterns] Refactor as internal_fn's

Refactor vect-patterns to allow patterns to be internal_fns starting
with widening_plus/minus patterns.


gcc/ChangeLog:

* gimple-match.h (class code_helper): Move code_helper class to more 
visible header.
* internal-fn.h (internal_fn_name): Add internal_fn range check.
* optabs-tree.h (supportable_convert_operation): Change function 
prototypes to use code_helper.
* tree-vect-patterns.c (vect_recog_widen_op_pattern): Refactor to use 
code_helper.
* tree-vect-stmts.c (vect_gen_widened_results_half): Refactor to use 
code_helper, build internal_fns.
(vect_create_vectorized_promotion_stmts): Refactor to use code_helper.
(vectorizable_conversion): Refactor to use code_helper.
(supportable_widening_operation): Refactor to use code_helper.
(supportable_narrowing_operation): Refactor to use code_helper.
* tree-vectorizer.h (supportable_widening_operation): Refactor to use 
code_helper.
(supportable_narrowing_operation): Refactor to use code_helper.
* tree.h (class code_helper): Refactor to use code_helper.


rb14487.patch
Description: rb14487.patch


[vect] Support min/max + index pattern

2021-05-05 Thread Joel Hutton via Gcc-patches
Hi all,

Looking for some feedback on this. One thing I would like to draw attention
to is that this pattern requires two separate dependent reductions in the
epilogue.
The accumulator vector for the maximum/minimum elements can be reduced to a
scalar result trivially with a min/max, but getting the index from the
accumulator vector of indices is more complex: it requires using the position
of the maximum/minimum scalar result value within the accumulator vector to
create a mask.

The given solution works but it's slightly messy.
vect_create_epilog_for_reduction creates the epilogue for one vectorized
scalar stmt at a time. This modification makes one invocation create the
epilogue for both related stmts and marks the other as 'done'. Alternate
suggestions are welcome.

Joel

[vect] Support min/max + index pattern

Add the capability to vect-loop to support the following pattern.

for (int i = 0; i < n; i++)
  {
    if (data[i] < best)
      {
	best = data[i];
	best_i = i;
      }
  }
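
Schematically, the two dependent epilogue reductions described above are
(illustrative pseudo-gimple, not the patch's exact sequence):

  best   = REDUC_MIN (val_acc);
  mask   = (val_acc == {best, ..., best});
  best_i = REDUC_MIN (mask ? idx_acc : {INT_MAX, ..., INT_MAX});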

gcc/ChangeLog:

	* tree-vect-loop.c (vect_reassociating_reduction_simple_p): New
	function.
	(vect_recog_minmax_index_pattern): New function.
	(vect_is_simple_reduction): Add multi_use_reduction case.
	(vect_create_epilog_for_reduction): Add minmax+index epilogue handling.


minmax.patch
Description: minmax.patch


RE: [Vect] Fix mask check on Scatter loads/stores

2021-03-10 Thread Joel Hutton via Gcc-patches
>> gcc/testsuite/ChangeLog:
>>
>>   PR target/99102
>>   * gcc.target/aarch64/sve/pr99102.c: New test.
>
>(The filename is out of date, but the git hook would catch that.)
Fixed and committed.

>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c 
>> b/gcc/testsuite/gcc.dg/vect/pr99102.c
>> new file mode 100644
>> index 
>> ..9d321b97a68c05ad08646e8e2d69
>> 79c2030c65e6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-options "-O2 -ftree-vectorize" } */
>> +/* { dg-additional-options "-msve-vector-bits=256" { target 
>> +aarch64_sve } } */
>
>I should have realised this earlier, sorry, but I think we want this to be
>conditional on aarch64_sve256_hw instead.  Same for the scan-tree-dump below.
>
>When running the testsuite on SVE hardware, the above would force a VL of 256
>even if the actual VL is different, so the test would fail at runtime for
>128-bit SVE, 512-bit SVE, etc.
>
>Thanks,
>Richard.
>

Patch attached to add this, Ok?

[testsuite] Fix target selector for pr99102.c

The target selector should explicitly choose 256 bit hardware as
explicit 256 bit compiler options are used to trigger the bug.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr99102.c: Fix target selector.
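
The resulting directive is the one Richard suggested:

/* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */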


0002-testsuite-Fix-target-selector-for-pr99102.c.patch
Description: 0002-testsuite-Fix-target-selector-for-pr99102.c.patch


[Vect] Fix mask check on Scatter loads/stores

2021-03-10 Thread Joel Hutton via Gcc-patches
Hi all,

This patch fixes PR99102. For masked gather loads/scatter stores the
'loop_masks' variable was checked to be non-null, but the 'final_mask' was the
actual mask used.

bootstrapped and regression tested on aarch64. Regression tested on aarch64_sve
under qemu.

 [Vect] Fix mask check on Scatter loads/stores

Previously, IFN_MASK_SCATTER_STORE was used if 'loop_masks' was
non-null, but the mask used is 'final_mask'. This caused a bug where
a 'MASK_STORE' was vectorized into a 'SCATTER_STORE' instead of a
'MASK_SCATTER_STORE'. This fixes PR target/99102.
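
Schematically, the selection should key off the mask actually used (names as
in the description above; argument lists elided):

  if (final_mask)
    call = gimple_build_call_internal (IFN_MASK_SCATTER_STORE, ...);
  else
    call = gimple_build_call_internal (IFN_SCATTER_STORE, ...);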

gcc/ChangeLog:

PR target/99102
* tree-vect-stmts.c (vectorizable_store): Fix scatter store mask
check condition.
(vectorizable_load): Fix gather load mask check condition.

gcc/testsuite/ChangeLog:

PR target/99102
* gcc.target/aarch64/sve/pr99102.c: New test.


diff
Description: diff


Re: [aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

2021-02-11 Thread Joel Hutton via Gcc-patches
Hi Richard,

I've revised the patch, sorry about sloppy formatting in the previous one.

Full bootstrap/regression tests are still running, but the changes are pretty 
trivial.

Ok for trunk assuming tests finish clean?

>Joel Hutton  writes:
>> @@ -277,6 +277,81 @@ optab_for_tree_code (enum tree_code code, const_tree 
>> type,
>>  }
>>  }
>>
>> +/* Function supportable_half_widening_operation
>> +
>
>I realise existing (related) functions do have a “Function foo” line,
>but it's not generally the GCC style, so I think it would be better
>to leave it out.
Done.

>> +   Check whether an operation represented by code is a 'half' widening 
>> operation
>
>Nit: CODE should be in caps since it's a parameter name.
Done.

>> +   using only half of the full vector width for the inputs. This allows the
>
>I think this could be misread as saying that we only look at half the
>input vector.  Maybe it would be clearer to say something like “in which
>the input vectors contain half as many bits as the output vectors”
>instead of “using…”.
Done.

>> +   output to be stored in a single vector, as opposed to using the full 
>> width
>> +   for input, where half the elements must be processed at a time into
>> +   respective full-width vector outputs i.e. a 'hi'/'lo' pair using codes 
>> such
>> +   as VEC_WIDEN_MINUS_HI/LO. This is handled by widening the inputs and 
>> using
>> +   non-widening instructions. RTL fusing converts these to the widening 
>> hardware
>> +   instructions if supported.
>
>And here the contrast is with cases in which the input vectors have
>the same number of bits as the output vectors, meaning that we need
>a lo/hi pair to generate twice the amount of output data.
Done.

> Might be clearer to make “This is handled…” a new paragraph.
Done.

>> +
>> +   Supported widening operations:
>> +WIDEN_MINUS_EXPR
>> +WIDEN_PLUS_EXPR
>> +WIDEN_MULT_EXPR
>> +WIDEN_LSHIFT_EXPR
>> +
>> +   Output:
>> +   - CODE1 - The non-widened code, which will be used after the inputs are
>> + converted to the wide type.
>> +   */
>
>Formatting nit, */ should be on the previous line, i.e. “.” + two spaces
>+ “*/”.  (Sorry for all the formatting comments.)
Done (sorry for the formatting errors)

>> +bool
>> +supportable_half_widening_operation (enum tree_code code,
>> +tree vectype_out, tree vectype_in,
>> +enum tree_code *code1)
>> +{
>> +  machine_mode m1,m2;
>> +  bool truncp;
>> +  enum tree_code dummy_code;
>> +  optab op;
>> +
>> +  gcc_assert (VECTOR_TYPE_P (vectype_out) && VECTOR_TYPE_P (vectype_in));
>> +
>> +  m1 = TYPE_MODE (vectype_out);
>> +  m2 = TYPE_MODE (vectype_in);
>> +
>> +  if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
>> +return false;
>> +
>> +  if(!known_eq (TYPE_VECTOR_SUBPARTS (vectype_in),
>> +   TYPE_VECTOR_SUBPARTS (vectype_out)))
>> +  return false;
>
>Formatting nit: should be a space between “if” and “(”.  IMO
>maybe_ne is more readable than !known_eq.  The return statement
>should only be indented by 4 columns.
Done.

>> +  if(!(code == WIDEN_MINUS_EXPR
>> +   || code == WIDEN_PLUS_EXPR
>> +   || code == WIDEN_MULT_EXPR
>> +   || code == WIDEN_LSHIFT_EXPR))
>> +return false;
>> +
>> +  switch (code)
>> +  {
>> +case WIDEN_LSHIFT_EXPR:
>> +  *code1 = LSHIFT_EXPR;
>> +  break;
>> +case WIDEN_MINUS_EXPR:
>> +  *code1 = MINUS_EXPR;
>> +  break;
>> +case WIDEN_PLUS_EXPR:
>> +  *code1 = PLUS_EXPR;
>> +  break;
>> +case WIDEN_MULT_EXPR:
>> +  *code1 = MULT_EXPR;
>> +  break;
>> +default:
>> +  gcc_unreachable ();
>
>I think it would be better to return false in the default case and remove
>the “if” above, so that we only have one copy of the list of codes.
Done.

>> +  }
>
>Formatting nit: the { and } lines should be indented two spaces more.
>(The contents of the switch are OK as-is.)
Done.

>> +
>> +  if (!supportable_convert_operation(NOP_EXPR, vectype_out, vectype_in,
>> +				      &dummy_code))
>
>Nit: should be a space between the function name and ”(”.
Done.

>> +return false;
>> +
>> +  op = optab_for_tree_code (*code1, vectype_out, optab_vector);
>> +  return (optab_handler (op, TYPE_MODE (vectype_out)) != CODE_FOR_nothing);
>> +}
>> +
>>  /* Function supportable_convert_operation
>>
>> Check whether an operation represented by the code CODE is a
>
>> […]
>> @@ -4697,7 +4755,13 @@ vectorizable_conversion (vec_info *vinfo,
>>nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
>>nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>>if (known_eq (nunits_out, nunits_in))
>> -modifier = NONE;
>> +if (code == WIDEN_MINUS_EXPR
>> + || code == WIDEN_PLUS_EXPR
>> + || code == WIDEN_LSHIFT_EXPR
>> + || code == WIDEN_MULT_EXPR)
>> + modifier = WIDEN;
>
>The “modifier = WIDEN;” line is indented by a tab (i.e. 8 columns),
>but it should only be indented by 6 spaces.
Done.

>> + 

Re: [aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

2021-02-10 Thread Joel Hutton via Gcc-patches
Thanks for the quick review.

Updated patch attached. I've addressed your comments below.

Tests are still running, OK for trunk assuming tests come out clean?

[aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

In the case where 8 out of every 16 elements are widened using a
widening pattern and the next 8 are skipped the patterns are not
recognized. This is because they are normally used in a pair, such as
VEC_WIDEN_MINUS_HI/LO, to achieve a v16qi->v16hi conversion for example.
This patch adds support for V8QI->V8HI patterns.
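
Concretely, for a v8qi -> v8hi widening minus the vectorizer now emits a
widening conversion of each input followed by a normal subtract, roughly
(SSA names invented for illustration):

    vect_a.1_10 = (vector(8) short unsigned int) vect_a.0_9;
    vect_b.1_12 = (vector(8) short unsigned int) vect_b.0_11;
    vect_d.2_13 = vect_a.1_10 - vect_b.1_12;

RTL combine can then fuse this back into a single widening instruction
(e.g. usubl) where the target supports it.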

gcc/ChangeLog:

PR tree-optimization/98772
* optabs-tree.c (supportable_half_widening_operation): New function
to check for supportable V8QI->V8HI widening patterns.
* optabs-tree.h (supportable_half_widening_operation): New function.
* tree-vect-stmts.c (vect_create_vectorized_promotion_stmts): New
function to create promotion stmts for V8QI->V8HI widening patterns.
(vectorizable_conversion): Add case for V8QI->V8HI.

gcc/testsuite/ChangeLog:

PR tree-optimization/98772
* gcc.target/aarch64/pr98772.c: New test.


>> +  /* The case where a widening operation is not making use of the full 
>> width of
>> + of the input vector, but using the full width of the output vector.
>> + Return the non-wided code, which will be used after the inputs are
>
>non-widened
Done.

>> + converted to the wide type.  */
>> +  if ((code == WIDEN_MINUS_EXPR
>> +  || code == WIDEN_PLUS_EXPR
>> +  || code == WIDEN_MULT_EXPR
>> +  || code == WIDEN_LSHIFT_EXPR)
>
>Minor formatting nit, but the ||s should be indented one space more.
Done.

>> +  && known_eq (TYPE_VECTOR_SUBPARTS (vectype_in),
>> +   TYPE_VECTOR_SUBPARTS (vectype_out)))
>> +  {
>> +switch (code)
>> +{
>> +  case WIDEN_LSHIFT_EXPR:
>> + *code1 = LSHIFT_EXPR;
>> + return true;
>> + break;
>> +  case WIDEN_MINUS_EXPR:
>> + *code1 = MINUS_EXPR;
>> + return true;
>> + break;
>> +  case WIDEN_PLUS_EXPR:
>> + *code1 = PLUS_EXPR;
>> + return true;
>> + break;
>> +  case WIDEN_MULT_EXPR:
>> + *code1 = MULT_EXPR;
>> + return true;
>> + break;
>> +  default:
>> + gcc_unreachable ();
>> +}
>> +  }
>
>Rather than return true, I think we should do something like:
>
>  if (!supportable_convert_operation (NOP_EXPR, vectype_out,
>  vectype_in, &dummy_code))
>return false;
>
>  optab = optab_for_tree_code (*code1, vectype_out, optab_default);
>  return (optab_handler (optab, TYPE_MODE (vectype_out))
>  != CODE_FOR_nothing);
>
>to make sure that the target really does support this.
Done. I used 'optab_vector' not 'optab_default', as I thought that was correct 
for this case and otherwise 'optab_for_tree_code' fails an assertion when 
'LSHIFT_EXPR' is used.

> AFAICT the caller always knows when it wants the “if” statement above
> to be used.  What it's doing is a bit different from what
> supportable_convert_operation normally does, so it might be better
> to put it into a separate function that tests whether the target
> supports the non-widening form of a widening operation.
Done.

>> +
>> +  vec_tmp.create (vec_oprnds0->length () * 2);
>
>It looks like the * 2 isn't needed.
Done.

>> +  if (is_gimple_call (new_stmt3))
>> + {
>> +   new_tmp = gimple_call_lhs (new_stmt3);
>> + }
>> +  else
>> + {
>> +   new_tmp = gimple_assign_lhs (new_stmt3);
>> + }
>
>The lhs is always new_tmp3, so it's not necessary to read it back.
Done.

>> +
>> +  /* Store the results for the next step.  */
>> +  vec_tmp.quick_push (new_tmp);
>
>FWIW, you could just assign to vec_oprnds[i] and not have vec_tmp,
>but I don't know whether that's more or less confusing.  Either way's
>fine with me.
I chose to keep vec_tmp, but I don't feel strongly about it.

>> +}
>> +
>> +  vec_oprnds0->release ();
>> +  *vec_oprnds0 = vec_tmp;
>> +}
>> +
>>  
>>  /* Check if STMT_INFO performs a conversion operation that can be 
>> vectorized.
>> If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
>> @@ -4697,7 +4763,13 @@ vectorizable_conversion (vec_info *vinfo,
>>nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
>>nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>>if (known_eq (nunits_out, nunits_in))
>> -modifier = NONE;
>> +if (code == WIDEN_MINUS_EXPR
>> + || code == WIDEN_PLUS_EXPR
>> + || code == WIDEN_LSHIFT_EXPR
>> + || code == WIDEN_MULT_EXPR)
>> + modifier = WIDEN;
>
>Formatting nit: the last line should be indented by 6 spaces rather than 8.
Done.

>> @@ -4743,9 +4815,21 @@ vectorizable_conversion (vec_info *vinfo,
>>return false;
>> 
>>  case WIDEN:
>> -  if (supportable_widening_operation (vinfo, code, stmt_info, vectype_out,
>> -				      vectype_in, &code1, &code2,
>> -				      &multi_step_cvt, &interm_types))
>> 

[aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

2021-02-09 Thread Joel Hutton via Gcc-patches
Hi Richards,

This patch adds support for the V8QI->V8HI case from the widening vect
patterns, as discussed, to target PR98772.

Bootstrapped and regression tested on aarch64.


[aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

In the case where 8 out of every 16 elements are widened using a
widening pattern and the next 8 are skipped the patterns are not
recognized. This is because they are normally used in a pair, such  as
VEC_WIDEN_MINUS_HI/LO, to achieve a v16qi->v16hi conversion for example.
This patch adds support for V8QI->V8HI patterns.

gcc/ChangeLog:
        PR tree-optimization/98772
        * optabs-tree.c (supportable_convert_operation): Add case for
        V8QI->V8HI.
        * tree-vect-stmts.c (vect_create_vectorized_promotion_stmts): New
        function to generate promotion stmts for V8QI->V8HI.
        (vectorizable_conversion): Add case for V8QI->V8HI.

gcc/testsuite/ChangeLog:
        PR tree-optimization/98772
        * gcc.target/aarch64/pr98772.c: New test.

diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index c94073e3ed98f8c4cab65891f65dedebdb1ec274..b91ce3af6f0d4b3a62110bdb38f68ecc53765cad 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -308,6 +308,40 @@ supportable_convert_operation (enum tree_code code,
   if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
 return false;
 
+  /* The case where a widening operation is not making use of the full width of
+ of the input vector, but using the full width of the output vector.
+ Return the non-wided code, which will be used after the inputs are
+ converted to the wide type.  */
+  if ((code == WIDEN_MINUS_EXPR
+  || code == WIDEN_PLUS_EXPR
+  || code == WIDEN_MULT_EXPR
+  || code == WIDEN_LSHIFT_EXPR)
+  && known_eq (TYPE_VECTOR_SUBPARTS (vectype_in),
+		  TYPE_VECTOR_SUBPARTS (vectype_out)))
+  {
+switch (code)
+{
+  case WIDEN_LSHIFT_EXPR:
+	*code1 = LSHIFT_EXPR;
+	return true;
+	break;
+  case WIDEN_MINUS_EXPR:
+	*code1 = MINUS_EXPR;
+	return true;
+	break;
+  case WIDEN_PLUS_EXPR:
+	*code1 = PLUS_EXPR;
+	return true;
+	break;
+  case WIDEN_MULT_EXPR:
+	*code1 = MULT_EXPR;
+	return true;
+	break;
+  default:
+	gcc_unreachable ();
+}
+  }
+
   /* First check if we can done conversion directly.  */
   if ((code == FIX_TRUNC_EXPR
&& can_fix_p (m1,m2,TYPE_UNSIGNED (vectype_out), &truncp)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr98772.c b/gcc/testsuite/gcc.target/aarch64/pr98772.c
new file mode 100644
index ..35568a9f9d60c44aa01a6afc5f7e6a0935009aaf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr98772.c
@@ -0,0 +1,155 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#define DSIZE 16
+#define PIXSIZE 64
+
+extern void
+wplus (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] + pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+extern void __attribute__((optimize (0)))
+wplus_no_opt (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] + pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+
+extern void
+wminus (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] - pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+extern void __attribute__((optimize (0)))
+wminus_no_opt (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] - pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+
+extern void
+wmult (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] * pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+extern void __attribute__((optimize (0)))
+wmult_no_opt (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] * pix2[x];
+	pix1 += 16;
+	pix2 += 16;
+}
+}
+
+extern void
+wlshift (uint16_t *d, uint8_t *restrict pix1)
+
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] << 8;
+	pix1 += 16;
+}
+}
+extern void __attribute__((optimize (0)))
+wlshift_no_opt (uint16_t *d, uint8_t *restrict pix1)
+
+{
+for( int y = 0; y < 4; y++ )
+{
+	for( int x = 0; x < 4; x++ )
+	d[x + y*4] = pix1[x] << 8;
+	pix1 += 16;
+}
+}
+
+void __attribute__((optimize (0)))
+init_arrays(uint16_t *d_a, uint16_t *d_b, uint8_t *pix1, uint8_t *pix2)
+{
+  for(int i = 0; i < DSIZE; i++)
+  {
+d_a[i] = (1074 * i)%17;
+d_b[i] = (1074 * i)%17;
+  }
+  for(int i = 0; i < PIXSIZE; i++)
+  {
+pix1[i] = (1024 * 

Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
> Do you mean a v8qi->v8hi widening subtract or a v16qi->v8hi widening
> subtract?  

I mean the latter, that seemed to be what richi was suggesting previously. 

> The problem with the latter is that we need to fill the
> extra unused elements with something and remove them later.

That's fair enough; fake/don't-care elements are a bit of a hack. I'll try
something out with the conversions and regular subtract.

thanks for the help,
Joel


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
>> So emit a v4qi->v8qi gimple conversion
>> then a regular widen_lo/hi using the existing backend patterns/optabs?
>
>I was thinking of using a v8qi->v8hi convert on each operand followed
>by a normal v8hi subtraction.  That's what we'd generate if the target
>didn't define the widening patterns.

Is there a reason that conversion is preferred? If we use a widening subtract
then we don't need to rely on RTL fusing later.
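
For concreteness, the two lowerings under discussion look roughly like this
(invented SSA names):

    a_w = (vector(8) short int) a_v8qi;
    b_w = (vector(8) short int) b_v8qi;
    d_v8hi = a_w - b_w;          /* conversions + normal subtract */

versus

    d_v8hi = a_v8qi w- b_v8qi;   /* direct v8qi->v8hi widening subtract */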


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
>>> In practice this will only affect targets that choose to use mixed
>>> vector sizes, and I think it's reasonable to optimise only for the
>>> case in which such targets support widening conversions.  So what
>>> do you think about the idea of emitting separate conversions and
>>> a normal subtract?  We'd be relying on RTL to fuse them together,
>>> but at least there would be no redundancy to eliminate.
>>
>> So in vectorizable_conversion for the widen-minus you'd check
>> whether you can do a v4qi -> v4hi and then emit a conversion
>> and a wide minus?
>
>Yeah.

This seems reasonable; as I recall, we decided against adding
internal functions for the time being, since all the existing vec patterns
code would have to be refactored. So emit a v4qi->v8qi gimple conversion
then a regular widen_lo/hi using the existing backend patterns/optabs?


[RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-01 Thread Joel Hutton via Gcc-patches
Hi Richard(s),

I'm just looking to see if I'm going about this the right way, based on the 
discussion we had on IRC. I've managed to hack something together, I've 
attached a (very) WIP patch which gives the correct codegen for the testcase in 
question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would 
obviously need to support other widening patterns and differentiate between 
big/little endian among other things.

I added a backend pattern because I wasn't quite clear which changes to make in 
order to allow the existing backend patterns to be used with a V8QI, or how to 
represent V16QI where we don't care about the top/bottom 8. I made some attempt 
in optabs.c, which is in the patch commented out, but I'm not sure if I'm going 
about this the right way.

Joel
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index be2a5a865172bdd7848be4082abb0fdfb0b35937..c66b8a367623c8daf4423677d292e292feee3606 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3498,6 +3498,14 @@
   DONE;
 })
 
+(define_insn "vec_widen_usubl_half_v8qi"
+  [(match_operand:V8HI 0 "register_operand")
+(match_operand:V8QI 1 "register_operand")
+(match_operand:V8QI 2 "register_operand")]
+  "TARGET_SIMD"
+  "usubl\t%0., %1., %2."
+)
+
 (define_expand "vec_widen_subl_hi_"
   [(match_operand: 0 "register_operand")
(ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
diff --git a/gcc/expr.c b/gcc/expr.c
index 04ef5ad114d0662948c896cdbf58e67737b39c7e..0939a156deef63f1cf2fa7e29c2c94925820f2ba 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9785,6 +9785,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
 case VEC_WIDEN_PLUS_HI_EXPR:
 case VEC_WIDEN_PLUS_LO_EXPR:
+case VEC_WIDEN_MINUS_HALF_EXPR:
 case VEC_WIDEN_MINUS_HI_EXPR:
 case VEC_WIDEN_MINUS_LO_EXPR:
 case VEC_WIDEN_MULT_HI_EXPR:
diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
index 876a3a6f348de122e5a52e6dd70d7946bc810162..10aa21d07595325fd8ef3057444853fc946385de 100644
--- a/gcc/optabs-query.h
+++ b/gcc/optabs-query.h
@@ -186,6 +186,9 @@ bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
 enum insn_code find_widening_optab_handler_and_mode (optab, machine_mode,
 		 machine_mode,
 		 machine_mode *);
+enum insn_code find_half_mode_optab_and_mode (optab, machine_mode,
+		 machine_mode,
+		 machine_mode *);
 int can_mult_highpart_p (machine_mode, bool);
 bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
 opt_machine_mode get_len_load_store_mode (machine_mode, bool);
diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
index 3248ce2c06e65c9c0366757907ab057407f7c594..7abfc04aa18b7ee5b734a1b1f4378b4615ee31fd 100644
--- a/gcc/optabs-query.c
+++ b/gcc/optabs-query.c
@@ -462,6 +462,17 @@ can_vec_perm_const_p (machine_mode mode, const vec_perm_indices &sel,
   return false;
 }
 
+enum insn_code
+find_half_mode_optab_and_mode (optab op, machine_mode to_mode,
+  machine_mode from_mode,
+  machine_mode *found_mode)
+{
+insn_code icode = CODE_FOR_nothing;
+if (GET_MODE_2XWIDER_MODE (from_mode).exists (found_mode))
+  icode = optab_handler (op, *found_mode);
+return icode;
+}
+
 /* Find a widening optab even if it doesn't widen as much as we want.
E.g. if from_mode is HImode, and to_mode is DImode, and there is no
direct HI->SI insn, then return SI->DI, if that exists.  */
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index c94073e3ed98f8c4cab65891f65dedebdb1ec274..eb52dc15f8094594c4aa22d5fc1c442886e4ebf6 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -185,6 +185,9 @@ optab_for_tree_code (enum tree_code code, const_tree type,
 case VEC_WIDEN_MINUS_HI_EXPR:
   return (TYPE_UNSIGNED (type)
 	  ? vec_widen_usubl_hi_optab : vec_widen_ssubl_hi_optab);
+
+case VEC_WIDEN_MINUS_HALF_EXPR:
+  return vec_widen_usubl_half_optab;
 
 case VEC_UNPACK_HI_EXPR:
   return (TYPE_UNSIGNED (type)
@@ -308,6 +311,16 @@ supportable_convert_operation (enum tree_code code,
   if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
 return false;
 
+  /* The case where vectype_in is half the vector width, as opposed to the
+ normal case for widening patterns of vector width input, with output in
+ multiple registers. */
+  if (code == WIDEN_MINUS_EXPR
+      && known_eq (TYPE_VECTOR_SUBPARTS (vectype_in),
+		   TYPE_VECTOR_SUBPARTS (vectype_out)))
+  {
+*code1 = VEC_WIDEN_MINUS_HALF_EXPR;
+return true;
+  }
+
   /* First check if we can done conversion directly.  */
   if ((code == FIX_TRUNC_EXPR
&& can_fix_p (m1,m2,TYPE_UNSIGNED (vectype_out), )
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..1252097be9893d7d65ea844fc0eda9bad70b9256 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -293,6 +293,13 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 icode = 

Re: [AArch64] Remove backend support for widen-sub

2021-01-21 Thread Joel Hutton via Gcc-patches
> I think we should try to fix the PR instead.  The widening operations
> can only be used for SLP if the group_size is at least double the
> number of elements in the vectype, so one idea (not worked through)
> is to make the vect_build_slp_tree family of routines undo pattern
> recognition for widening operations if the group size is less than that. >

This seems reasonable. How do I go about 'undoing' the pattern recognition?

Ideally the patterns wouldn't be substituted in the first place, but group size 
is chosen after pattern substitution.

From: Richard Sandiford 
Sent: 21 January 2021 13:40
To: Richard Biener 
Cc: Joel Hutton via Gcc-patches ; Joel Hutton 

Subject: Re: [AArch64] Remove backend support for widen-sub

Richard Biener  writes:
> On Thu, 21 Jan 2021, Richard Sandiford wrote:
>
>> Joel Hutton via Gcc-patches  writes:
>> > Hi all,
>> >
>> > This patch removes support for the widening subtract operation in the 
>> > aarch64 backend as it is causing a performance regression.
>> >
>> > In the following example:
>> >
>> > #include <stdint.h>
>> > extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t 
>> > *restrict pix2)
>> > {
>> >for( int y = 0; y < 4; y++ )
>> >   {
>> > for( int x = 0; x < 4; x++ )
>> >   d[x + y*4] = pix1[x] - pix2[x];
>> > pix1 += 16;
>> > pix2 += 16;
>> >  }
>> >
>> > The widening minus pattern is recognized and substituted, but cannot be 
>> > used due to the input vector type chosen in slp vectorization. This 
>> > results in an attempt to do an 8 byte->8 short widening subtract 
>> > operation, which is not supported.
>> >
>> > The issue is documented in PR 98772.
>>
>> IMO removing just the sub patterns is too arbitrary.  Like you say,
>> the PR affects all widening instructions. but it happens that the
>> first encountered regression used subtraction.
>>
>> I think we should try to fix the PR instead.  The widening operations
>> can only be used for SLP if the group_size is at least double the
>> number of elements in the vectype, so one idea (not worked through)
>> is to make the vect_build_slp_tree family of routines undo pattern
>> recognition for widening operations if the group size is less than that.
>>
>> Richi would know better than me though.
>
> Why should the widen ops be only usable with such constraints?
> As I read md.texi they for example do v8qi -> v8hi operations
> (where the v8qi is either lo or hi part of a v16qi vector).

They're v16qi->v8hi operations rather than v8qi->v8hi.  The lo
operations operate on one half of the v16qi and the hi operations
operate on the other half.  They're supposed to be used together
to produce v16qi->v8hi+v8hi, so for BB SLP we need a group size
of 16 to feed them.  (Loop-aware SLP is fine as-is because we can
just increase the unroll factor.)

In the testcase, the store end of the SLP graph is operating on
8 shorts, which further up the graph are converted from 8 chars.
To use WIDEN_MINUS_EXPR at v8hi we'd need 16 shorts and 16 chars.
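
Illustrative gimple for the paired case (invented names):

    vect_lo_11 = VEC_WIDEN_MINUS_LO_EXPR <pix1_v16qi, pix2_v16qi>;  /* v8hi */
    vect_hi_12 = VEC_WIDEN_MINUS_HI_EXPR <pix1_v16qi, pix2_v16qi>;  /* v8hi */

One v16qi input pair feeds both halves, hence the group-size requirement.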

> The dumps show we use a VF of 4 which makes us have two v8hi
> vectors and one v16qi which vectorizable_conversion should
> be able to handle by emitting hi/lo widened subtracts.

AIUI the problem is with slp1.  I think the loop side is fine.

Thanks,
Richard


[AArch64] Remove backend support for widen-sub

2021-01-21 Thread Joel Hutton via Gcc-patches
Hi all, 

This patch removes support for the widening subtract operation in the aarch64 
backend as it is causing a performance regression.

In the following example:

#include <stdint.h>
extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t *restrict pix2)
{
   for( int y = 0; y < 4; y++ )
   {
     for( int x = 0; x < 4; x++ )
       d[x + y*4] = pix1[x] - pix2[x];
     pix1 += 16;
     pix2 += 16;
   }
}

The widening minus pattern is recognized and substituted, but cannot be used 
due to the input vector type chosen in slp vectorization. This results in an 
attempt to do an 8 byte->8 short widening subtract operation, which is not 
supported. 

The issue is documented in PR 98772.


[AArch64] Remove backend support for widen-sub

This patch removes support for the widening subtract operation in the aarch64 
backend as it is causing a performance regression.

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md
        (vec_widen_<su>subl_lo_<mode>): Removed.
        (vec_widen_<su>subl_hi_<mode>): Removed.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vect-widen-sub.c: Removed.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 41071b668fd0982f55f9e48510403b9f50fe0f60..c685c512e06917f9cf6bdcffcc41dd091dabfb4e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3478,30 +3478,6 @@
   DONE;
 })
 
-(define_expand "vec_widen_subl_lo_"
-  [(match_operand: 0 "register_operand")
-   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
-   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
-  "TARGET_SIMD"
-{
-  rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
-  emit_insn (gen_aarch64_subl_lo_internal (operands[0], operands[1],
-		 operands[2], p));
-  DONE;
-})
-
-(define_expand "vec_widen_subl_hi_"
-  [(match_operand: 0 "register_operand")
-   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
-   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
-  "TARGET_SIMD"
-{
-  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
-  emit_insn (gen_aarch64_subl_hi_internal (operands[0], operands[1],
-		 operands[2], p));
-  DONE;
-})
-
 (define_expand "aarch64_saddl2"
   [(match_operand: 0 "register_operand")
(match_operand:VQW 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
deleted file mode 100644
index a2bed63affbd091977df95a126da1f5b8c1d41d2..
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
+++ /dev/null
@@ -1,92 +0,0 @@
-/* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
-#include <stdint.h>
-#include <string.h>
-
-#pragma GCC target "+nosve"
-
-#define ARR_SIZE 1024
-
-/* Should produce an usubl */
-void usub_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
-{
-for( int i = 0; i < ARR_SIZE - 3;i=i+4)
-{
-foo[i]   = a[i]   - b[i];
-foo[i+1] = a[i+1] - b[i+1];
-foo[i+2] = a[i+2] - b[i+2];
-foo[i+3] = a[i+3] - b[i+3];
-}
-}
-
-__attribute__((optimize (0)))
-void usub_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
-{
-for( int i = 0; i < ARR_SIZE - 3;i=i+4)
-{
-foo[i]   = a[i]   - b[i];
-foo[i+1] = a[i+1] - b[i+1];
-foo[i+2] = a[i+2] - b[i+2];
-foo[i+3] = a[i+3] - b[i+3];
-}
-}
-
-/* Should produce an ssubl */
-void ssub_opt (int32_t *foo, int16_t *a, int16_t *b)
-{
-for( int i = 0; i < ARR_SIZE - 3;i=i+4)
-{
-foo[i]   = a[i]   - b[i];
-foo[i+1] = a[i+1] - b[i+1];
-foo[i+2] = a[i+2] - b[i+2];
-foo[i+3] = a[i+3] - b[i+3];
-}
-}
-
-__attribute__((optimize (0)))
-void ssub_nonopt (int32_t *foo, int16_t *a, int16_t *b)
-{
-for( int i = 0; i < ARR_SIZE - 3;i=i+4)
-{
-foo[i]   = a[i]   - b[i];
-foo[i+1] = a[i+1] - b[i+1];
-foo[i+2] = a[i+2] - b[i+2];
-foo[i+3] = a[i+3] - b[i+3];
-}
-}
-
-
-void __attribute__((optimize (0)))
-init(uint16_t *a, uint16_t *b)
-{
-for( int i = 0; i < ARR_SIZE;i++)
-{
-  a[i] = i;
-  b[i] = 2*i;
-}
-}
-
-int __attribute__((optimize (0)))
-main()
-{
-uint32_t foo_arr[ARR_SIZE];
-uint32_t bar_arr[ARR_SIZE];
-uint16_t a[ARR_SIZE];
-uint16_t b[ARR_SIZE];
-
-init(a, b);
-usub_opt(foo_arr, a, b);
-usub_nonopt(bar_arr, a, b);
-if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
-  return 1;
-ssub_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
-ssub_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
-if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
-  return 1;
-return 0;
-}
-
-/* { dg-final { scan-assembler-times {\tusubl\t} 1} } */
-/* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */
-/* { dg-final { scan-assembler-times {\tssubl\t} 1} } */
-/* { dg-final { scan-assembler-times {\tssubl2\t} 1} } */


[2/2][VECT] pr97929 fix

2020-12-10 Thread Joel Hutton via Gcc-patches
Hi all,

This patch addresses PR97929 by adding a missing case for WIDEN_PLUS/MINUS in 
vect_get_smallest_scalar_type. It also introduces a test to check for 
regression. 

One thing to note is that I saw a failure on c-c++-common/builtins.c which 
disappeared when I ran the test again. I assume this is due to the test being 
unreliable.

Bootstrapped and regression tested all together.

Ok for trunk?

[VECT] pr97929 fix

This addresses pr97929. The cases for WIDEN_PLUS and WIDEN_MINUS were
missing in vect_get_smallest_scalar_type.
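
To illustrate (types as in the test below, statement shape assumed rather
than taken from a dump): with int16_t a and int32_t bar[], the recognized
statement has the form

    patt_8 = a_5(D) w+ 1;   /* 32-bit result from 16-bit operands */

so the smallest scalar type has to come from the 16-bit rhs operands rather
than the 32-bit lhs, which is what the new case handles.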

gcc/ChangeLog:

PR tree-optimization/97929
* tree-vect-data-refs.c (vect_get_smallest_scalar_type): Add
WIDEN_PLUS/WIDEN_MINUS case.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97929.c: New test.

From 1d7855aab9ea099f244e50baede24c50897f6eb5 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Wed, 9 Dec 2020 16:51:17 +
Subject: [PATCH 2/2] [VECT] pr97929 fix

This addresses pr97929. The cases for WIDEN_PLUS and WIDEN_MINUS were
missing in vect_get_smallest_scalar_type.

gcc/ChangeLog:

PR tree-optimization/97929
* tree-vect-data-refs.c (vect_get_smallest_scalar_type): Add
WIDEN_PLUS/WIDEN_MINUS case.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97929.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr97929.c | 10 ++
 gcc/tree-vect-data-refs.c   |  2 ++
 2 files changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr97929.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr97929.c b/gcc/testsuite/gcc.dg/vect/pr97929.c
new file mode 100644
index 000..a027b317151
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr97929.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+#include <stdint.h>
+#define ARR_SIZE 1024
+extern void foo (int32_t *bar, int16_t a)
+{
+for( int i = 0; i < ARR_SIZE;i++)
+{
+bar[i]  = a + 1;
+}
+}
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 18e36c89d14..137017091e8 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -136,6 +136,8 @@ vect_get_smallest_scalar_type (stmt_vec_info stmt_info,
 	  || gimple_assign_rhs_code (assign) == WIDEN_SUM_EXPR
 	  || gimple_assign_rhs_code (assign) == WIDEN_MULT_EXPR
 	  || gimple_assign_rhs_code (assign) == WIDEN_LSHIFT_EXPR
+	  || gimple_assign_rhs_code (assign) == WIDEN_PLUS_EXPR
+	  || gimple_assign_rhs_code (assign) == WIDEN_MINUS_EXPR
 	  || gimple_assign_rhs_code (assign) == FLOAT_EXPR))
 {
   tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (assign));
-- 
2.17.1



[1/2][TREE] Add WIDEN_PLUS, WIDEN_MINUS pretty print

2020-12-10 Thread Joel Hutton via Gcc-patches
Hi all,

This adds missing pretty print for WIDEN_PLUS/MINUS and 
VEC_WIDEN_PLUS/MINUS_HI/LO

Bootstrapped and regression tested all together on aarch64.

Ok for trunk?

Add WIDEN_PLUS, WIDEN_MINUS pretty print

Add 'w+'/'w-' as WIDEN_PLUS/WIDEN_MINUS respectively.
Add 'VEC_WIDEN_PLUS/MINUS_HI/LO<...>' for VEC_WIDEN_PLUS/MINUS_HI/LO.
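
With this, dump_generic_node prints statements like the following (examples
invented) instead of falling back to the generic unhandled-operator output:

    patt_9 = _4 w+ _5;
    vect_patt_9.8_21 = VEC_WIDEN_PLUS_LO_EXPR <vect__4.6_17, vect__5.7_19>;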

gcc/ChangeLog:

 * tree-pretty-print.c (dump_generic_node): Add case for
 VEC_WIDEN_(PLUS/MINUS)_(HI/LO)_EXPR and WIDEN_(PLUS/MINUS)_EXPR.
From 6ec022ddc86e97fd7779bdf075619d2e273c77d0 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 8 Dec 2020 19:33:36 +
Subject: [PATCH 1/2] Add WIDEN_PLUS, WIDEN_MINUS pretty print

Add 'w+'/'w-' as WIDEN_PLUS/WIDEN_MINUS respectively.
Add VEC_WIDEN_PLUS/MINUS_HI/LO<...> for VEC_WIDEN_PLUS/MINUS_HI/LO

gcc/ChangeLog:

* tree-pretty-print.c (dump_generic_node): Add case for
VEC_WIDEN_(PLUS/MINUS)_(HI/LO)_EXPR and WIDEN_(PLUS/MINUS)_EXPR.
(op_symbol_code): Add case for WIDEN_PLUS/MINUS.
---
 gcc/tree-pretty-print.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 5a93c4d8b9e..ae4b898782a 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -2649,6 +2649,8 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
   break;
 
   /* Binary arithmetic and logic expressions.  */
+case WIDEN_PLUS_EXPR:
+case WIDEN_MINUS_EXPR:
 case WIDEN_SUM_EXPR:
 case WIDEN_MULT_EXPR:
 case MULT_EXPR:
@@ -3580,6 +3582,10 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 case VEC_SERIES_EXPR:
 case VEC_WIDEN_MULT_HI_EXPR:
 case VEC_WIDEN_MULT_LO_EXPR:
+case VEC_WIDEN_PLUS_HI_EXPR:
+case VEC_WIDEN_PLUS_LO_EXPR:
+case VEC_WIDEN_MINUS_HI_EXPR:
+case VEC_WIDEN_MINUS_LO_EXPR:
 case VEC_WIDEN_MULT_EVEN_EXPR:
 case VEC_WIDEN_MULT_ODD_EXPR:
 case VEC_WIDEN_LSHIFT_HI_EXPR:
@@ -4097,6 +4103,12 @@ op_symbol_code (enum tree_code code)
 case WIDEN_LSHIFT_EXPR:
   return "w<<";
 
+case WIDEN_PLUS_EXPR:
+  return "w+";
+
+case WIDEN_MINUS_EXPR:
+  return "w-";
+
 case POINTER_PLUS_EXPR:
   return "+";
 
-- 
2.17.1



Re: [3/3][aarch64] Add support for vec_widen_shift pattern

2020-11-13 Thread Joel Hutton via Gcc-patches
Tests are still running, but I believe I've addressed all the comments.

> > +#include <stdint.h>
> > +
> 
> SVE targets will need a:
> 
> #pragma GCC target "+nosve"
> 
> here, since we'll generate different code for SVE.

Fixed.

> > +/* { dg-final { scan-assembler-times "shll\t" 1} } */
> > +/* { dg-final { scan-assembler-times "shll2\t" 1} } */
> 
> Very minor nit, sorry, but I think:
> 
> /* { dg-final { scan-assembler-times {\tshll\t} 1 } } */
> 
> would be better.  Using "…\t" works, but IIRC it shows up as a tab
> character in the testsuite result summary too.

Fixed. Minor nits welcome. :)


> OK for the aarch64 bits with the testsuite changes above.
ok?

gcc/ChangeLog:

2020-11-13  Joel Hutton  

* config/aarch64/aarch64-simd.md: Add vec_widen_lshift_hi/lo
  patterns.
* tree-vect-stmts.c
(vectorizable_conversion): Fix for widen_lshift case.

gcc/testsuite/ChangeLog:

2020-11-13  Joel Hutton  

* gcc.target/aarch64/vect-widen-lshift.c: New test.
From e8d3ed6fa739850eb649b97c250f1f2c650c34c1 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 12 Nov 2020 11:48:25 +
Subject: [PATCH 3/3] [AArch64][vect] vec_widen_lshift pattern

Add aarch64 vec_widen_lshift_lo/hi patterns and fix a bug they trigger in
the mid-end. This pattern takes one vector of N elements of size S, shifts
each element left by the element width, and stores the results as N
elements of size 2*S (in 2 result vectors). The aarch64 backend
implements this with the shll, shll2 instruction pair.
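
For example (illustrative, a v8hi input widened into two v4si halves):

    shll    v0.4s, v1.4h, #16    // low four elements, widened and shifted
    shll2   v2.4s, v1.8h, #16    // high four elements, widened and shifted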
---
 gcc/config/aarch64/aarch64-simd.md| 66 +++
 .../gcc.target/aarch64/vect-widen-lshift.c| 62 +
 gcc/tree-vect-stmts.c |  5 +-
 3 files changed, 131 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 30299610635..4ba799a27c9 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4664,8 +4664,74 @@
   [(set_attr "type" "neon_sat_shift_reg")]
 )
 
+(define_expand "vec_widen_shiftl_lo_"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+emit_insn (gen_aarch64_shll_internal (operands[0], operands[1],
+		 p, operands[2]));
+DONE;
+  }
+)
+
+(define_expand "vec_widen_shiftl_hi_"
+   [(set (match_operand: 0 "register_operand")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "immediate_operand" "i")]
+			  VSHLL))]
+   "TARGET_SIMD"
+   {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+emit_insn (gen_aarch64_shll2_internal (operands[0], operands[1],
+		  p, operands[2]));
+DONE;
+   }
+)
+
 ;; vshll_n
 
+(define_insn "aarch64_shll_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll\\t%0., %1., %3";
+else
+  return "shll\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_shll2_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll2\\t%0., %1., %3";
+else
+  return "shll2\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
 (define_insn "aarch64_shll_n"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec: [(match_operand:VD_BHSI 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
new file mode 100644
index 000..48a3719d4ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#pragma GCC target "+nosve"
+
+#define ARR_SIZE 1024
+
+/* Should produce an shll, shll2 pair.  */
+void sshll_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+{
+foo[i]   = a[i]   << 16;
+foo[i+1] = a[i+1] << 16;
+foo[i+2] = a[i+2] << 16;
+foo[i+3] = a[i+3] << 16;
+}
+}
+
+__attribute__((optimize (0)))
+void sshll_nonopt (int32_t *foo, int16_t *a, 

Re: [2/3][vect] Add widening add, subtract vect patterns

2020-11-13 Thread Joel Hutton via Gcc-patches
Tests are still running, but I believe I've addressed all the comments.

> Like Richard said, the new patterns need to be documented in md.texi
> and the new tree codes need to be documented in generic.texi.

Done.

> While we're using tree codes, I think we need to make the naming
> consistent with other tree codes: WIDEN_PLUS_EXPR instead of
> WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
> Same idea for the VEC_* codes.

Fixed.

> > gcc/ChangeLog:
> >
> > 2020-11-12  Joel Hutton  
> >
> > * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
> 
> Not that I personally care about this stuff (would love to see changelogs
> go away :-)) but some nits:
> 
> Each description is supposed to start with a capital letter and end with
> a full stop (even if it's not a complete sentence).  Same for the rest

Fixed.

> > * optabs-tree.c (optab_for_tree_code): optabs for widening 
> > adds,subtracts
> 
> The line limit for changelogs is 80 characters.  The entry should say
> what changed, so “Handle …” or “Add case for …” or something.

Fixed.

> > * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog 
> > ptatern
> 
> typo: pattern

Fixed.

> > Add widening add, subtract patterns to tree-vect-patterns.
> > Add aarch64 tests for patterns.
> >
> > fix sad
> 
> Would be good to expand on this for the final commit message.

'fix sad' was accidentally included when I squashed two commits. I've made all 
the commit messages more descriptive.

> > +
> > +case VEC_WIDEN_SUB_HI_EXPR:
> > +  return (TYPE_UNSIGNED (type)
> > +   ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> > +
> > +
> 
> Nits: excess blank line at the end and excess space before the “:”s.

Fixed.

> > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> >  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
> >  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
> >  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
> 
> Looks like the current code groups signed stuff together and
> unsigned stuff together, so would be good to follow that.

Fixed.

> Same comments as the previous patch about having a "+nosve" pragma
> and about the scan-assembler-times lines.  Same for the sub test.

Fixed.

> I am missing documentation in md.texi for the new patterns.  In
> particular I wonder why you need singed and unsigned variants
> for the add/subtract patterns.

Fixed. Signed and unsigned variants are needed because they correspond to
signed and unsigned instructions (uaddl/uaddl2, saddl/saddl2).

> The new functions should have comments before them.  Can probably
> just use the vect_recog_widen_mult_pattern comment as a template.

Fixed.

> > +case VEC_WIDEN_SUB_HI_EXPR:
> > +case VEC_WIDEN_SUB_LO_EXPR:
> > +case VEC_WIDEN_ADD_HI_EXPR:
> > +case VEC_WIDEN_ADD_LO_EXPR:
> > +  return false;
> > +
>
> I think these should get the same validity checking as
> VEC_WIDEN_MULT_HI_EXPR etc.

Fixed.

> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
> >   of the above pattern.  */
> >
> >tree plus_oprnd0, plus_oprnd1;
>> -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
>> -				       &plus_oprnd0, &plus_oprnd1))
>> +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
>> +					&plus_oprnd0, &plus_oprnd1)
>> +	|| vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
>> +					   &plus_oprnd0, &plus_oprnd1)))
> >  return NULL;
> >
> > tree sum_type = gimple_expr_type (last_stmt);
>
> I think we should make:
>
>   /* Any non-truncating sequence of conversions is OK here, since
>  with a successful match, the result of the ABS(U) is known to fit
>  within the nonnegative range of the result type.  (It cannot be the
>  negative of the minimum signed value due to the range of the widening
>  MINUS_EXPR.)  */
>   vect_unpromoted_value unprom_abs;
>   plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
>   &unprom_abs);
>
> specific to the PLUS_EXPR case.  If we look through promotions on
> the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
> of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
> and one on its inputs.

Fixed.


gcc/ChangeLog:

2020-11-13  Joel Hutton  

* expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases.
* optabs-tree.c (optab_for_tree_code): Add cases for widening
adds, subtracts.
* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
* tree-cfg.c (verify_gimple_assign_binary): Add case for 

Re: [1/3][aarch64] Add aarch64 support for vec_widen_add, vec_widen_sub patterns

2020-11-13 Thread Joel Hutton via Gcc-patches
Tests are still running, but I believe I've addressed the comment.

> There are ways in which we could reduce the amount of cut-&-paste here,
> but I guess everything is a trade-off between clarity and compactness.
> One extreme is to write them all out explicitly, another extreme would
> be to have one define_expand and various iterators and attributes.
>
> I think the vec_widen_mult_*_ patterns strike a good balance:
> the use ANY_EXTEND to hide the sign difference while still having
> separate hi and lo patterns:

Done

gcc/ChangeLog:

2020-11-13  Joel Hutton  

* config/aarch64/aarch64-simd.md: New patterns
  vec_widen_saddl_lo/hi_<mode>.
From c52fd11f5d471200c1292fad3bc04056e7721f06 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 9 Nov 2020 15:35:57 +
Subject: [PATCH 1/3] [aarch64] Add vec_widen patterns to aarch64

Add widening add and subtract patterns to the aarch64
backend. These allow taking vectors of N elements of size S
and performing and add/subtract on the high or low half
widening the resulting elements and storing N/2 elements of size 2*S.
These correspond to the addl,addl2,subl,subl2 instructions.
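
For example (illustrative):

    saddl   v0.8h, v1.8b,  v2.8b     // widen and add the low 8 bytes
    saddl2  v3.8h, v1.16b, v2.16b    // widen and add the high 8 bytes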
---
 gcc/config/aarch64/aarch64-simd.md | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2cf6fe9154a..30299610635 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3382,6 +3382,53 @@
   [(set_attr "type" "neon_<ADDSUB:optab>_long")]
 )
 
+(define_expand "vec_widen_<su>addl_lo_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
+  emit_insn (gen_aarch64_<su>addl<mode>_lo_internal (operands[0], operands[1],
+						     operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_<su>addl_hi_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
+  emit_insn (gen_aarch64_<su>addl<mode>_hi_internal (operands[0], operands[1],
+						     operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_<su>subl_lo_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
+  emit_insn (gen_aarch64_<su>subl<mode>_lo_internal (operands[0], operands[1],
+						     operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_<su>subl_hi_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
+  emit_insn (gen_aarch64_<su>subl<mode>_hi_internal (operands[0], operands[1],
+						     operands[2], p));
+  DONE;
+})
 
 (define_expand "aarch64_saddl2<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
-- 
2.17.1



[1/3][aarch64] Add aarch64 support for vec_widen_add, vec_widen_sub patterns

2020-11-12 Thread Joel Hutton via Gcc-patches
Hi all,

This patch adds backend patterns for vec_widen_add, vec_widen_sub on aarch64.

All 3 patches together bootstrapped and regression tested on aarch64.

Ok for stage 1?

gcc/ChangeLog:

2020-11-12  Joel Hutton  

        * config/aarch64/aarch64-simd.md: New patterns
        vec_widen_saddl_lo/hi_<mode>.

From 3e47bc562b83417a048e780bcde52fb2c9617df3 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 9 Nov 2020 15:35:57 +
Subject: [PATCH 1/3] [aarch64] Add vec_widen patterns to aarch64

Add widening add and subtract patterns to the aarch64
backend.
---
 gcc/config/aarch64/aarch64-simd.md | 94 ++
 1 file changed, 94 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2cf6fe9154a2ee1b21ad9e8e2a6109805022be7f..b4f56a2295926f027bd53e7456eec729af0cd6df 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3382,6 +3382,100 @@
   [(set_attr "type" "neon_<ADDSUB:optab>_long")]
 )
 
+(define_expand "vec_widen_saddl_lo_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
+  emit_insn (gen_aarch64_saddl<mode>_lo_internal (operands[0], operands[1],
+						  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_ssubl_lo_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
+  emit_insn (gen_aarch64_ssubl<mode>_lo_internal (operands[0], operands[1],
+						  operands[2], p));
+  DONE;
+})
+(define_expand "vec_widen_saddl_hi_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
+  emit_insn (gen_aarch64_saddl<mode>_hi_internal (operands[0], operands[1],
+						  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_ssubl_hi_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
+  emit_insn (gen_aarch64_ssubl<mode>_hi_internal (operands[0], operands[1],
+						  operands[2], p));
+  DONE;
+})
+(define_expand "vec_widen_uaddl_lo_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
+  emit_insn (gen_aarch64_uaddl<mode>_lo_internal (operands[0], operands[1],
+						  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_usubl_lo_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
+  emit_insn (gen_aarch64_usubl<mode>_lo_internal (operands[0], operands[1],
+						  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_uaddl_hi_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
+  emit_insn (gen_aarch64_uaddl<mode>_hi_internal (operands[0], operands[1],
+						  operands[2], p));
+  DONE;
+})
+
+(define_expand "vec_widen_usubl_hi_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")
+   (match_operand:VQW 2 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
+  emit_insn (gen_aarch64_usubl<mode>_hi_internal (operands[0], operands[1],
+						  operands[2], p));
+  DONE;
+})
+
 
 (define_expand "aarch64_saddl2<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
-- 
2.17.1



[3/3][aarch64] Add support for vec_widen_shift pattern

2020-11-12 Thread Joel Hutton via Gcc-patches
Hi all,

This patch adds support in the aarch64 backend for the vec_widen_shift 
vect-pattern and makes a minor mid-end fix to support it.

All 3 patches together bootstrapped and regression tested on aarch64.

Ok for stage 1?

gcc/ChangeLog:

2020-11-12  Joel Hutton  

        * config/aarch64/aarch64-simd.md: New vec_widen_lshift_hi/lo
        patterns.
        * tree-vect-stmts.c
        (vectorizable_conversion): Fix for widen_lshift case.

gcc/testsuite/ChangeLog:

2020-11-12  Joel Hutton  

        * gcc.target/aarch64/vect-widen-lshift.c: New test.
From 97af35b2d2a505dcefd8474cbd4bc3441b83ab02 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 12 Nov 2020 11:48:25 +
Subject: [PATCH 3/3] [AArch64][vect] vec_widen_lshift pattern

Add aarch64 vec_widen_lshift_lo/hi patterns and fix a bug they trigger in
the mid-end.
---
 gcc/config/aarch64/aarch64-simd.md| 66 +++
 .../gcc.target/aarch64/vect-widen-lshift.c| 60 +
 gcc/tree-vect-stmts.c |  9 ++-
 3 files changed, 133 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b4f56a2295926f027bd53e7456eec729af0cd6df..2bb39c530a1a861cb9bd3df0c2943f62bd6153d7 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4711,8 +4711,74 @@
   [(set_attr "type" "neon_sat_shift_reg")]
 )
 
+(define_expand "vec_widen_shiftl_lo_"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+emit_insn (gen_aarch64_shll_internal (operands[0], operands[1],
+		 p, operands[2]));
+DONE;
+  }
+)
+
+(define_expand "vec_widen_shiftl_hi_"
+   [(set (match_operand: 0 "register_operand")
+	(unspec: [(match_operand:VQW 1 "register_operand" "w")
+			 (match_operand:SI 2
+			   "immediate_operand" "i")]
+			  VSHLL))]
+   "TARGET_SIMD"
+   {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+emit_insn (gen_aarch64_shll2_internal (operands[0], operands[1],
+		  p, operands[2]));
+DONE;
+   }
+)
+
 ;; vshll_n
 
+(define_insn "aarch64_shll_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll\\t%0., %1., %3";
+else
+  return "shll\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_shll2_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(vec_select:
+			(match_operand:VQW 1 "register_operand" "w")
+			(match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
+			 (match_operand:SI 3
+			   "aarch64_simd_shift_imm_bitsize_" "i")]
+			 VSHLL))]
+  "TARGET_SIMD"
+  {
+if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
+  return "shll2\\t%0., %1., %3";
+else
+  return "shll2\\t%0., %1., %3";
+  }
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
 (define_insn "aarch64_shll_n"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec: [(match_operand:VD_BHSI 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
new file mode 100644
index ..23ed93d1dcbc3ca559efa6708b4ed5855fb6a050
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-lshift.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#define ARR_SIZE 1024
+
+/* Should produce an shll, shll2 pair.  */
+void sshll_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+{
+foo[i]   = a[i]   << 16;
+foo[i+1] = a[i+1] << 16;
+foo[i+2] = a[i+2] << 16;
+foo[i+3] = a[i+3] << 16;
+}
+}
+
+__attribute__((optimize (0)))
+void sshll_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+{
+foo[i]   = a[i]   << 16;
+foo[i+1] = a[i+1] << 16;
+foo[i+2] = a[i+2] << 16;
+foo[i+3] = a[i+3] << 16;
+}
+}
+
+
+void __attribute__((optimize (0)))
+init(uint16_t *a, uint16_t *b)
+{
+for( int i = 0; i < ARR_SIZE;i++)
+{
+  a[i] = i;
+  b[i] = 2*i;
+}
+}
+
+int __attribute__((optimize (0)))
+main()
+{
+uint32_t foo_arr[ARR_SIZE];
+uint32_t bar_arr[ARR_SIZE];
+uint16_t a[ARR_SIZE];
+uint16_t b[ARR_SIZE];
+
+init(a, b);
+sshll_opt(foo_arr, a, b);
+

[2/3][vect] Add widening add, subtract vect patterns

2020-11-12 Thread Joel Hutton via Gcc-patches
Hi all,

This patch adds widening add and widening subtract patterns to 
tree-vect-patterns.

All 3 patches together bootstrapped and regression tested on aarch64.

gcc/ChangeLog:

2020-11-12  Joel Hutton  

        * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
        * optabs-tree.c (optab_for_tree_code): optabs for widening 
adds,subtracts
        * optabs.def (OPTAB_D): define vectorized widen add, subtracts
        * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, 
subtracts
        * tree-inline.c (estimate_operator_cost): Add case for widening adds, 
subtracts
        * tree-vect-generic.c (expand_vector_operations_1): Add case for 
widening adds, subtracts
        * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
        (vect_recog_widen_sub_pattern): New recog pattern
        (vect_recog_average_pattern): Update widened add code
        (vect_recog_average_pattern): Update widened add code
        * tree-vect-stmts.c (vectorizable_conversion): Add case for widened 
add, subtract
        (supportable_widening_operation): Add case for widened add, subtract
        * tree.def (WIDEN_ADD_EXPR): New tree code
        (WIDEN_SUB_EXPR): New tree code
        (VEC_WIDEN_ADD_HI_EXPR): New tree code
        (VEC_WIDEN_ADD_LO_EXPR): New tree code
        (VEC_WIDEN_SUB_HI_EXPR): New tree code
        (VEC_WIDEN_SUB_LO_EXPR): New tree code

gcc/testsuite/ChangeLog:

2020-11-12  Joel Hutton  

        * gcc.target/aarch64/vect-widen-add.c: New test.
        * gcc.target/aarch64/vect-widen-sub.c: New test.


Ok for trunk?
From e0c10ca554729b9e6d58dbd3f18ba72b2c3ee8bc Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Mon, 9 Nov 2020 15:44:18 +
Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns

Add widening add, subtract patterns to tree-vect-patterns.
Add aarch64 tests for patterns.

fix sad
---
 gcc/expr.c|  6 ++
 gcc/optabs-tree.c | 17 
 gcc/optabs.def|  8 ++
 .../gcc.target/aarch64/vect-widen-add.c   | 90 +++
 .../gcc.target/aarch64/vect-widen-sub.c   | 90 +++
 gcc/tree-cfg.c|  8 ++
 gcc/tree-inline.c |  6 ++
 gcc/tree-vect-generic.c   |  4 +
 gcc/tree-vect-patterns.c  | 32 +--
 gcc/tree-vect-stmts.c | 15 +++-
 gcc/tree.def  |  6 ++
 11 files changed, 276 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c

diff --git a/gcc/expr.c b/gcc/expr.c
index ae16f07775870792729e3805436d7f2debafb6ca..ffc8aed5296174066849d9e0d73b1c352c20fd9e 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9034,6 +9034,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	  target, unsignedp);
   return target;
 
+case WIDEN_ADD_EXPR:
+case WIDEN_SUB_EXPR:
 case WIDEN_MULT_EXPR:
   /* If first operand is constant, swap them.
 	 Thus the following special case checks need only
@@ -9754,6 +9756,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
   }
 
+case VEC_WIDEN_ADD_HI_EXPR:
+case VEC_WIDEN_ADD_LO_EXPR:
+case VEC_WIDEN_SUB_HI_EXPR:
+case VEC_WIDEN_SUB_LO_EXPR:
 case VEC_WIDEN_MULT_HI_EXPR:
 case VEC_WIDEN_MULT_LO_EXPR:
 case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 4dfda756932de1693667c39c6fabed043b20b63b..009dccfa3bd298bca7b3b45401a4cc2acc90ff21 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -170,6 +170,23 @@ optab_for_tree_code (enum tree_code code, const_tree type,
   return (TYPE_UNSIGNED (type)
 	  ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
 
+case VEC_WIDEN_ADD_LO_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_uaddl_lo_optab  : vec_widen_saddl_lo_optab);
+
+case VEC_WIDEN_ADD_HI_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_uaddl_hi_optab  : vec_widen_saddl_hi_optab);
+
+case VEC_WIDEN_SUB_LO_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_usubl_lo_optab  : vec_widen_ssubl_lo_optab);
+
+case VEC_WIDEN_SUB_HI_EXPR:
+  return (TYPE_UNSIGNED (type)
+	  ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
+
+
 case VEC_UNPACK_HI_EXPR:
   return (TYPE_UNSIGNED (type)
 	  ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..a97cdb360781ca9c743e2991422c600626c75aa5 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -383,6 +383,14 @@ OPTAB_D (vec_widen_smult_even_optab, "vec_widen_smult_even_$a")
 OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
 OPTAB_D (vec_widen_smult_lo_optab, 
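
For reference, the new lo/hi optab pairs follow the same convention as the
existing vec_widen_smult_{lo,hi} optabs: each one widens and operates on one
half of the input vector's lanes.  A rough scalar sketch of the intended
semantics, assuming a V8HI x V8HI -> V4SI widening add (lane numbering shown
for little-endian; on big-endian targets the lo/hi halves swap):

#include <stdint.h>

/* vec_widen_saddl_lo: widen and add the low four lanes.  */
void
saddl_lo (int32_t res[4], const int16_t a[8], const int16_t b[8])
{
  for (int i = 0; i < 4; i++)
    res[i] = (int32_t) a[i] + (int32_t) b[i];
}

/* vec_widen_saddl_hi: widen and add the high four lanes.  */
void
saddl_hi (int32_t res[4], const int16_t a[8], const int16_t b[8])
{
  for (int i = 0; i < 4; i++)
    res[i] = (int32_t) a[i + 4] + (int32_t) b[i + 4];
}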

[SLP][VECT] Add check to fix 96837

2020-09-29 Thread Joel Hutton via Gcc-patches
 Hi All,

The following patch adds a simple check to prevent the SLP stmts of vector
constructors from being rearranged. vect_attempt_slp_rearrange_stmts tries to
rearrange stmts so as to avoid a load permutation.

This fixes PR target/96837 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96837).
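
For illustration, the failure mode (mirrored by the new testcase in the
attached patch) involves a constructor whose elements are loads taken in
other than memory order; rearranging those loads to avoid the permutation
silently reverses the vector:

/* Sketch of the bug: y is built from a[] in reverse.  If the SLP loads are
   rearranged into memory order, the result becomes
   { a[0], a[1], a[2], a[3] } instead of { a[3], a[2], a[1], a[0] }.  */
typedef int V __attribute__((__vector_size__(16)));

void
reverse (V *y, const int a[4])
{
  *y = (V) { a[3], a[2], a[1], a[0] };
}
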
gcc/ChangeLog:

2020-09-29  Joel Hutton  

PR target/96837
* tree-vect-slp.c (vect_analyze_slp): Do not call 
vect_attempt_slp_rearrange_stmts for vector constructors.

gcc/testsuite/ChangeLog:

2020-09-29  Joel Hutton  

PR target/96837
* gcc.dg/vect/bb-slp-49.c: New test.From 2c738e2c0eddbc4fcdbf8ff2443bb809b36c7e28 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Tue, 29 Sep 2020 15:46:44 +0100
Subject: [PATCH] [SLP][VECT] Add check to fix 96837

Do not call vect_attempt_slp_rearrange_stmts if an SLP instance has an
SLP_INSTANCE_ROOT_STMT, i.e. if the tree is built from a constructor
rather than a grouped store. This function is intended to rearrange
stmts in a reduction chain so that they do not require a load
permutation, but rearranging the loads of a constructor leaves the
resulting vector in the wrong order.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-49.c | 28 +++
 gcc/tree-vect-slp.c   |  3 ++-
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-49.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-49.c b/gcc/testsuite/gcc.dg/vect/bb-slp-49.c
new file mode 100644
index ..e7101fcff4627bb545549bdfefd33c2ed58aee7b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-49.c
@@ -0,0 +1,28 @@
+/* This checks that vectorized constructors have the correct ordering. */
+/* { dg-require-effective-target vect_int } */
+
+typedef int V __attribute__((__vector_size__(16)));
+
+__attribute__((__noipa__)) void
+foo (unsigned int x, V *y)
+{
+  unsigned int a[4] = { x + 0, x + 2, x + 4, x + 6 };
+  for (unsigned int i = 0; i < 3; ++i)
+if (a[i] == 1234)
+  a[i]--;
+  *y = (V) { a[3], a[2], a[1], a[0] };
+}
+
+int
+main ()
+{
+  V b;
+  foo (0, &b);
+  if (b[0] != 6 || b[1] != 4 || b[2] != 2 || b[3] != 0)
+__builtin_abort ();
+  return 0;
+}
+
+/* See that we vectorize an SLP instance.  */
+/* { dg-final { scan-tree-dump "Analyzing vectorizable constructor" "slp1" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "slp1" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c44fd396bf0b69a4153e46026c545bebb3797551..7ba24e241deb76c0fd884ccfff04675d1b050ef7 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2495,7 +2495,8 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
   /* Reduction (there are no data-refs in the root).
 	 In reduction chain the order of the loads is not important.  */
   if (!STMT_VINFO_DATA_REF (stmt_info)
-	  && !REDUC_GROUP_FIRST_ELEMENT (stmt_info))
+	  && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
+	  && !SLP_INSTANCE_ROOT_STMT (instance))
 	vect_attempt_slp_rearrange_stmts (instance);
 }
 
-- 
2.17.1