On Fri, Apr 14, 2023 at 11:42 AM Andre Vieira (lists)
<andre.simoesdiasvie...@arm.com> wrote:
>
>
>
> On 14/04/2023 10:09, Richard Biener wrote:
> > On Fri, Apr 14, 2023 at 10:43 AM Andre Vieira (lists)
> > <andre.simoesdiasvie...@arm.com> wrote:
> >>
> >> Resending this to everyone (sorry for the double send Richard).
> >>
> >> On 14/04/2023 09:15, Andre Vieira (lists) wrote:
> >>   >
> >>   >
> >>   > On 14/04/2023 07:55, Richard Biener wrote:
> >>   >> On Thu, Apr 13, 2023 at 4:25 PM Andre Vieira (lists)
> >>   >> <andre.simoesdiasvie...@arm.com> wrote:
> >>   >>>
> >>   >>>
> >>   >>>
> >>   >>> On 13/04/2023 15:00, Richard Biener wrote:
> >>   >>>> On Thu, Apr 13, 2023 at 3:00 PM Andre Vieira (lists) via Gcc-patches
> >>   >>>> <gcc-patches@gcc.gnu.org> wrote:
> >>   >>>>>
> >>   >>>>>
> >>   >>>>>
> >>   >>>
> >>   >>> But that's not it, I've been looking at it, and there is code in 
> >> place
> >>   >>> that does what I expected which is defer the choice of vectype for 
> >> simd
> >>   >>> clones until vectorizable_simd_clone_call, unfortunately it has a
> >>   >>> mistaken assumption that simdclones don't return :/
> >>   >>
> >>   >> I think that's not it - when the SIMD clone returns a vector we have 
> >> to
> >>   >> determine the vector type in this function.  We cannot defer this.
> >>   >
> >>   > What's 'this function' here, do you mean we have to determine the
> >>   > vectype in 'vect_get_vector_types_for_stmt' &
> >>   > 'vect_determine_vf_for_stmt' ?
> >
> > Yes.
> >
> >> Because at that time we don't yet know
> >>   > what clone we will be using, this choice is done inside
> >>   > vectorizable_simd_clone_call. In fact, to choose the simd clone, we 
> >> need
> >>   > to know the vf as that has to be a multiple of the chosen clone's
> >>   > simdlen. So we simply can't use the simdclone's types (as that depends
> >>   > on the simdlen) to choose the vf because the choice of simdlend depends
> >>   > on the vf. And there was already code in place to handle this,
> >>   > unfortunately that code was wrong and had the wrong assumption that
> >>   > simdclones didn't return (probably was true at some point and 
> >> bitrotted).
> >
> > But to compute the VF we need to know the vector types!  We're only
> > calling vectorizable_* when the VF is final.  That said, the code you quote:
> >
> >>   >>
> >>   >>> see vect_get_vector_types_for_stmt:
> >>   >>> ...
> >>   >>>     if (gimple_get_lhs (stmt) == NULL_TREE
> >
> > is just for the case of a function without return value.  For this case
> > it's OK to do nothing - 'vectype' is the vector type of all vector defs
> > a stmt produces.
> >
> > For calls with a LHS it should fall through to generic code doing
> > get_vectype_for_scalar_type on the LHS type.
>
> I think that may work, but right now it will still go and look at the
> arguments of the call and use the smallest type among them to adjust the
> nunits (in 'vect_get_vector_types_for_stmt').
>
> In fact (this is just for illustration) if I hack that function like this:
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12745,8 +12745,11 @@ vect_get_vector_types_for_stmt (vec_info
> *vinfo, stmt_vec_info stmt_info,
>         /* The number of units is set according to the smallest scalar
>           type (or the largest vector size, but we only support one
>           vector size per vectorization).  */
> -      scalar_type = vect_get_smallest_scalar_type (stmt_info,
> -                                                  TREE_TYPE (vectype));
> +      if (simd_clone_call_p (stmt_info->stmt))
> +       scalar_type = TREE_TYPE (vectype);
> +      else
> +       scalar_type = vect_get_smallest_scalar_type (stmt_info,
> +                                                    TREE_TYPE (vectype));
>         if (scalar_type != TREE_TYPE (vectype))
>          {
>            if (dump_enabled_p ())
>
> It will use the same types as before without (-m32), like I explained
> before the -m32 turns the pointer inside MASK_CALL into a 32-bit pointer
> so now the smallest size is 32-bits. This makes it pick V8SI instead of
> the original V4DI (scalar return type is DImode). Changing the VF to 8,
> thus unrolling the loop as it needs to make 2 calls, each handling 4 nunits.

Ah, but then vect_get_smallest_scalar_type should simply ignore that
pointer in MASK_CALL.  It should only look at the arguments relevant
for vectorization.  So

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 8daf7bd7dd3..70bb8595951 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -165,6 +165,8 @@ vect_get_smallest_scalar_type (stmt_vec_info
stmt_info, tree scalar_type)
            }
          else if (internal_fn_mask_index (ifn) == 0)
            i = 1;
+         else if (ifn == IFN_MASK_CALL)
+           i = 1;
        }
       if (i < gimple_call_num_args (call))
        {

?

Richard.

>
>

Reply via email to