> On 31.10.2023 at 16:10, pan2...@intel.com wrote:
>
> From: Pan Li <pan2...@intel.com>
>
> Update in v4:
>
> * Append the check to vectorizable_internal_function.
>
> Update in v3:
>
> * Add a function to check whether the type size is legal for a vectorized call.
>
> Update in v2:
>
> * Fix an ICE caused by a type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> vectorizable_call has one restriction on the size of the data type:
> DF to DI is allowed but SF to DI isn't. You may see the message below
> when trying to vectorize a function call such as lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> As a result, a standard name pattern such as lrintmn2 cannot be used when
> the data type sizes differ, for example SF => DI. This patch refines the
> data type size check and unblocks standard names like lrintmn2 under
> certain conditions.
>
> The type size of vectype_out needs to be exactly the same as the type
> size of vectype_in when vectype_out does not participate in the optab
> selection. There is no such restriction when vectype_out is part of
> the optab query.
>
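As a rough illustration of the rule described above, here is a standalone sketch. It is not the GCC code: struct vec_type, size_check_ok and the two selector arguments are hypothetical stand-ins for vectype_in/vectype_out and info.type0/info.type1, and only the total vector size is modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a vector type; only the total size matters
   for this check.  */
struct vec_type
{
  const char *name;
  unsigned size_bits;
};

/* Return true if the size check would let the optab query go ahead.
   A negative selector means that slot of the query uses the output
   type, mirroring how the patch derives type0 and type1.  */
static bool
size_check_ok (const struct vec_type *in, const struct vec_type *out,
               int type0_sel, int type1_sel)
{
  bool out_in_query = type0_sel < 0 || type1_sel < 0;
  bool same_size = in->size_bits == out->size_bits;

  /* Sizes may differ only when the output type is part of the query.  */
  return out_in_query || same_size;
}
```

With a 128-bit input and a 256-bit output, size_check_ok returns true when either selector is negative and false when both are non-negative; with equal sizes it returns true regardless of the selectors.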
> The tests below pass with this patch.
>
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
>
> The tests below are ongoing.
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
>
Ok
Thanks,
Richard
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_internal_function): Add type
> size check for when vectype_out doesn't participate in the optab query.
> (vectorizable_call): Remove the type size check.
>
> Signed-off-by: Pan Li <pan2...@intel.com>
> ---
> gcc/tree-vect-stmts.cc | 22 +++++++++-------------
> 1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..799b4ab10c7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1420,8 +1420,17 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
> const direct_internal_fn_info &info = direct_internal_fn (ifn);
> if (info.vectorizable)
> {
> + bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
> tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +
> + /* The type size of both the vectype_in and vectype_out should be
> + exactly the same when vectype_out isn't participating the optab.
> + While there is no restriction for type size when vectype_out
> + is part of the optab query. */
> + if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
> + return IFN_LAST;
> +
> if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
> OPTIMIZE_FOR_SPEED))
> return ifn;
> @@ -3361,19 +3370,6 @@ vectorizable_call (vec_info *vinfo,
>
> return false;
> }
> - /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz*
> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> - by a pack of the two vectors into an SI vector. We would need
> - separate code to handle direct VnDI->VnSI IFN_CTZs. */
> - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> - {
> - if (dump_enabled_p ())
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "mismatched vector sizes %T and %T\n",
> - vectype_in, vectype_out);
> - return false;
> - }
>
> if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
> != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>