Yeah, it seems I forgot to send the latest version, my bad.

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

        * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New pattern.
        * config/aarch64/iterators.md (FRINTNZ): New iterator.
        (frintnz_mode): New int attribute.
        (VSFDF): Make iterator conditional.
        * internal-fn.def (FTRUNC_INT): New IFN.
        * internal-fn.c (ftrunc_int_direct): New define.
        (expand_ftrunc_int_optab_fn): New custom expander.
        (direct_ftrunc_int_optab_supported_p): New supported_p.
        * match.pd: Add to the existing TRUNC pattern match.
        * optabs.def (ftrunc_int): New entry.
        * stor-layout.h (element_precision): Moved from here...
        * tree.h (element_precision): ... to here.
        (element_type): New declaration.
        * tree.c (element_type): New function.
        (element_precision): Changed to use element_type.
        * tree-vect-stmts.c (vectorizable_internal_function): Add support for
        IFNs with different input types.
        (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
        * doc/md.texi: New entry for ftrunc pattern name.
        * doc/sourcebuild.texi (aarch64_frintnzx_ok): New target.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/merge_trunc1.c: Skip if frintNz instructions are
        available.
        * lib/target-supports.exp: Add aarch64_frintnzx_ok target.
        * gcc.target/aarch64/frintnz.c: New test.
        * gcc.target/aarch64/frintnz_vec.c: New test.
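
For reference, a minimal example of the idiom the new pattern targets,
mirroring the included frintnz.c test (the frint32z expectation and the
-O2 -march=armv8.5-a flags are taken from that test):

/* float -> int -> float round trip; expected to collapse to a single
   frint32z instruction rather than an fcvtzs/scvtf pair.  */
float
trunc_via_int (float x)
{
  int y = x;          /* truncate towards zero into a signed 32-bit int */
  return (float) y;   /* convert back; the pair becomes IFN_FTRUNC_INT */
}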

On 03/01/2022 12:18, Richard Biener wrote:
On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:

Hi Richard,

Thank you for the review. I've adopted all of the above suggestions
downstream; I'm still surprised how many style issues I miss after years
of gcc development :(

On 17/12/2021 12:44, Richard Sandiford wrote:
@@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
          rhs_type = unsigned_type_node;
        }
 
-  int mask_opno = -1;
+  /* The argument that is not of the same type as the others.  */
+  int diff_opno = -1;
+  bool masked = false;
   if (internal_fn_p (cfn))
-    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+    {
+      if (cfn == CFN_FTRUNC_INT)
+       /* For FTRUNC this represents the argument that carries the type of the
+          intermediate signed integer.  */
+       diff_opno = 1;
+      else
+       {
+         /* For masked operations this represents the argument that carries the
+            mask.  */
+         diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
+         masked = diff_opno >= 0;
+       }
+    }
I think it would be cleaner not to process argument 1 at all for
CFN_FTRUNC_INT.  There's no particular need to vectorise it.
I agree with this; I'll change the loop to 'continue' for argument 1 when
dealing with non-masked CFNs.

[…]
diff --git a/gcc/tree.c b/gcc/tree.c
index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
   return true;
 }
 
-/* Return the precision of the type, or for a complex or vector type the
-   precision of the type of its elements.  */
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
 
-unsigned int
-element_precision (const_tree type)
+tree
+element_type (const_tree type)
 {
   if (!TYPE_P (type))
     type = TREE_TYPE (type);
@@ -6657,7 +6657,16 @@ element_precision (const_tree type)
   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
     type = TREE_TYPE (type);
 
-  return TYPE_PRECISION (type);
+  return (tree) type;
I think we should stick a const_cast in element_precision and make
element_type take a plain “type”.  As it stands element_type is an
implicit const_cast for many cases.

Thanks,
I was just curious about something here: I thought the purpose of having
element_precision (before) and element_type (now) take a const_tree
argument was to make it clear that we aren't changing the input type. I
understand that as it stands element_type can act as an implicit
const_cast (and I should be using const_cast rather than the '(tree)'
cast), but only when 'type' isn't a complex/vector type. Either way we
keep the promise that we aren't changing the incoming type; what the
caller then does with the result is up to them, no?

I don't mind making the changes, just trying to understand the reasoning
behind it.
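
(To make the suggestion concrete, here is a toy C++ sketch, using
stand-in types rather than GCC's real tree, of the arrangement being
described: element_type takes a plain pointer, and the const-taking
element_precision does the one explicit const_cast.)

// Toy stand-ins, not GCC code; only the const_cast placement matters.
struct tree_node { unsigned int precision; };
typedef tree_node *tree;
typedef const tree_node *const_tree;

static unsigned int
TYPE_PRECISION (const_tree t) { return t->precision; }

// Takes and returns a non-const pointer, so no cast is needed here.
static tree
element_type (tree type)
{
  return type;  // complex/vector unwrapping elided in this toy version
}

// The single, explicit const_cast lives in the const-taking wrapper.
static unsigned int
element_precision (const_tree type)
{
  return TYPE_PRECISION (element_type (const_cast<tree> (type)));
}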

I'll send in a new patch with all the changes after the review on the match.pd
stuff.
I'm missing an updated patch after my initial review of the match.pd
stuff, so I'm not sure what to review.  Can you re-post an updated patch?

Thanks,
Richard.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3c72bdad01bfab49ee4ae6fb7b139202e89c8d34..9d04a2e088cd7d03963b58ed3708c339b841801c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7423,12 +7423,18 @@ (define_insn "despeculate_simpleti"
    (set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc<mode><frintnz_mode>2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+                     FRINTNZ))]
+  "TARGET_FRINT"
+)
+
 (define_insn "aarch64_<frintnzs_op><mode>"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
        (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
                      FRINTNZX))]
-  "TARGET_FRINT && TARGET_FLOAT
-   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "TARGET_FRINT"
   "<frintnzs_op>\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
   [(set_attr "type" "f_rint<stype>")]
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 9160ce3e69c2c6b1b75e46f7aabd27e7949a269a..7962b15a87db2f1ede3836efbb827b8fb95266da 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -163,7 +163,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
                                  SF DF])
 
 ;; Scalar and vetor modes for SF, DF.
-(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
+(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
+                            (V4SF "TARGET_SIMD")
+                            (V2DF "TARGET_SIMD")
+                            (DF "TARGET_FLOAT")
+                            (SF "TARGET_FLOAT")])
 
 ;; Advanced SIMD single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -3078,6 +3082,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
                               UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3485,6 +3491,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X "frint32x")
                              (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X "frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_<xx>.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
                         (UNSPEC_COND_CMPGE_WIDE "ge")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 19e89ae502bc2f51db64667b236c1cb669718b02..3b0e4e0875b4392ab6833568b207580ef597a98f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6191,6 +6191,15 @@ operands; otherwise, it may not.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
+@item @samp{ftrunc@var{m}@var{n}2}
+Truncate operand 1 towards zero to a signed integer of mode @var{n} and store
+the result in operand 0.  Both operands have mode @var{m}, which is a scalar
+or vector floating-point mode.  An exception must be raised if operand 1 does
+not fit in an @var{n}-mode signed integer, just as one would be raised if the
+truncation happened through a separate floating-point to integer conversion.
+
 @cindex @code{round@var{m}2} instruction pattern
 @item @samp{round@var{m}2}
 Round operand 1 to the nearest integer, rounding away from zero in the
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6095a35cd4565fdb7d758104e80fe6411230f758..a56bbb775572fa72379854f90a01ad543557e29a 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2286,6 +2286,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
 @item aarch64_fjcvtzs_hw
 AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
 instruction.
+
+@item aarch64_frintnzx_ok
+AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
+FRINT32X and FRINT64X instructions.
 @end table
 
 @subsubsection MIPS-specific attributes
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index b24102a5990bea4cbb102069f7a6df497fc81ebf..9047b486f41948059a7a7f1ccc4032410a369139 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -130,6 +130,7 @@ init_internal_fns ()
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
 #define check_ptrs_direct { 0, 0, false }
+#define ftrunc_int_direct { 0, 1, true }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
@@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
   return convert_optab_handler (optab, imode, vmode);
 }
 
+/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
+
+static void
+expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[2];
+  tree lhs, float_type, int_type;
+  rtx target, op;
+
+  lhs = gimple_call_lhs (stmt);
+  target = expand_normal (lhs);
+  op = expand_normal (gimple_call_arg (stmt, 0));
+
+  float_type = TREE_TYPE (lhs);
+  int_type = element_type (gimple_call_arg (stmt, 1));
+
+  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
+  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
+
+  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
+                                     TYPE_MODE (int_type)), 2, ops);
+}
+
 /* Expand LOAD_LANES call STMT using optab OPTAB.  */
 
 static void
@@ -3747,6 +3771,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
          != CODE_FOR_nothing);
 }
 
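+/* Return true if OPTAB is supported, taking the result and input mode
+   from TYPES.first and the intermediate integer mode from the element
+   type of TYPES.second.  */
+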
+static bool
+direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
+                                    optimization_type opt_type)
+{
+  return (convert_optab_handler (optab, TYPE_MODE (types.first),
+                               TYPE_MODE (element_type (types.second)),
+                               opt_type) != CODE_FOR_nothing);
+}
+
 #define direct_unary_optab_supported_p direct_optab_supported_p
 #define direct_binary_optab_supported_p direct_optab_supported_p
 #define direct_ternary_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 8891071a6a360961643731094379b607f317af17..a0fd75829e942f529c879c669e58c098b62b26ba 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -66,6 +66,9 @@ along with GCC; see the file COPYING3.  If not see
 
    - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode
    - check_ptrs: used for check_{raw,war}_ptrs
+   - ftrunc_int: a unary conversion optab that takes and returns values of the
+     same mode, but internally converts via another mode.  This second mode is
+     specified using a dummy final function argument.
 
    DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
    maps to one of two optabs, depending on the signedness of an input.
@@ -275,6 +278,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
 DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
+DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)
 
 /* Binary math functions.  */
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 84c9b918041eef3409bdb0fbe04565b90b25d6e9..a5d892ac1ebfaa7b5d5fa970baa04c8e5b8acb28 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3751,12 +3751,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    trapping behaviour, so require !flag_trapping_math. */
 #if GIMPLE
 (simplify
-   (float (fix_trunc @0))
-   (if (!flag_trapping_math
-       && types_match (type, TREE_TYPE (@0))
-       && direct_internal_fn_supported_p (IFN_TRUNC, type,
-                                         OPTIMIZE_FOR_BOTH))
-      (IFN_TRUNC @0)))
+   (float (fix_trunc@1 @0))
+   (if (types_match (type, TREE_TYPE (@0)))
+    (with {
+      tree int_type = element_type (@1);
+     }
+     (if (TYPE_SIGN (TREE_TYPE (@1)) == SIGNED
+         && direct_internal_fn_supported_p (IFN_FTRUNC_INT, type, int_type,
+                                            OPTIMIZE_FOR_BOTH))
+      (IFN_FTRUNC_INT @0 {
+       wide_int_to_tree (int_type, wi::max_value (TYPE_PRECISION (int_type),
+                                                 SIGNED)); })
+      (if (!flag_trapping_math
+          && direct_internal_fn_supported_p (IFN_TRUNC, type,
+                                             OPTIMIZE_FOR_BOTH))
+       (IFN_TRUNC @0))))))
 #endif
 
 /* If we have a narrowing conversion to an integral type that is fed by a
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 5fcf5386a0b3112ef9004055c82e15fe47668970..04a4ee82e15fe7b52e726f2ee0bf704c30ac450d 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -63,6 +63,7 @@ OPTAB_CX(fractuns_optab, "fractuns$Q$b$I$a2")
 OPTAB_CL(satfract_optab, "satfract$b$Q$a2", SAT_FRACT, "satfract", gen_satfract_conv_libfunc)
 OPTAB_CL(satfractuns_optab, "satfractuns$I$b$Q$a2", UNSIGNED_SAT_FRACT, "satfractuns", gen_satfractuns_conv_libfunc)
 
+OPTAB_CD(ftruncint_optab, "ftrunc$a$b2")
 OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index b67abebc0096113272bfb1221eabaabd08657a58..e0219c8af4846ea0f947586b1915d9d06cb6c107 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -36,7 +36,6 @@ extern void place_field (record_layout_info, tree);
 extern void compute_record_mode (tree);
 extern void finish_bitfield_layout (tree);
 extern void finish_record_layout (record_layout_info, int);
-extern unsigned int element_precision (const_tree);
 extern void finalize_size_functions (void);
 extern void fixup_unsigned_type (tree);
 extern void initialize_sizetypes (void);
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
new file mode 100644
index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.5-a" } */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**     frint32z        s0, s0
+**     ret
+*/
+float
+f1 (float x)
+{
+  int y = x;
+  return (float) y;
+}
+
+/*
+** f2:
+**     frint64z        s0, s0
+**     ret
+*/
+float
+f2 (float x)
+{
+  long long int y = x;
+  return (float) y;
+}
+
+/*
+** f3:
+**     frint32z        d0, d0
+**     ret
+*/
+double
+f3 (double x)
+{
+  int y = x;
+  return (double) y;
+}
+
+/*
+** f4:
+**     frint64z        d0, d0
+**     ret
+*/
+double
+f4 (double x)
+{
+  long long int y = x;
+  return (double) y;
+}
+
+float
+f1_dont (float x)
+{
+  unsigned int y = x;
+  return (float) y;
+}
+
+float
+f2_dont (float x)
+{
+  unsigned long long int y = x;
+  return (float) y;
+}
+
+double
+f3_dont (double x)
+{
+  unsigned int y = x;
+  return (double) y;
+}
+
+double
+f4_dont (double x)
+{
+  unsigned long long int y = x;
+  return (double) y;
+}
+
+double
+f5_dont (double x)
+{
+  signed short y = x;
+  return (double) y;
+}
+
+/* Make sure the 'dont's don't generate any frintNz.  */
+/* { dg-final { scan-assembler-times {frint32z} 2 } } */
+/* { dg-final { scan-assembler-times {frint64z} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
new file mode 100644
index 0000000000000000000000000000000000000000..801d65ea8325cb680691286aab42747f43b90687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.5-a" } */
+/* { dg-require-effective-target aarch64_frintnzx_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define TEST(name,float_type,int_type)                                 \
+void                                                                   \
+name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
+{                                                                      \
+  for (int i = 0; i < n; ++i)                                        \
+    {                                                                \
+      int_type x_i = x[i];                                           \
+      y[i] = (float_type) x_i;                                       \
+    }                                                                \
+}
+
+/*
+** f1:
+**     ...
+**     frint32z        v[0-9]+\.4s, v[0-9]+\.4s
+**     ...
+*/
+TEST(f1, float, int)
+
+/*
+** f2:
+**     ...
+**     frint64z        v[0-9]+\.4s, v[0-9]+\.4s
+**     ...
+*/
+TEST(f2, float, long long)
+
+/*
+** f3:
+**     ...
+**     frint32z        v[0-9]+\.2d, v[0-9]+\.2d
+**     ...
+*/
+TEST(f3, double, int)
+
+/*
+** f4:
+**     ...
+**     frint64z        v[0-9]+\.2d, v[0-9]+\.2d
+**     ...
+*/
+TEST(f4, double, long long)
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3d80871c4cebd5fb5cac0714b3feee27038f05fd 100644
--- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-skip-if "" { aarch64_frintnzx_ok } } */
 
 float
 f1 (float x)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index c1ad97c6bd20d6e970edb24a125451580f014d55..5758e9cee4416b60b6766ecb37cbf3b37ac98522 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11399,6 +11399,32 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
        }]
 }
 
+# Return 1 if the target supports Armv8.5-A scalar and Advanced SIMD
+# FRINT32[ZX] and FRINT64[ZX] instructions, 0 otherwise.  The test is valid
+# for AArch64.
+proc check_effective_target_aarch64_frintnzx_ok_nocache { } {
+
+    if { ![istarget aarch64*-*-*] } {
+        return 0;
+    }
+
+    if { [check_no_compiler_messages_nocache \
+             aarch64_frintnzx_ok assembly {
+       #if !defined (__ARM_FEATURE_FRINT)
+       #error "__ARM_FEATURE_FRINT not defined"
+       #endif
+    } [current_compiler_flags]] } {
+       return 1;
+    }
+
+    return 0;
+}
+
+proc check_effective_target_aarch64_frintnzx_ok { } {
+    return [check_cached_effective_target aarch64_frintnzx_ok \
+                check_effective_target_aarch64_frintnzx_ok_nocache] 
+}
+
 # Return 1 if the target supports executing the Armv8.1-M Mainline Low
 # Overhead Loop, 0 otherwise.  The test is valid for ARM.
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f2625a2ff4089739326ce11785f1b68678c07f0e..435f2f4f5aeb2ed4c503c7b6a97d375634ae4514 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
 
 static internal_fn
 vectorizable_internal_function (combined_fn cfn, tree fndecl,
-                               tree vectype_out, tree vectype_in)
+                               tree vectype_out, tree vectype_in,
+                               tree *vectypes)
 {
   internal_fn ifn;
   if (internal_fn_p (cfn))
@@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
       const direct_internal_fn_info &info = direct_internal_fn (ifn);
       if (info.vectorizable)
        {
-         tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
-         tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+         tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
+         if (!type0)
+           type0 = vectype_in;
+         tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
+         if (!type1)
+           type1 = vectype_in;
          if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
                                              OPTIMIZE_FOR_SPEED))
            return ifn;
@@ -3263,18 +3268,40 @@ vectorizable_call (vec_info *vinfo,
       rhs_type = unsigned_type_node;
     }
 
-  int mask_opno = -1;
+  /* The argument that is not of the same type as the others.  */
+  int diff_opno = -1;
+  bool masked = false;
   if (internal_fn_p (cfn))
-    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+    {
+      if (cfn == CFN_FTRUNC_INT)
+       /* For FTRUNC this represents the argument that carries the type of the
+          intermediate signed integer.  */
+       diff_opno = 1;
+      else
+       {
+         /* For masked operations this represents the argument that carries the
+            mask.  */
+         diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
+         masked = diff_opno >= 0;
+       }
+    }
 
   for (i = 0; i < nargs; i++)
     {
-      if ((int) i == mask_opno)
+      if ((int) i == diff_opno)
        {
-         if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
-                                      &op, &slp_op[i], &dt[i], &vectypes[i]))
-           return false;
-         continue;
+         if (masked)
+           {
+             if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
+                                          diff_opno, &op, &slp_op[i], &dt[i],
+                                          &vectypes[i]))
+               return false;
+           }
+         else
+           {
+             vectypes[i] = TREE_TYPE (gimple_call_arg (stmt, i));
+             continue;
+           }
        }
 
       if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
@@ -3286,27 +3313,30 @@ vectorizable_call (vec_info *vinfo,
          return false;
        }
 
-      /* We can only handle calls with arguments of the same type.  */
-      if (rhs_type
-         && !types_compatible_p (rhs_type, TREE_TYPE (op)))
+      if ((int) i != diff_opno)
        {
-         if (dump_enabled_p ())
-           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "argument types differ.\n");
-         return false;
-       }
-      if (!rhs_type)
-       rhs_type = TREE_TYPE (op);
+         /* We can only handle calls with arguments of the same type.  */
+         if (rhs_type
+             && !types_compatible_p (rhs_type, TREE_TYPE (op)))
+           {
+             if (dump_enabled_p ())
+               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                "argument types differ.\n");
+             return false;
+           }
+         if (!rhs_type)
+           rhs_type = TREE_TYPE (op);
 
-      if (!vectype_in)
-       vectype_in = vectypes[i];
-      else if (vectypes[i]
-              && !types_compatible_p (vectypes[i], vectype_in))
-       {
-         if (dump_enabled_p ())
-           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "argument vector types differ.\n");
-         return false;
+         if (!vectype_in)
+           vectype_in = vectypes[i];
+         else if (vectypes[i]
+                  && !types_compatible_p (vectypes[i], vectype_in))
+           {
+             if (dump_enabled_p ())
+               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                "argument vector types differ.\n");
+             return false;
+           }
        }
     }
   /* If all arguments are external or constant defs, infer the vector type
@@ -3382,8 +3412,8 @@ vectorizable_call (vec_info *vinfo,
          || (modifier == NARROW
              && simple_integer_narrowing (vectype_out, vectype_in,
                                           &convert_code))))
-    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
-                                         vectype_in);
+    ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
+                                         &vectypes[0]);
 
   /* If that fails, try asking for a target-specific built-in function.  */
   if (ifn == IFN_LAST)
@@ -3461,7 +3491,7 @@ vectorizable_call (vec_info *vinfo,
 
       if (loop_vinfo
          && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
-         && (reduc_idx >= 0 || mask_opno >= 0))
+         && (reduc_idx >= 0 || masked))
        {
          if (reduc_idx >= 0
              && (cond_fn == IFN_LAST
@@ -3481,8 +3511,8 @@ vectorizable_call (vec_info *vinfo,
                   ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
                   : ncopies);
              tree scalar_mask = NULL_TREE;
-             if (mask_opno >= 0)
-               scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
+             if (masked)
+               scalar_mask = gimple_call_arg (stmt_info->stmt, diff_opno);
              vect_record_loop_mask (loop_vinfo, masks, nvectors,
                                     vectype_out, scalar_mask);
            }
@@ -3547,7 +3577,7 @@ vectorizable_call (vec_info *vinfo,
                    {
                      /* We don't define any narrowing conditional functions
                         at present.  */
-                     gcc_assert (mask_opno < 0);
+                     gcc_assert (!masked);
                      tree half_res = make_ssa_name (vectype_in);
                      gcall *call
                        = gimple_build_call_internal_vec (ifn, vargs);
@@ -3567,16 +3597,16 @@ vectorizable_call (vec_info *vinfo,
                    }
                  else
                    {
-                     if (mask_opno >= 0 && masked_loop_p)
+                     if (masked && masked_loop_p)
                        {
                          unsigned int vec_num = vec_oprnds0.length ();
                          /* Always true for SLP.  */
                          gcc_assert (ncopies == 1);
                          tree mask = vect_get_loop_mask (gsi, masks, vec_num,
                                                          vectype_out, i);
-                         vargs[mask_opno] = prepare_vec_mask
+                         vargs[diff_opno] = prepare_vec_mask
                            (loop_vinfo, TREE_TYPE (mask), mask,
-                            vargs[mask_opno], gsi);
+                            vargs[diff_opno], gsi);
                        }
 
                      gcall *call;
@@ -3614,13 +3644,13 @@ vectorizable_call (vec_info *vinfo,
          if (masked_loop_p && reduc_idx >= 0)
            vargs[varg++] = vargs[reduc_idx + 1];
 
-         if (mask_opno >= 0 && masked_loop_p)
+         if (masked && masked_loop_p)
            {
              tree mask = vect_get_loop_mask (gsi, masks, ncopies,
                                              vectype_out, j);
-             vargs[mask_opno]
+             vargs[diff_opno]
                = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
-                                   vargs[mask_opno], gsi);
+                                   vargs[diff_opno], gsi);
            }
 
          gimple *new_stmt;
@@ -3639,7 +3669,7 @@ vectorizable_call (vec_info *vinfo,
            {
              /* We don't define any narrowing conditional functions at
                 present.  */
-             gcc_assert (mask_opno < 0);
+             gcc_assert (!masked);
              tree half_res = make_ssa_name (vectype_in);
              gcall *call = gimple_build_call_internal_vec (ifn, vargs);
              gimple_call_set_lhs (call, half_res);
@@ -3683,7 +3713,7 @@ vectorizable_call (vec_info *vinfo,
     {
       auto_vec<vec<tree> > vec_defs (nargs);
       /* We don't define any narrowing conditional functions at present.  */
-      gcc_assert (mask_opno < 0);
+      gcc_assert (!masked);
       for (j = 0; j < ncopies; ++j)
        {
          /* Build argument list for the vectorized call.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index 318019c4dc5373271551f5d9a48dadb57a29d4a7..770d0ddfcc9a7acda01ed2fafa61eab0f1ba4cfa 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -6558,4 +6558,12 @@ extern unsigned fndecl_dealloc_argno (tree);
    object or pointer.  Otherwise return null.  */
 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
+extern tree element_type (tree);
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+extern unsigned int element_precision (const_tree);
+
 #endif  /* GCC_TREE_H  */
diff --git a/gcc/tree.c b/gcc/tree.c
index d98b77db50b29b22dc9af1f98cd86044f62af019..81e66dd710ce6bc237f508655cfb437b40ec0bfa 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6646,11 +6646,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
   return true;
 }
 
-/* Return the precision of the type, or for a complex or vector type the
-   precision of the type of its elements.  */
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
 
-unsigned int
-element_precision (const_tree type)
+tree
+element_type (tree type)
 {
   if (!TYPE_P (type))
     type = TREE_TYPE (type);
@@ -6658,7 +6658,16 @@ element_precision (const_tree type)
   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
     type = TREE_TYPE (type);
 
-  return TYPE_PRECISION (type);
+  return const_cast<tree> (type);
+}
+
+/* Return the precision of the type, or for a complex or vector type the
+   precision of the type of its elements.  */
+
+unsigned int
+element_precision (const_tree type)
+{
+  return TYPE_PRECISION (element_type (const_cast<tree> (type)));
 }
 
 /* Return true if CODE represents an associative tree code.  Otherwise
