Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Alexander Monakov


On Fri, 11 Aug 2023, Richard Biener wrote:

> > I think it converts SNan to QNan (when the partial vector has just one
> > element which is SNan), so is a test for -fsignaling-nans missing?
> 
> Hm, I guess that's a corner case that could happen when there's no
> runtime profitability check on more than one element and when the
> element accumulated is directly loaded from memory.  OTOH the
> loop vectorizer always expects an initial value for the reduction
> and thus we perform either no add (when the loop isn't entered)
> or at least a single add (when it is).  So I think this particular
> situation cannot occur?

Yes, that makes sense, thanks for the elaboration.
(it's a bit subtle so maybe worth a comment? not sure)

> > In the defaut -fno-rounding-math -fno-signaling-nans mode I think we
> > can do the reduction by substituting negative zero for masked-off
> > elements ? maybe it's worth diagnosing that case separately (i.e.
> > as "not yet implemented", not an incorrect transform)?
> 
> Ah, that's interesting.  So the only case we can't handle is
> -frounding-math -fsigned-zeros then.  I'll see to adjust the patch
> accordingly, like the following incremental patch:

Yeah, nice!

> > (note that in avx512 it's possible to materialize negative zeroes
> > by mask with a single vpternlog instruction, which is cheap)
> 
> It ends up loading the { -0.0, ... } constant from memory, the
> { 0.0, ... } mask is handled by using a zero-masked load, so
> indeed cheaper.

I was thinking it could be easily done without a memory load,
but got confused, sorry.

Alexander


Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, 11 Aug 2023, Alexander Monakov wrote:

> 
> On Fri, 11 Aug 2023, Richard Biener wrote:
> 
> > When we vectorize fold-left reductions with partial vectors but
> > no target operation available we use a vector conditional to force
> > excess elements to zero.  But that doesn't correctly preserve
> > the sign of zero.  The following patch disables partial vector
> > support in that case.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > Does this look OK?  With -frounding-math -fno-signed-zeros we are
> > happily using the masking again, but that's OK, right?  An additional
> > + 0.0 shouldn't do anything here.
> 
> I think it converts SNan to QNan (when the partial vector has just one
> element which is SNan), so is a test for -fsignaling-nans missing?

Hm, I guess that's a corner case that could happen when there's no
runtime profitability check on more than one element and when the
element accumulated is directly loaded from memory.  OTOH the
loop vectorizer always expects an initial value for the reduction
and thus we perform either no add (when the loop isn't entered)
or at least a single add (when it is).  So I think this particular
situation cannot occur?

> In the defaut -fno-rounding-math -fno-signaling-nans mode I think we
> can do the reduction by substituting negative zero for masked-off
> elements ? maybe it's worth diagnosing that case separately (i.e.
> as "not yet implemented", not an incorrect transform)?

Ah, that's interesting.  So the only case we can't handle is
-frounding-math -fsigned-zeros then.  I'll see to adjust the patch
accordingly, like the following incremental patch:

> (note that in avx512 it's possible to materialize negative zeroes
> by mask with a single vpternlog instruction, which is cheap)

It ends up loading the { -0.0, ... } constant from memory, the
{ 0.0, ... } mask is handled by using a zero-masked load, so
indeed cheaper.

Richard.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 741b5c20389..bc3063c3615 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6905,7 +6905,17 @@ vectorize_fold_left_reduction (loop_vec_info 
loop_vinfo,
 
   tree vector_identity = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-vector_identity = build_zero_cst (vectype_out);
+{
+  vector_identity = build_zero_cst (vectype_out);
+  if (!HONOR_SIGNED_ZEROS (vectype_out))
+   ;
+  else
+   {
+ gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
+ vector_identity = const_unop (NEGATE_EXPR, vectype_out,
+   vector_identity);
+   }
+}
 
   tree scalar_dest_var = vect_create_destination_var (scalar_dest, NULL);
   int i;
@@ -8040,12 +8050,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (reduction_type == FOLD_LEFT_REDUCTION
   && reduc_fn == IFN_LAST
   && FLOAT_TYPE_P (vectype_in)
-  && HONOR_SIGNED_ZEROS (vectype_in))
+  && HONOR_SIGNED_ZEROS (vectype_in)
+  && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "can't operate on partial vectors because"
-" signed zeros need to be handled.\n");
+" signed zeros cannot be preserved.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
   else



Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> When we vectorize fold-left reductions with partial vectors but
> no target operation available we use a vector conditional to force
> excess elements to zero.  But that doesn't correctly preserve
> the sign of zero.  The following patch disables partial vector
> support in that case.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> Does this look OK?

LGTM.

> With -frounding-math -fno-signed-zeros we are
> happily using the masking again, but that's OK, right?  An additional
> + 0.0 shouldn't do anything here.

Yeah, I would hope so.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>   PR tree-optimization/110979
>   * tree-vect-loop.cc (vectorizable_reduction): For
>   FOLD_LEFT_REDUCTION without target support make sure
>   we don't need to honor signed zeros.
>
>   * gcc.dg/torture/pr110979.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/torture/pr110979.c | 25 +
>  gcc/tree-vect-loop.cc   | 11 +++
>  2 files changed, 36 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr110979.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr110979.c 
> b/gcc/testsuite/gcc.dg/torture/pr110979.c
> new file mode 100644
> index 000..c25ad7a8a31
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr110979.c
> @@ -0,0 +1,25 @@
> +/* { dg-do run } */
> +/* { dg-additional-options "--param vect-partial-vector-usage=2" } */
> +
> +#define FLT double
> +#define N 20
> +
> +__attribute__((noipa))
> +FLT
> +foo3 (FLT *a)
> +{
> +  FLT sum = -0.0;
> +  for (int i = 0; i != N; i++)
> +sum += a[i];
> +  return sum;
> +}
> +
> +int main()
> +{
> +  FLT a[N];
> +  for (int i = 0; i != N; i++)
> +a[i] = -0.0;
> +  if (!__builtin_signbit(foo3(a)))
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index bf8d677b584..741b5c20389 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -8037,6 +8037,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>" no conditional operation is available.\n");
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>   }
> +  else if (reduction_type == FOLD_LEFT_REDUCTION
> +&& reduc_fn == IFN_LAST
> +&& FLOAT_TYPE_P (vectype_in)
> +&& HONOR_SIGNED_ZEROS (vectype_in))
> + {
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "can't operate on partial vectors because"
> +  " signed zeros need to be handled.\n");
> +   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> + }
>else
>   {
> internal_fn mask_reduc_fn


Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Alexander Monakov


On Fri, 11 Aug 2023, Richard Biener wrote:

> When we vectorize fold-left reductions with partial vectors but
> no target operation available we use a vector conditional to force
> excess elements to zero.  But that doesn't correctly preserve
> the sign of zero.  The following patch disables partial vector
> support in that case.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Does this look OK?  With -frounding-math -fno-signed-zeros we are
> happily using the masking again, but that's OK, right?  An additional
> + 0.0 shouldn't do anything here.

I think it converts SNan to QNan (when the partial vector has just one
element which is SNan), so is a test for -fsignaling-nans missing?

In the defaut -fno-rounding-math -fno-signaling-nans mode I think we
can do the reduction by substituting negative zero for masked-off
elements — maybe it's worth diagnosing that case separately (i.e.
as "not yet implemented", not an incorrect transform)?

(note that in avx512 it's possible to materialize negative zeroes
by mask with a single vpternlog instruction, which is cheap)

Alexander


[PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Richard Biener via Gcc-patches
When we vectorize fold-left reductions with partial vectors but
no target operation available we use a vector conditional to force
excess elements to zero.  But that doesn't correctly preserve
the sign of zero.  The following patch disables partial vector
support in that case.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Does this look OK?  With -frounding-math -fno-signed-zeros we are
happily using the masking again, but that's OK, right?  An additional
+ 0.0 shouldn't do anything here.

Thanks,
Richard.

PR tree-optimization/110979
* tree-vect-loop.cc (vectorizable_reduction): For
FOLD_LEFT_REDUCTION without target support make sure
we don't need to honor signed zeros.

* gcc.dg/torture/pr110979.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110979.c | 25 +
 gcc/tree-vect-loop.cc   | 11 +++
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110979.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110979.c 
b/gcc/testsuite/gcc.dg/torture/pr110979.c
new file mode 100644
index 000..c25ad7a8a31
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110979.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-additional-options "--param vect-partial-vector-usage=2" } */
+
+#define FLT double
+#define N 20
+
+__attribute__((noipa))
+FLT
+foo3 (FLT *a)
+{
+  FLT sum = -0.0;
+  for (int i = 0; i != N; i++)
+sum += a[i];
+  return sum;
+}
+
+int main()
+{
+  FLT a[N];
+  for (int i = 0; i != N; i++)
+a[i] = -0.0;
+  if (!__builtin_signbit(foo3(a)))
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bf8d677b584..741b5c20389 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8037,6 +8037,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 " no conditional operation is available.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
+  else if (reduction_type == FOLD_LEFT_REDUCTION
+  && reduc_fn == IFN_LAST
+  && FLOAT_TYPE_P (vectype_in)
+  && HONOR_SIGNED_ZEROS (vectype_in))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't operate on partial vectors because"
+" signed zeros need to be handled.\n");
+ LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+   }
   else
{
  internal_fn mask_reduc_fn
-- 
2.35.3