Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors
On Fri, 11 Aug 2023, Richard Biener wrote: > > I think it converts SNan to QNan (when the partial vector has just one > > element which is SNan), so is a test for -fsignaling-nans missing? > > Hm, I guess that's a corner case that could happen when there's no > runtime profitability check on more than one element and when the > element accumulated is directly loaded from memory. OTOH the > loop vectorizer always expects an initial value for the reduction > and thus we perform either no add (when the loop isn't entered) > or at least a single add (when it is). So I think this particular > situation cannot occur? Yes, that makes sense, thanks for the elaboration. (it's a bit subtle so maybe worth a comment? not sure) > > In the defaut -fno-rounding-math -fno-signaling-nans mode I think we > > can do the reduction by substituting negative zero for masked-off > > elements ? maybe it's worth diagnosing that case separately (i.e. > > as "not yet implemented", not an incorrect transform)? > > Ah, that's interesting. So the only case we can't handle is > -frounding-math -fsigned-zeros then. I'll see to adjust the patch > accordingly, like the following incremental patch: Yeah, nice! > > (note that in avx512 it's possible to materialize negative zeroes > > by mask with a single vpternlog instruction, which is cheap) > > It ends up loading the { -0.0, ... } constant from memory, the > { 0.0, ... } mask is handled by using a zero-masked load, so > indeed cheaper. I was thinking it could be easily done without a memory load, but got confused, sorry. Alexander
Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors
On Fri, 11 Aug 2023, Alexander Monakov wrote: > > On Fri, 11 Aug 2023, Richard Biener wrote: > > > When we vectorize fold-left reductions with partial vectors but > > no target operation available we use a vector conditional to force > > excess elements to zero. But that doesn't correctly preserve > > the sign of zero. The following patch disables partial vector > > support in that case. > > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > > > Does this look OK? With -frounding-math -fno-signed-zeros we are > > happily using the masking again, but that's OK, right? An additional > > + 0.0 shouldn't do anything here. > > I think it converts SNan to QNan (when the partial vector has just one > element which is SNan), so is a test for -fsignaling-nans missing? Hm, I guess that's a corner case that could happen when there's no runtime profitability check on more than one element and when the element accumulated is directly loaded from memory. OTOH the loop vectorizer always expects an initial value for the reduction and thus we perform either no add (when the loop isn't entered) or at least a single add (when it is). So I think this particular situation cannot occur? > In the defaut -fno-rounding-math -fno-signaling-nans mode I think we > can do the reduction by substituting negative zero for masked-off > elements ? maybe it's worth diagnosing that case separately (i.e. > as "not yet implemented", not an incorrect transform)? Ah, that's interesting. So the only case we can't handle is -frounding-math -fsigned-zeros then. I'll see to adjust the patch accordingly, like the following incremental patch: > (note that in avx512 it's possible to materialize negative zeroes > by mask with a single vpternlog instruction, which is cheap) It ends up loading the { -0.0, ... } constant from memory, the { 0.0, ... } mask is handled by using a zero-masked load, so indeed cheaper. Richard. diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 741b5c20389..bc3063c3615 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -6905,7 +6905,17 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo, tree vector_identity = NULL_TREE; if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) -vector_identity = build_zero_cst (vectype_out); +{ + vector_identity = build_zero_cst (vectype_out); + if (!HONOR_SIGNED_ZEROS (vectype_out)) + ; + else + { + gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out)); + vector_identity = const_unop (NEGATE_EXPR, vectype_out, + vector_identity); + } +} tree scalar_dest_var = vect_create_destination_var (scalar_dest, NULL); int i; @@ -8040,12 +8050,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo, else if (reduction_type == FOLD_LEFT_REDUCTION && reduc_fn == IFN_LAST && FLOAT_TYPE_P (vectype_in) - && HONOR_SIGNED_ZEROS (vectype_in)) + && HONOR_SIGNED_ZEROS (vectype_in) + && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "can't operate on partial vectors because" -" signed zeros need to be handled.\n"); +" signed zeros cannot be preserved.\n"); LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; } else
Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors
Richard Biener writes: > When we vectorize fold-left reductions with partial vectors but > no target operation available we use a vector conditional to force > excess elements to zero. But that doesn't correctly preserve > the sign of zero. The following patch disables partial vector > support in that case. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > Does this look OK? LGTM. > With -frounding-math -fno-signed-zeros we are > happily using the masking again, but that's OK, right? An additional > + 0.0 shouldn't do anything here. Yeah, I would hope so. Thanks, Richard > > Thanks, > Richard. > > PR tree-optimization/110979 > * tree-vect-loop.cc (vectorizable_reduction): For > FOLD_LEFT_REDUCTION without target support make sure > we don't need to honor signed zeros. > > * gcc.dg/torture/pr110979.c: New testcase. > --- > gcc/testsuite/gcc.dg/torture/pr110979.c | 25 + > gcc/tree-vect-loop.cc | 11 +++ > 2 files changed, 36 insertions(+) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr110979.c > > diff --git a/gcc/testsuite/gcc.dg/torture/pr110979.c > b/gcc/testsuite/gcc.dg/torture/pr110979.c > new file mode 100644 > index 000..c25ad7a8a31 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr110979.c > @@ -0,0 +1,25 @@ > +/* { dg-do run } */ > +/* { dg-additional-options "--param vect-partial-vector-usage=2" } */ > + > +#define FLT double > +#define N 20 > + > +__attribute__((noipa)) > +FLT > +foo3 (FLT *a) > +{ > + FLT sum = -0.0; > + for (int i = 0; i != N; i++) > +sum += a[i]; > + return sum; > +} > + > +int main() > +{ > + FLT a[N]; > + for (int i = 0; i != N; i++) > +a[i] = -0.0; > + if (!__builtin_signbit(foo3(a))) > +__builtin_abort(); > + return 0; > +} > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index bf8d677b584..741b5c20389 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -8037,6 +8037,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo, >" no conditional operation is available.\n"); > LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; > } > + else if (reduction_type == FOLD_LEFT_REDUCTION > +&& reduc_fn == IFN_LAST > +&& FLOAT_TYPE_P (vectype_in) > +&& HONOR_SIGNED_ZEROS (vectype_in)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "can't operate on partial vectors because" > + " signed zeros need to be handled.\n"); > + LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; > + } >else > { > internal_fn mask_reduc_fn
Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors
On Fri, 11 Aug 2023, Richard Biener wrote: > When we vectorize fold-left reductions with partial vectors but > no target operation available we use a vector conditional to force > excess elements to zero. But that doesn't correctly preserve > the sign of zero. The following patch disables partial vector > support in that case. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > Does this look OK? With -frounding-math -fno-signed-zeros we are > happily using the masking again, but that's OK, right? An additional > + 0.0 shouldn't do anything here. I think it converts SNan to QNan (when the partial vector has just one element which is SNan), so is a test for -fsignaling-nans missing? In the defaut -fno-rounding-math -fno-signaling-nans mode I think we can do the reduction by substituting negative zero for masked-off elements — maybe it's worth diagnosing that case separately (i.e. as "not yet implemented", not an incorrect transform)? (note that in avx512 it's possible to materialize negative zeroes by mask with a single vpternlog instruction, which is cheap) Alexander
[PATCH] tree-optimization/110979 - fold-left reduction and partial vectors
When we vectorize fold-left reductions with partial vectors but no target operation available we use a vector conditional to force excess elements to zero. But that doesn't correctly preserve the sign of zero. The following patch disables partial vector support in that case. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Does this look OK? With -frounding-math -fno-signed-zeros we are happily using the masking again, but that's OK, right? An additional + 0.0 shouldn't do anything here. Thanks, Richard. PR tree-optimization/110979 * tree-vect-loop.cc (vectorizable_reduction): For FOLD_LEFT_REDUCTION without target support make sure we don't need to honor signed zeros. * gcc.dg/torture/pr110979.c: New testcase. --- gcc/testsuite/gcc.dg/torture/pr110979.c | 25 + gcc/tree-vect-loop.cc | 11 +++ 2 files changed, 36 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/torture/pr110979.c diff --git a/gcc/testsuite/gcc.dg/torture/pr110979.c b/gcc/testsuite/gcc.dg/torture/pr110979.c new file mode 100644 index 000..c25ad7a8a31 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr110979.c @@ -0,0 +1,25 @@ +/* { dg-do run } */ +/* { dg-additional-options "--param vect-partial-vector-usage=2" } */ + +#define FLT double +#define N 20 + +__attribute__((noipa)) +FLT +foo3 (FLT *a) +{ + FLT sum = -0.0; + for (int i = 0; i != N; i++) +sum += a[i]; + return sum; +} + +int main() +{ + FLT a[N]; + for (int i = 0; i != N; i++) +a[i] = -0.0; + if (!__builtin_signbit(foo3(a))) +__builtin_abort(); + return 0; +} diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index bf8d677b584..741b5c20389 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -8037,6 +8037,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo, " no conditional operation is available.\n"); LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; } + else if (reduction_type == FOLD_LEFT_REDUCTION + && reduc_fn == IFN_LAST + && FLOAT_TYPE_P (vectype_in) + && HONOR_SIGNED_ZEROS (vectype_in)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, +"can't operate on partial vectors because" +" signed zeros need to be handled.\n"); + LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; + } else { internal_fn mask_reduc_fn -- 2.35.3