Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-21 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 21 Aug 2023 at 12:27, Richard Biener  wrote:
>
> On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:
>
> > On Fri, 18 Aug 2023 at 17:11, Richard Biener  wrote:
> > >
> > > On Fri, 18 Aug 2023, Richard Biener wrote:
> > >
> > > > On Thu, 17 Aug 2023, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Tue, 15 Aug 2023 at 14:28, Richard Sandiford
> > > > >  wrote:
> > > > > >
> > > > > > Richard Biener  writes:
> > > > > > > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> > > > > > > >> On Mon, 7 Aug 2023 at 13:19, Richard Biener wrote:
> > > > > > > >> > It doesn't seem to make a difference for x86.  That said, the "fix" is
> > > > > > > >> > probably sticking the correct target on the dump-check, it seems
> > > > > > > >> > that vect_fold_extract_last is no longer correct here.
> > > > > > > >> Um sorry, I did go thru various checks in target-supports.exp, but not
> > > > > > > >> sure which one will be appropriate for this case,
> > > > > > > >> and am stuck here :/ Could you please suggest how to proceed ?
> > > > > > >
> > > > > > > Maybe Richard S. knows the magic thing to test, he originally
> > > > > > > implemented the direct conversion support.  I suggest to implement
> > > > > > > such dg-checks if they are not present (I can't find them),
> > > > > > > possibly quite specific to the modes involved (like we have
> > > > > > > other checks with _qi_to_hi suffixes, for float modes maybe
> > > > > > > just _float).
> > > > > >
> > > > > > > Yeah, can't remember specific selectors for that feature.  TBH I think
> > > > > > > most (all?) of the tests were AArch64-specific.
> > > > > Hi,
> > > > > As Richi mentioned above, the test now vectorizes on AArch64 because
> > > > > it has support for direct conversion
> > > > > between vectors while x86 doesn't. IIUC this is because
> > > > > supportable_convert_operation returns true
> > > > > for V4HI -> V4SI on Aarch64 since it can use extend_v4hiv4si2 for
> > > > > doing the conversion ?
> > > > >
> > > > > In the attached patch, I added a new target check vect_extend which
> > > > > (currently) returns 1 only for aarch64*-*-*,
> > > > > which makes the test PASS on both the targets, altho I am not sure if
> > > > > this is entirely correct.
> > > > > Does the patch look OK ?
> > > >
> > > > Can you make vect_extend more specific, say vect_extend_hi_si or
> > > > what is specifically needed here?  Note I'll have to investigate
> > > > why x86 cannot vectorize here since in fact it does have
> > > > the extend operation ... it might be also worth splitting the
> > > > sign/zero extend case, so - vect_sign_extend_hi_si or
> > > > vect_extend_short_int?
> > >
> > > > And now having analyzed _why_ x86 doesn't vectorize it's rather
> > > > why we get this vectorized with NEON which is because
> > > >
> > > > static opt_machine_mode
> > > > aarch64_vectorize_related_mode (machine_mode vector_mode,
> > > >                                 scalar_mode element_mode,
> > > >                                 poly_uint64 nunits)
> > > > {
> > > > ...
> > > >   /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
> > > >   if (TARGET_SIMD
> > > >       && (vec_flags & VEC_ADVSIMD)
> > > >       && known_eq (nunits, 0U)
> > > >       && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
> > > >       && maybe_ge (GET_MODE_BITSIZE (element_mode)
> > > >                    * GET_MODE_NUNITS (vector_mode), 128U))
> > > >     {
> > > >       machine_mode res = aarch64_simd_container_mode (element_mode, 128);
> > > >       if (VECTOR_MODE_P (res))
> > > >         return res;
> > >
> > > which makes us get a V4SImode vector for a V4HImode loop vector_mode.
> > Thanks for the explanation!
> > >
> > > So I think the appropriate effective dejagnu target is
> > > aarch64-*-* (there's none specifically to advsimd, not sure if one
> > > can disable that?)
> > The attached patch uses aarch64*-*-* target check, and additionally
> > for SVE (and other targets supporting vect_fold_extract_last) it
> > checks
> > if the condition reduction was carried out using FOLD_EXTRACT_LAST.
> > Does that look OK ?
>
> Works for me.
Thanks, committed to trunk in dd606dc7c7e49feb7a900902ec6d35b421789173

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> >
> > > Richard.
> > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Thanks,
> > > > > > Richard
> > > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener 
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> >
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-20 Thread Richard Biener via Gcc-patches
On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:

> On Fri, 18 Aug 2023 at 17:11, Richard Biener  wrote:
> >
> > On Fri, 18 Aug 2023, Richard Biener wrote:
> >
> > > On Thu, 17 Aug 2023, Prathamesh Kulkarni wrote:
> > >
> > > > On Tue, 15 Aug 2023 at 14:28, Richard Sandiford
> > > >  wrote:
> > > > >
> > > > > Richard Biener  writes:
> > > > > > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> > > > > > >> On Mon, 7 Aug 2023 at 13:19, Richard Biener wrote:
> > > > > > >> > It doesn't seem to make a difference for x86.  That said, the "fix" is
> > > > > > >> > probably sticking the correct target on the dump-check, it seems
> > > > > > >> > that vect_fold_extract_last is no longer correct here.
> > > > > > >> Um sorry, I did go thru various checks in target-supports.exp, but not
> > > > > > >> sure which one will be appropriate for this case,
> > > > > > >> and am stuck here :/ Could you please suggest how to proceed ?
> > > > > >
> > > > > > Maybe Richard S. knows the magic thing to test, he originally
> > > > > > implemented the direct conversion support.  I suggest to implement
> > > > > > such dg-checks if they are not present (I can't find them),
> > > > > > possibly quite specific to the modes involved (like we have
> > > > > > other checks with _qi_to_hi suffixes, for float modes maybe
> > > > > > just _float).
> > > > >
> > > > > Yeah, can't remember specific selectors for that feature.  TBH I think
> > > > > most (all?) of the tests were AArch64-specific.
> > > > Hi,
> > > > As Richi mentioned above, the test now vectorizes on AArch64 because
> > > > it has support for direct conversion
> > > > between vectors while x86 doesn't. IIUC this is because
> > > > supportable_convert_operation returns true
> > > > for V4HI -> V4SI on Aarch64 since it can use extend_v4hiv4si2 for
> > > > doing the conversion ?
> > > >
> > > > In the attached patch, I added a new target check vect_extend which
> > > > (currently) returns 1 only for aarch64*-*-*,
> > > > which makes the test PASS on both the targets, altho I am not sure if
> > > > this is entirely correct.
> > > > Does the patch look OK ?
> > >
> > > Can you make vect_extend more specific, say vect_extend_hi_si or
> > > what is specifically needed here?  Note I'll have to investigate
> > > why x86 cannot vectorize here since in fact it does have
> > > the extend operation ... it might be also worth splitting the
> > > sign/zero extend case, so - vect_sign_extend_hi_si or
> > > vect_extend_short_int?
> >
> > > And now having analyzed _why_ x86 doesn't vectorize it's rather
> > > why we get this vectorized with NEON which is because
> > >
> > > static opt_machine_mode
> > > aarch64_vectorize_related_mode (machine_mode vector_mode,
> > >                                 scalar_mode element_mode,
> > >                                 poly_uint64 nunits)
> > > {
> > > ...
> > >   /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
> > >   if (TARGET_SIMD
> > >       && (vec_flags & VEC_ADVSIMD)
> > >       && known_eq (nunits, 0U)
> > >       && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
> > >       && maybe_ge (GET_MODE_BITSIZE (element_mode)
> > >                    * GET_MODE_NUNITS (vector_mode), 128U))
> > >     {
> > >       machine_mode res = aarch64_simd_container_mode (element_mode, 128);
> > >       if (VECTOR_MODE_P (res))
> > >         return res;
> >
> > which makes us get a V4SImode vector for a V4HImode loop vector_mode.
> Thanks for the explanation!
> >
> > So I think the appropriate effective dejagnu target is
> > aarch64-*-* (there's none specifically to advsimd, not sure if one
> > can disable that?)
> The attached patch uses aarch64*-*-* target check, and additionally
> for SVE (and other targets supporting vect_fold_extract_last) it
> checks
> if the condition reduction was carried out using FOLD_EXTRACT_LAST.
> Does that look OK ?

Works for me.

Richard.

> Thanks,
> Prathamesh
> >
> 
> > Richard.
> >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Richard
> > > >
> > >
> > >
> >
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-19 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 18 Aug 2023 at 17:11, Richard Biener  wrote:
>
> On Fri, 18 Aug 2023, Richard Biener wrote:
>
> > On Thu, 17 Aug 2023, Prathamesh Kulkarni wrote:
> >
> > > On Tue, 15 Aug 2023 at 14:28, Richard Sandiford
> > >  wrote:
> > > >
> > > > Richard Biener  writes:
> > > > > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> > > > > >> On Mon, 7 Aug 2023 at 13:19, Richard Biener wrote:
> > > > > >> > It doesn't seem to make a difference for x86.  That said, the "fix" is
> > > > > >> > probably sticking the correct target on the dump-check, it seems
> > > > > >> > that vect_fold_extract_last is no longer correct here.
> > > > > >> Um sorry, I did go thru various checks in target-supports.exp, but not
> > > > > >> sure which one will be appropriate for this case,
> > > > > >> and am stuck here :/ Could you please suggest how to proceed ?
> > > > >
> > > > > Maybe Richard S. knows the magic thing to test, he originally
> > > > > implemented the direct conversion support.  I suggest to implement
> > > > > such dg-checks if they are not present (I can't find them),
> > > > > possibly quite specific to the modes involved (like we have
> > > > > other checks with _qi_to_hi suffixes, for float modes maybe
> > > > > just _float).
> > > >
> > > > Yeah, can't remember specific selectors for that feature.  TBH I think
> > > > most (all?) of the tests were AArch64-specific.
> > > Hi,
> > > As Richi mentioned above, the test now vectorizes on AArch64 because
> > > it has support for direct conversion
> > > between vectors while x86 doesn't. IIUC this is because
> > > supportable_convert_operation returns true
> > > for V4HI -> V4SI on Aarch64 since it can use extend_v4hiv4si2 for
> > > doing the conversion ?
> > >
> > > In the attached patch, I added a new target check vect_extend which
> > > (currently) returns 1 only for aarch64*-*-*,
> > > which makes the test PASS on both the targets, altho I am not sure if
> > > this is entirely correct.
> > > Does the patch look OK ?
> >
> > Can you make vect_extend more specific, say vect_extend_hi_si or
> > what is specifically needed here?  Note I'll have to investigate
> > why x86 cannot vectorize here since in fact it does have
> > the extend operation ... it might be also worth splitting the
> > sign/zero extend case, so - vect_sign_extend_hi_si or
> > vect_extend_short_int?
>
> > And now having analyzed _why_ x86 doesn't vectorize it's rather
> > why we get this vectorized with NEON which is because
> >
> > static opt_machine_mode
> > aarch64_vectorize_related_mode (machine_mode vector_mode,
> >                                 scalar_mode element_mode,
> >                                 poly_uint64 nunits)
> > {
> > ...
> >   /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
> >   if (TARGET_SIMD
> >       && (vec_flags & VEC_ADVSIMD)
> >       && known_eq (nunits, 0U)
> >       && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
> >       && maybe_ge (GET_MODE_BITSIZE (element_mode)
> >                    * GET_MODE_NUNITS (vector_mode), 128U))
> >     {
> >       machine_mode res = aarch64_simd_container_mode (element_mode, 128);
> >       if (VECTOR_MODE_P (res))
> >         return res;
>
> which makes us get a V4SImode vector for a V4HImode loop vector_mode.
Thanks for the explanation!
>
> So I think the appropriate effective dejagnu target is
> aarch64-*-* (there's none specifically to advsimd, not sure if one
> can disable that?)
The attached patch uses aarch64*-*-* target check, and additionally
for SVE (and other targets supporting vect_fold_extract_last) it
checks
if the condition reduction was carried out using FOLD_EXTRACT_LAST.
Does that look OK ?

Thanks,
Prathamesh
>

> Richard.
>
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Richard
> > >
> >
> >
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-7.c b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
index 16cdcd1c6eb..58c46df5c54 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-7.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
@@ -52,5 +52,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_fold_extract_last } } } */
-/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_fold_extract_last } } } } */
+/* { dg-final { scan-tree-dump "optimizing condition reduction with FOLD_EXTRACT_LAST" "vect" { target vect_fold_extract_last } } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target aarch64*-*-* } } } */


Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-18 Thread Richard Biener via Gcc-patches
On Fri, 18 Aug 2023, Richard Biener wrote:

> On Thu, 17 Aug 2023, Prathamesh Kulkarni wrote:
> 
> > On Tue, 15 Aug 2023 at 14:28, Richard Sandiford
> >  wrote:
> > >
> > > Richard Biener  writes:
> > > > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> > > > >> On Mon, 7 Aug 2023 at 13:19, Richard Biener wrote:
> > > > >> > It doesn't seem to make a difference for x86.  That said, the "fix" is
> > > >> > probably sticking the correct target on the dump-check, it seems
> > > >> > that vect_fold_extract_last is no longer correct here.
> > > >> Um sorry, I did go thru various checks in target-supports.exp, but not
> > > >> sure which one will be appropriate for this case,
> > > >> and am stuck here :/ Could you please suggest how to proceed ?
> > > >
> > > > Maybe Richard S. knows the magic thing to test, he originally
> > > > implemented the direct conversion support.  I suggest to implement
> > > > such dg-checks if they are not present (I can't find them),
> > > > possibly quite specific to the modes involved (like we have
> > > > other checks with _qi_to_hi suffixes, for float modes maybe
> > > > just _float).
> > >
> > > Yeah, can't remember specific selectors for that feature.  TBH I think
> > > most (all?) of the tests were AArch64-specific.
> > Hi,
> > As Richi mentioned above, the test now vectorizes on AArch64 because
> > it has support for direct conversion
> > between vectors while x86 doesn't. IIUC this is because
> > supportable_convert_operation returns true
> > for V4HI -> V4SI on Aarch64 since it can use extend_v4hiv4si2 for
> > doing the conversion ?
> > 
> > In the attached patch, I added a new target check vect_extend which
> > (currently) returns 1 only for aarch64*-*-*,
> > which makes the test PASS on both the targets, altho I am not sure if
> > this is entirely correct.
> > Does the patch look OK ?
> 
> Can you make vect_extend more specific, say vect_extend_hi_si or
> what is specifically needed here?  Note I'll have to investigate
> why x86 cannot vectorize here since in fact it does have
> the extend operation ... it might be also worth splitting the
> sign/zero extend case, so - vect_sign_extend_hi_si or
> vect_extend_short_int?

And now having analyzed _why_ x86 doesn't vectorize it's rather
why we get this vectorized with NEON which is because

static opt_machine_mode
aarch64_vectorize_related_mode (machine_mode vector_mode,
                                scalar_mode element_mode,
                                poly_uint64 nunits)
{
...
  /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
  if (TARGET_SIMD
      && (vec_flags & VEC_ADVSIMD)
      && known_eq (nunits, 0U)
      && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
      && maybe_ge (GET_MODE_BITSIZE (element_mode)
                   * GET_MODE_NUNITS (vector_mode), 128U))
    {
      machine_mode res = aarch64_simd_container_mode (element_mode, 128);
      if (VECTOR_MODE_P (res))
        return res;

which makes us get a V4SImode vector for a V4HImode loop vector_mode.
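
As a rough illustration of the size test in that hunk, here is a standalone
model in plain C. It is not GCC code: the struct and function names are made
up for this sketch, and the TARGET_SIMD, VEC_ADVSIMD and nunits checks are
left out. With a 64-bit V4HI loop mode and a 32-bit element, 32 * 4 = 128
bits, so the 128-bit container would be preferred:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vector mode: element width and lane count.
   This is an assumption of the sketch, not a GCC data structure.  */
struct vec_mode_model
{
  unsigned elem_bits;
  unsigned nunits;
};

/* Models only the size comparison from the quoted hunk: starting from a
   64-bit loop vector mode, prefer one 128-bit vector when the requested
   element width times the lane count reaches 128 bits.  */
static bool
prefers_128bit_container (struct vec_mode_model loop_mode,
                          unsigned element_bits)
{
  assert (loop_mode.elem_bits != 0 && loop_mode.nunits != 0);
  unsigned loop_bits = loop_mode.elem_bits * loop_mode.nunits;
  return loop_bits == 64 && element_bits * loop_mode.nunits >= 128;
}
```

Under these assumptions, a loop mode of { 16, 4 } (V4HI) with a 32-bit
element satisfies the test, while the same loop mode with a 16-bit element
(64 bits in total) does not.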

So I think the appropriate effective dejagnu target is
aarch64-*-* (there's none specifically to advsimd, not sure if one
can disable that?)

Richard.

> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Richard
> > 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-18 Thread Richard Biener via Gcc-patches
On Thu, 17 Aug 2023, Prathamesh Kulkarni wrote:

> On Tue, 15 Aug 2023 at 14:28, Richard Sandiford
>  wrote:
> >
> > Richard Biener  writes:
> > > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> > >> On Mon, 7 Aug 2023 at 13:19, Richard Biener wrote:
> > >> > It doesn't seem to make a difference for x86.  That said, the "fix" is
> > >> > probably sticking the correct target on the dump-check, it seems
> > >> > that vect_fold_extract_last is no longer correct here.
> > >> Um sorry, I did go thru various checks in target-supports.exp, but not
> > >> sure which one will be appropriate for this case,
> > >> and am stuck here :/ Could you please suggest how to proceed ?
> > >
> > > Maybe Richard S. knows the magic thing to test, he originally
> > > implemented the direct conversion support.  I suggest to implement
> > > such dg-checks if they are not present (I can't find them),
> > > possibly quite specific to the modes involved (like we have
> > > other checks with _qi_to_hi suffixes, for float modes maybe
> > > just _float).
> >
> > Yeah, can't remember specific selectors for that feature.  TBH I think
> > most (all?) of the tests were AArch64-specific.
> Hi,
> As Richi mentioned above, the test now vectorizes on AArch64 because
> it has support for direct conversion
> between vectors while x86 doesn't. IIUC this is because
> supportable_convert_operation returns true
> for V4HI -> V4SI on Aarch64 since it can use extend_v4hiv4si2 for
> doing the conversion ?
> 
> In the attached patch, I added a new target check vect_extend which
> (currently) returns 1 only for aarch64*-*-*,
> which makes the test PASS on both the targets, altho I am not sure if
> this is entirely correct.
> Does the patch look OK ?

Can you make vect_extend more specific, say vect_extend_hi_si or
what is specifically needed here?  Note I'll have to investigate
why x86 cannot vectorize here since in fact it does have
the extend operation ... it might be also worth splitting the
sign/zero extend case, so - vect_sign_extend_hi_si or
vect_extend_short_int?

> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-17 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 15 Aug 2023 at 14:28, Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> >> On Mon, 7 Aug 2023 at 13:19, Richard Biener wrote:
> >> > It doesn't seem to make a difference for x86.  That said, the "fix" is
> >> > probably sticking the correct target on the dump-check, it seems
> >> > that vect_fold_extract_last is no longer correct here.
> >> Um sorry, I did go thru various checks in target-supports.exp, but not
> >> sure which one will be appropriate for this case,
> >> and am stuck here :/ Could you please suggest how to proceed ?
> >
> > Maybe Richard S. knows the magic thing to test, he originally
> > implemented the direct conversion support.  I suggest to implement
> > such dg-checks if they are not present (I can't find them),
> > possibly quite specific to the modes involved (like we have
> > other checks with _qi_to_hi suffixes, for float modes maybe
> > just _float).
>
> Yeah, can't remember specific selectors for that feature.  TBH I think
> most (all?) of the tests were AArch64-specific.
Hi,
As Richi mentioned above, the test now vectorizes on AArch64 because
it has support for direct conversion
between vectors while x86 doesn't. IIUC this is because
supportable_convert_operation returns true
for V4HI -> V4SI on Aarch64 since it can use extend_v4hiv4si2 for
doing the conversion ?
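
For illustration only, a loop of roughly the shape being discussed might look
like the sketch below. This is not the source of pr65947-7.c; the function
name and parameters are hypothetical. The point is that the loaded data is
short (HImode lanes) while the conditionally updated result is int (SImode
lanes), so vectorizing it needs a V4HI -> V4SI style extension:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical condition reduction: remember the last element below LIMIT.
   The loads are short, the reduction variable is int, so the two types in
   the loop have different widths.  */
static int
last_below (const short *a, size_t n, short limit, int fallback)
{
  assert (a != NULL || n == 0);
  int last = fallback;
  for (size_t i = 0; i < n; i++)
    if (a[i] < limit)
      last = a[i];	/* implicit short -> int sign extension */
  return last;
}
```

Whether a given target vectorizes such a loop depends on how it picks the
vector mode for the int result, which is the question raised above.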

In the attached patch, I added a new target check vect_extend which
(currently) returns 1 only for aarch64*-*-*,
which makes the test PASS on both the targets, altho I am not sure if
this is entirely correct.
Does the patch look OK ?

Thanks,
Prathamesh
>
> Thanks,
> Richard
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-7.c b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
index 16cdcd1c6eb..c8623854af5 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-7.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
@@ -52,5 +52,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_fold_extract_last } } } */
-/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_fold_extract_last } } } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_extend } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 92b6f69730e..29ef64b84f3 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7768,6 +7768,16 @@ proc check_effective_target_vect_unpack { } {
 || [istarget amdgcn*-*-*] }}]
 }
 
+# Return 1 if the target plus current options supports vector
+# conversion of chars (to shorts) and shorts (to ints), 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_extend { } {
+    return [check_cached_effective_target_indexed vect_extend {
+      expr { [istarget aarch64*-*-*]}}]
+}
+
 # Return 1 if the target plus current options does not guarantee
 # that its STACK_BOUNDARY is >= the reguired vector alignment.
 #


Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-15 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
>> On Mon, 7 Aug 2023 at 13:19, Richard Biener wrote:
>> > It doesn't seem to make a difference for x86.  That said, the "fix" is
>> > probably sticking the correct target on the dump-check, it seems
>> > that vect_fold_extract_last is no longer correct here.
>> Um sorry, I did go thru various checks in target-supports.exp, but not
>> sure which one will be appropriate for this case,
>> and am stuck here :/ Could you please suggest how to proceed ?
>
> Maybe Richard S. knows the magic thing to test, he originally
> implemented the direct conversion support.  I suggest to implement
> such dg-checks if they are not present (I can't find them),
> possibly quite specific to the modes involved (like we have
> other checks with _qi_to_hi suffixes, for float modes maybe
> just _float).

Yeah, can't remember specific selectors for that feature.  TBH I think
most (all?) of the tests were AArch64-specific.

Thanks,
Richard


Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-15 Thread Richard Biener via Gcc-patches
On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:

> On Mon, 7 Aug 2023 at 13:19, Richard Biener wrote:
> >
> > > On Mon, Aug 7, 2023 at 2:05 AM Prathamesh Kulkarni via Gcc-patches
> >  wrote:
> > >
> > > On Thu, 3 Aug 2023 at 17:48, Richard Biener  wrote:
> > > >
> > > > On Thu, 3 Aug 2023, Richard Biener wrote:
> > > >
> > > > > On Thu, 3 Aug 2023, Richard Biener wrote:
> > > > >
> > > > > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, 31 Jul 2023, Jeff Law wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> > > > > > > > > > > The following delays sinking of loads within the same innermost
> > > > > > > > > > > loop when it was unconditional before.  That's a not uncommon
> > > > > > > > > > > issue preventing vectorization when masked loads are not available.
> > > > > > > > > > >
> > > > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > > > > > >
> > > > > > > > > > > I have a followup patch improving sinking that without this would
> > > > > > > > > > > cause more of the problematic sinking - now that we have a second
> > > > > > > > > > > sink pass after loop opts this looks like a reasonable approach?
> > > > > > > > > >
> > > > > > > > > > OK?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Richard.
> > > > > > > > > >
> > > > > > > > > >  PR tree-optimization/92335
> > > > > > > > > >  * tree-ssa-sink.cc (select_best_block): Before loop
> > > > > > > > > >  optimizations avoid sinking unconditional loads/stores
> > > > > > > > > >  in innermost loops to conditional executed places.
> > > > > > > > > >
> > > > > > > > > >  * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
> > > > > > > > > >  * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
> > > > > > > > > >  expect predictive commoning to happen instead of sinking.
> > > > > > > > > >  * gcc.dg/vect/pr65947-3.c: Adjust.
> > > > > > > > > > I think it's reasonable -- there's probably going to be cases where it's not
> > > > > > > > > > great, but more often than not I think it's going to be a reasonable
> > > > > > > > > > heuristic.
> > > > > > > > > >
> > > > > > > > > > If there is undesirable fallout, better to find it over the coming months than
> > > > > > > > > > next spring.  So I'd suggest we go forward now to give more time to find any
> > > > > > > > > > pathological cases (if they exist).
> > > > > > > >
> > > > > > > > Agreed, I've pushed this now.
> > > > > > > Hi Richard,
> > > > > > > > After this patch (committed in 399c8dd44ff44f4b496223c7cc980651c4d6f6a0),
> > > > > > > > pr65947-7.c "failed" for aarch64-linux-gnu:
> > > > > > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP VECTORIZED"
> > > > > > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects scan-tree-dump-not vect "LOOP VECTORIZED"
> > > > > > > >
> > > > > > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_fold_extract_last } } } } */
> > > > > > > >
> > > > > > > > With your commit, condition_reduction in pr65947-7.c gets vectorized
> > > > > > > > regardless of vect_fold_extract_last,
> > > > > > > > which gates the above test (which is an improvement, because the
> > > > > > > > function didn't get vectorized before the commit).
> > > > > > > >
> > > > > > > > The attached patch thus removes the gating on vect_fold_extract_last,
> > > > > > > > and the test passes again.
> > > > > > > OK to commit ?
> > > > > >
> > > > > > OK.
> > > > >
> > > > > Or wait - the loop doesn't vectorize on x86_64, so I guess one
> > > > > critical target condition is missing.  Can you figure out which?
> > > >
> > > > I see
> > > >
> > > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > > note:   vect_is_simple_use: operand last_19 = PHI ,
> > > > type of def: reduction
> > > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > > note:   vect_is_simple_use: vectype vector(4) int
> > > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > > missed:   multiple types in double reduction or condition reduction or
> > > > fold-left reduction.
> > > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1:
> > > > missed:   not vectorized: relevant phi not supported: last_19 = PHI
> > > > 
> > > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > > missed:  bad operation or unsupported loop bound.
> > > Hi Richard,
> > > Looking at the aarch64 vect dump, it seems the loop in
> > > condition_redu

Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-14 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 7 Aug 2023 at 13:19, Richard Biener  wrote:
>
> On Mon, Aug 7, 2023 at 2:05 AM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > On Thu, 3 Aug 2023 at 17:48, Richard Biener  wrote:
> > >
> > > On Thu, 3 Aug 2023, Richard Biener wrote:
> > >
> > > > On Thu, 3 Aug 2023, Richard Biener wrote:
> > > >
> > > > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, 31 Jul 2023, Jeff Law wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> > > > > > > > > > The following delays sinking of loads within the same innermost
> > > > > > > > > > loop when it was unconditional before.  That's a not uncommon
> > > > > > > > > > issue preventing vectorization when masked loads are not available.
> > > > > > > > > >
> > > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > > > > >
> > > > > > > > > > I have a followup patch improving sinking that without this would
> > > > > > > > > > cause more of the problematic sinking - now that we have a second
> > > > > > > > > > sink pass after loop opts this looks like a reasonable approach?
> > > > > > > > >
> > > > > > > > > OK?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > >  PR tree-optimization/92335
> > > > > > > > >  * tree-ssa-sink.cc (select_best_block): Before loop
> > > > > > > > >  optimizations avoid sinking unconditional loads/stores
> > > > > > > > >  in innermost loops to conditional executed places.
> > > > > > > > >
> > > > > > > > >  * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
> > > > > > > > >  * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
> > > > > > > > >  expect predictive commoning to happen instead of sinking.
> > > > > > > > >  * gcc.dg/vect/pr65947-3.c: Adjust.
> > > > > > > > > I think it's reasonable -- there's probably going to be cases where it's not
> > > > > > > > > great, but more often than not I think it's going to be a reasonable
> > > > > > > > > heuristic.
> > > > > > > > >
> > > > > > > > > If there is undesirable fallout, better to find it over the coming months than
> > > > > > > > > next spring.  So I'd suggest we go forward now to give more time to find any
> > > > > > > > > pathological cases (if they exist).
> > > > > > >
> > > > > > > Agreed, I've pushed this now.
> > > > > > Hi Richard,
> > > > > > After this patch (committed in 
> > > > > > 399c8dd44ff44f4b496223c7cc980651c4d6f6a0),
> > > > > > pr65947-7.c "failed" for aarch64-linux-gnu:
> > > > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP 
> > > > > > VECTORIZED"
> > > > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects
> > > > > > scan-tree-dump-not vect "LOOP VECTORIZED"
> > > > > >
> > > > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { 
> > > > > > target {
> > > > > > ! vect_fold_extract_last } } } } */
> > > > > >
> > > > > > With your commit, condition_reduction in pr65947-7.c gets vectorized
> > > > > > regardless of vect_fold_extract_last,
> > > > > > which gates the above test (which is an improvement, because the
> > > > > > function didn't get vectorized before the commit).
> > > > > >
> > > > > > The attached patch thus removes the gating on 
> > > > > > vect_fold_extract_last,
> > > > > > and the test passes again.
> > > > > > OK to commit ?
> > > > >
> > > > > OK.
> > > >
> > > > Or wait - the loop doesn't vectorize on x86_64, so I guess one
> > > > critical target condition is missing.  Can you figure out which?
> > >
> > > I see
> > >
> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > note:   vect_is_simple_use: operand last_19 = PHI ,
> > > type of def: reduction
> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > note:   vect_is_simple_use: vectype vector(4) int
> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > missed:   multiple types in double reduction or condition reduction or
> > > fold-left reduction.
> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1:
> > > missed:   not vectorized: relevant phi not supported: last_19 = PHI
> > > 
> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > > missed:  bad operation or unsupported loop bound.
> > Hi Richard,
> > Looking at the aarch64 vect dump, it seems the loop in
> > condition_reduction gets vectorized with V4HI mode
> > while fails for other modes in vectorizable_condition:
> >
> >   if ((double_reduc || reduction_type != TREE_CODE_REDUCTION)
> >   && ncopies > 1)
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> 

Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-07 Thread Richard Biener via Gcc-patches
On Mon, Aug 7, 2023 at 2:05 AM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> On Thu, 3 Aug 2023 at 17:48, Richard Biener  wrote:
> >
> > On Thu, 3 Aug 2023, Richard Biener wrote:
> >
> > > On Thu, 3 Aug 2023, Richard Biener wrote:
> > >
> > > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 31 Jul 2023, Jeff Law wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> > > > > > > > The following delays sinking of loads within the same innermost
> > > > > > > > loop when it was unconditional before.  That's a not uncommon
> > > > > > > > issue preventing vectorization when masked loads are not 
> > > > > > > > available.
> > > > > > > >
> > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > > >
> > > > > > > > I have a followup patch improving sinking that without this 
> > > > > > > > would
> > > > > > > > cause more of the problematic sinking - now that we have a 
> > > > > > > > second
> > > > > > > > sink pass after loop opts this looks like a reasonable approach?
> > > > > > > >
> > > > > > > > OK?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > >  PR tree-optimization/92335
> > > > > > > >  * tree-ssa-sink.cc (select_best_block): Before loop
> > > > > > > >  optimizations avoid sinking unconditional loads/stores
> > > > > > > >  in innermost loops to conditional executed places.
> > > > > > > >
> > > > > > > >  * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
> > > > > > > >  * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
> > > > > > > >  expect predictive commoning to happen instead of sinking.
> > > > > > > >  * gcc.dg/vect/pr65947-3.c: Adjust.
> > > > > > > I think it's reasonable -- there's probably going to be cases 
> > > > > > > where it's not
> > > > > > > great, but more often than not I think it's going to be a 
> > > > > > > reasonable
> > > > > > > heuristic.
> > > > > > >
> > > > > > > If there is undesirable fallout, better to find it over the 
> > > > > > > coming months than
> > > > > > > next spring.  So I'd suggest we go forward now to give more time 
> > > > > > > to find any
> > > > > > > pathological cases (if they exist).
> > > > > >
> > > > > > Agreed, I've pushed this now.
> > > > > Hi Richard,
> > > > > After this patch (committed in 
> > > > > 399c8dd44ff44f4b496223c7cc980651c4d6f6a0),
> > > > > pr65947-7.c "failed" for aarch64-linux-gnu:
> > > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP 
> > > > > VECTORIZED"
> > > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects
> > > > > scan-tree-dump-not vect "LOOP VECTORIZED"
> > > > >
> > > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target {
> > > > > ! vect_fold_extract_last } } } } */
> > > > >
> > > > > With your commit, condition_reduction in pr65947-7.c gets vectorized
> > > > > regardless of vect_fold_extract_last,
> > > > > which gates the above test (which is an improvement, because the
> > > > > function didn't get vectorized before the commit).
> > > > >
> > > > > The attached patch thus removes the gating on vect_fold_extract_last,
> > > > > and the test passes again.
> > > > > OK to commit ?
> > > >
> > > > OK.
> > >
> > > Or wait - the loop doesn't vectorize on x86_64, so I guess one
> > > critical target condition is missing.  Can you figure out which?
> >
> > I see
> >
> > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > note:   vect_is_simple_use: operand last_19 = PHI ,
> > type of def: reduction
> > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > note:   vect_is_simple_use: vectype vector(4) int
> > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > missed:   multiple types in double reduction or condition reduction or
> > fold-left reduction.
> > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1:
> > missed:   not vectorized: relevant phi not supported: last_19 = PHI
> > 
> > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> > missed:  bad operation or unsupported loop bound.
> Hi Richard,
> Looking at the aarch64 vect dump, it seems the loop in
> condition_reduction gets vectorized with V4HI mode
> while fails for other modes in vectorizable_condition:
>
>   if ((double_reduc || reduction_type != TREE_CODE_REDUCTION)
>   && ncopies > 1)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "multiple types in double reduction or condition "
>  "reduction or fold-left reduction.\n");
>   return false;
> }
>
> From the dump:
> foo.c:9:21: note:   === vect_analyze_loop_operations ===
> foo.c:9:21: note:   examining phi: last_19 = PHI 
> foo.c:9:21: note:   vect_is_simpl

Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-06 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 3 Aug 2023 at 17:48, Richard Biener  wrote:
>
> On Thu, 3 Aug 2023, Richard Biener wrote:
>
> > On Thu, 3 Aug 2023, Richard Biener wrote:
> >
> > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote:
> > >
> > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > On Mon, 31 Jul 2023, Jeff Law wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> > > > > > > The following delays sinking of loads within the same innermost
> > > > > > > loop when it was unconditional before.  That's a not uncommon
> > > > > > > issue preventing vectorization when masked loads are not 
> > > > > > > available.
> > > > > > >
> > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > >
> > > > > > > I have a followup patch improving sinking that without this would
> > > > > > > cause more of the problematic sinking - now that we have a second
> > > > > > > sink pass after loop opts this looks like a reasonable approach?
> > > > > > >
> > > > > > > OK?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > >  PR tree-optimization/92335
> > > > > > >  * tree-ssa-sink.cc (select_best_block): Before loop
> > > > > > >  optimizations avoid sinking unconditional loads/stores
> > > > > > >  in innermost loops to conditional executed places.
> > > > > > >
> > > > > > >  * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
> > > > > > >  * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
> > > > > > >  expect predictive commoning to happen instead of sinking.
> > > > > > >  * gcc.dg/vect/pr65947-3.c: Adjust.
> > > > > > I think it's reasonable -- there's probably going to be cases where 
> > > > > > it's not
> > > > > > great, but more often than not I think it's going to be a reasonable
> > > > > > heuristic.
> > > > > >
> > > > > > If there is undesirable fallout, better to find it over the coming 
> > > > > > months than
> > > > > > next spring.  So I'd suggest we go forward now to give more time to 
> > > > > > find any
> > > > > > pathological cases (if they exist).
> > > > >
> > > > > Agreed, I've pushed this now.
> > > > Hi Richard,
> > > > After this patch (committed in 
> > > > 399c8dd44ff44f4b496223c7cc980651c4d6f6a0),
> > > > pr65947-7.c "failed" for aarch64-linux-gnu:
> > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP VECTORIZED"
> > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects
> > > > scan-tree-dump-not vect "LOOP VECTORIZED"
> > > >
> > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target {
> > > > ! vect_fold_extract_last } } } } */
> > > >
> > > > With your commit, condition_reduction in pr65947-7.c gets vectorized
> > > > regardless of vect_fold_extract_last,
> > > > which gates the above test (which is an improvement, because the
> > > > function didn't get vectorized before the commit).
> > > >
> > > > The attached patch thus removes the gating on vect_fold_extract_last,
> > > > and the test passes again.
> > > > OK to commit ?
> > >
> > > OK.
> >
> > Or wait - the loop doesn't vectorize on x86_64, so I guess one
> > critical target condition is missing.  Can you figure out which?
>
> I see
>
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> note:   vect_is_simple_use: operand last_19 = PHI ,
> type of def: reduction
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> note:   vect_is_simple_use: vectype vector(4) int
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> missed:   multiple types in double reduction or condition reduction or
> fold-left reduction.
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1:
> missed:   not vectorized: relevant phi not supported: last_19 = PHI
> 
> /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21:
> missed:  bad operation or unsupported loop bound.
Hi Richard,
Looking at the aarch64 vect dump, it seems the loop in
condition_reduction gets vectorized with V4HI mode
while it fails for other modes in vectorizable_condition:

  if ((double_reduc || reduction_type != TREE_CODE_REDUCTION)
      && ncopies > 1)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "multiple types in double reduction or condition "
                         "reduction or fold-left reduction.\n");
      return false;
    }

From the dump:
foo.c:9:21: note:   === vect_analyze_loop_operations ===
foo.c:9:21: note:   examining phi: last_19 = PHI 
foo.c:9:21: note:   vect_is_simple_use: operand (int) aval_13, type of
def: internal
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int
foo.c:9:21: note:   vect_is_simple_use: operand last_19 = PHI
, type of def: reduction
foo.c:9:21: note:   vect_is_simple_use: vectype vector(4) int

For V8HI, VF = 8, and vectype_in = vector(4) int.
Thus ncopies = VF / length(vectype_in) = 2, 
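
To make the arithmetic above concrete, here is a minimal sketch of the
ncopies computation and of the shape of the quoted check. The function
names are hypothetical and are not GCC internals. The V4HI case assumes
VF = 4, which is an inference from the dump and not something stated in it.

```c
/* Illustrative only: these helpers are hypothetical, not GCC internals.  */

/* Number of vector statements needed per vectorized iteration:
   the vectorization factor divided by the lanes of the input vectype.  */
static unsigned
ncopies_for (unsigned vf, unsigned nunits_in)
{
  return vf / nunits_in;
}

/* Same shape as the quoted check: a reduction that is not a plain
   tree-code reduction (or is a double reduction) is rejected as soon
   as more than one copy would be needed.  */
static int
reduction_rejected_p (int special_reduction, unsigned vf, unsigned nunits_in)
{
  return special_reduction && ncopies_for (vf, nunits_in) > 1;
}
```

With V8HI (VF = 8) and vectype_in = vector(4) int this gives ncopies = 2,
so the check rejects the loop. With VF = 4 it gives ncopies = 1 and the
check passes.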

Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-03 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches
 wrote:
>
> On Mon, 31 Jul 2023, Jeff Law wrote:
>
> >
> >
> > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> > > The following delays sinking of loads within the same innermost
> > > loop when it was unconditional before.  That's a not uncommon
> > > issue preventing vectorization when masked loads are not available.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > I have a followup patch improving sinking that without this would
> > > cause more of the problematic sinking - now that we have a second
> > > sink pass after loop opts this looks like a reasonable approach?
> > >
> > > OK?
> > >
> > > Thanks,
> > > Richard.
> > >
> > >  PR tree-optimization/92335
> > >  * tree-ssa-sink.cc (select_best_block): Before loop
> > >  optimizations avoid sinking unconditional loads/stores
> > >  in innermost loops to conditional executed places.
> > >
> > >  * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
> > >  * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
> > >  expect predictive commoning to happen instead of sinking.
> > >  * gcc.dg/vect/pr65947-3.c: Adjust.
> > I think it's reasonable -- there's probably going to be cases where it's not
> > great, but more often than not I think it's going to be a reasonable
> > heuristic.
> >
> > If there is undesirable fallout, better to find it over the coming months 
> > than
> > next spring.  So I'd suggest we go forward now to give more time to find any
> > pathological cases (if they exist).
>
> Agreed, I've pushed this now.
Hi Richard,
After this patch (committed in 399c8dd44ff44f4b496223c7cc980651c4d6f6a0),
pr65947-7.c "failed" for aarch64-linux-gnu:
FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects scan-tree-dump-not vect "LOOP VECTORIZED"

/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_fold_extract_last } } } } */

With your commit, condition_reduction in pr65947-7.c gets vectorized
regardless of vect_fold_extract_last,
which gates the above test (which is an improvement, because the
function didn't get vectorized before the commit).
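
For readers without the test at hand, the kind of loop the sinking
heuristic is about can be sketched roughly as follows. This is a
hypothetical example, not the actual pr65947-7.c source. The function
names and the initial value of last are made up for illustration.

```c
/* Hypothetical loops, not the actual pr65947-7.c source.  */

/* The load of a[i] is unconditional: every iteration reads a[i],
   so a vectorized version can use a plain vector load.  */
static int
last_match_unconditional (const short *a, const int *b, int n, int min_v)
{
  int last = -1;
  for (int i = 0; i < n; i++)
    {
      short aval = a[i];
      if (b[i] < min_v)
        last = aval;
    }
  return last;
}

/* The same loop with the load sunk into the conditional block:
   a[i] is read only when the condition holds, so a vectorized
   version would need a masked load.  */
static int
last_match_sunk (const short *a, const int *b, int n, int min_v)
{
  int last = -1;
  for (int i = 0; i < n; i++)
    if (b[i] < min_v)
      last = a[i];
  return last;
}
```

Both functions compute the same result. The difference is only where
the load sits, which is what decides whether a target without masked
loads can vectorize the loop.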

The attached patch thus removes the gating on vect_fold_extract_last,
and the test passes again.
OK to commit ?

Thanks,
Prathamesh
>
> Richard.
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-7.c b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
index 16cdcd1c6eb..7dabae81abf 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-7.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
@@ -52,5 +52,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_fold_extract_last } } } */
-/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_fold_extract_last } } } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */


Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-07-31 Thread Jeff Law via Gcc-patches




On 7/28/23 01:05, Richard Biener via Gcc-patches wrote:
> The following delays sinking of loads within the same innermost
> loop when it was unconditional before.  That's a not uncommon
> issue preventing vectorization when masked loads are not available.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> I have a followup patch improving sinking that without this would
> cause more of the problematic sinking - now that we have a second
> sink pass after loop opts this looks like a reasonable approach?
>
> OK?
>
> Thanks,
> Richard.
>
>  PR tree-optimization/92335
>  * tree-ssa-sink.cc (select_best_block): Before loop
>  optimizations avoid sinking unconditional loads/stores
>  in innermost loops to conditional executed places.
>
>  * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
>  * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
>  expect predictive commoning to happen instead of sinking.
>  * gcc.dg/vect/pr65947-3.c: Adjust.
I think it's reasonable -- there are probably going to be cases where it's 
not great, but more often than not I think it's going to be a reasonable 
heuristic.


If there is undesirable fallout, better to find it over the coming 
months than next spring.  So I'd suggest we go forward now to give more 
time to find any pathological cases (if they exist).


Jeff