On Thu, May 16, 2024 at 11:19 PM Tamar Christina
<tamar.christ...@arm.com> wrote:
>
> Hi,
>
> > -----Original Message-----
> > From: Victor Do Nascimento <victor.donascime...@arm.com>
> > Sent: Thursday, May 16, 2024 2:57 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford <richard.sandif...@arm.com>; Richard Earnshaw
> > <richard.earns...@arm.com>; Victor Do Nascimento
> > <victor.donascime...@arm.com>
> > Subject: [PATCH] middle-end: Drop __builtin_prefetch calls in
> > autovectorization [PR114061]
> >
> > At present the autovectorizer fails to vectorize simple loops
> > involving calls to `__builtin_prefetch'.  A simple example of such
> > loop is given below:
> >
> > void foo(double * restrict a, double * restrict b, int n){
> >   int i;
> >   for(i=0; i<n; ++i){
> >     a[i] = a[i] + b[i];
> >     __builtin_prefetch(&(b[i+8]));
> >   }
> > }
> >
> > The failure stems from two issues:
> >
> > 1. Given that it is typically not possible to fully reason about a
> >    function call due to the possibility of side effects, the
> >    autovectorizer does not attempt to vectorize loops which make such
> >    calls.
> >
> >    Given the memory reference passed to `__builtin_prefetch', in the
> >    absence of assurances about its effect on the passed memory
> >    location the compiler deems the function unsafe to vectorize,
> >    marking it as clobbering memory in `vect_find_stmt_data_reference'.
> >    This leads to the failure in autovectorization.
> >
> > 2. Notwithstanding the above issue, even though the prefetch statement
> >    would be classed as `vect_unused_in_scope', the induction variable
> >    used in the address of the prefetch is the scalar loop's and not
> >    the vector loop's IV.  That is, it still uses `i' and not `vec_iv'
> >    because the instruction wasn't vectorized, causing DCE to consider
> >    the value live, so that both the vector and the scalar IV end up
> >    actively used in the loop.
> >
> > This patch addresses both of these:
> >
> > 1. Regarding the memory clobber issue: a data prefetch does not
> >    generate faults if its address argument is invalid and it does not
> >    write to memory.  Therefore, it does not alter the internal state
> >    of the program or its control flow under any circumstance.  As
> >    such, it is reasonable to mark the function as not affecting
> >    memory contents.
> >
> >    To achieve this, we add the necessary logic to
> >    `get_references_in_stmt' to ensure that builtin functions are given
> >    the same treatment as internal functions.  If the gimple call
> >    is to a builtin function and its function code is
> >    `BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
> >
> > 2. Following the precedent of how clobber statements are handled,
> >    whereby the vectorizer drops these from both the scalar and
> >    vectorized versions of a given loop, we choose to drop prefetch
> >    hints in a similar fashion.  This seems appropriate given how
> >    software prefetch hints are typically ignored by processors across
> >    architectures, as they seldom lead to performance gain over their
> >    hardware counterparts.
> >
> >    PR target/114061
> >
> > gcc/ChangeLog:
> >
> >       * tree-data-ref.cc (get_references_in_stmt): Set
> >       `clobbers_memory' to false for __builtin_prefetch.
> >       * tree-vect-loop.cc (vect_transform_loop): Drop all
> >       __builtin_prefetch calls from loops.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.dg/vect/vect-prefetch-drop.c: New test.
> > ---
> >  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++++++++++++++
> >  gcc/tree-data-ref.cc                           |  9 +++++++++
> >  gcc/tree-vect-loop.cc                          |  7 ++++++-
> >  3 files changed, 29 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> > new file mode 100644
> > index 00000000000..57723a8c972
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target { aarch64*-*-* } } } */
> > +/* { dg-additional-options "-O3 -march=armv9.2-a+sve -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
> > +
>
> See the review about two-way dotprod for comments on this.
> However this specific test does not need to check for any assembly 
> instructions.
>
> You're going from being unable to vectorize a function to being able to
> vectorize it.
>
> So the `vectorized 1 loops` check is sufficient, and then this will work
> for all targets.
> This requires a check on vect_double (see
> gcc/testsuite/lib/target-supports.exp).
>
> I'd also change the loop to just use int, as more targets will support
> vectorizing those (and of course add a vect_int check instead).
>
> > +void foo(double * restrict a, double * restrict b, int n){
> > +  int i;
> > +  for(i=0; i<n; ++i){
> > +    a[i] = a[i] + b[i];
> > +    __builtin_prefetch(&(b[i+8]));
> > +  }
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "prfm" } } */
> > +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m, 
> > z\[0-9\]+.d, z\[0-
> > 9\]+.d" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
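
To put Tamar's suggestions together, the test could end up looking roughly
like the following (untested sketch; IIRC the vect.exp harness already adds
the usual optimization and dump flags, so no extra -O or -march options
should be needed):

/* { dg-do compile } */
/* { dg-require-effective-target vect_int } */

void foo (int *restrict a, int *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i] = a[i] + b[i];
      __builtin_prefetch (&b[i + 8]);
    }
}

/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */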
> > diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
> > index f37734b5340..47bfec0f922 100644
> > --- a/gcc/tree-data-ref.cc
> > +++ b/gcc/tree-data-ref.cc
> > @@ -5843,6 +5843,15 @@ get_references_in_stmt (gimple *stmt, vec<data_ref_loc, va_heap> *references)
> >           clobbers_memory = true;
> >           break;
> >         }
> > +

also avoid extra vertical space here

> > +      else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
> > +     {
> > +       enum built_in_function fn_type = DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt), 0));
>
> auto fn_type = DECL_FUNCTION_CODE (gimple_call_fndecl (stmt));
> > +       if (fn_type == BUILT_IN_PREFETCH)
> > +         clobbers_memory = false;
> > +       else
> > +         clobbers_memory = true;
>
> clobbers_memory = fn_type != BUILT_IN_PREFETCH;

Or simplify it by doing

          else if (gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
            clobbers_memory = false;

> > +     }
> >        else
> >       clobbers_memory = true;
> >      }
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 361aec06488..65e8b421d80 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -12069,13 +12069,18 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> >          !gsi_end_p (si);)
> >       {
> >         stmt = gsi_stmt (si);
> > -       /* During vectorization remove existing clobber stmts.  */
> > +       /* During vectorization remove existing clobber stmts and
> > +          prefetches.  */
> >         if (gimple_clobber_p (stmt))
> >           {
> >             unlink_stmt_vdef (stmt);
> >             gsi_remove (&si, true);
> >             release_defs (stmt);
> >           }
> > +       else if (gimple_call_builtin_p (stmt) &&

&& goes to the next line

> > +                DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt),
> > +                                                  0)) == BUILT_IN_PREFETCH)
> > +             gsi_remove (&si, true);
>
> Both unlink_stmt_vdef and release_defs are safe to call in this case, since
> the function doesn't have a vdef it'll just exit early and since it doesn't 
> have an LHS release_defs should be a no-op.
>
> So how about just
>
> if (gimple_clobber_p (stmt)
>    || DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)) == BUILT_IN_PREFETCH)

But gimple_call_fndecl might be NULL.  You can use

  || gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH)

though.
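
I.e. the whole removal block could then become something like (untested):

        if (gimple_clobber_p (stmt)
            || gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
          {
            /* As noted above, unlink_stmt_vdef and release_defs are
               effectively no-ops for the prefetch case, so the existing
               clobber path can be reused as-is.  */
            unlink_stmt_vdef (stmt);
            gsi_remove (&si, true);
            release_defs (stmt);
          }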

> Or maybe even
>
> if (gimple_clobber_p (stmt)
>     || fndecl_built_in_p (gimple_call_fndecl (stmt), BUILT_IN_PREFETCH))
>
> Thanks,
> Tamar
> >         else
> >           {
> >             /* Ignore vector stmts created in the outer loop.  */
> > --
> > 2.34.1
>
