On Thu, May 16, 2024 at 11:19 PM Tamar Christina <tamar.christ...@arm.com> wrote:
>
> Hi,
>
> > -----Original Message-----
> > From: Victor Do Nascimento <victor.donascime...@arm.com>
> > Sent: Thursday, May 16, 2024 2:57 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford <richard.sandif...@arm.com>; Richard Earnshaw
> > <richard.earns...@arm.com>; Victor Do Nascimento
> > <victor.donascime...@arm.com>
> > Subject: [PATCH] middle-end: Drop __builtin_prefetch calls in
> > autovectorization [PR114061]
> >
> > At present the autovectorizer fails to vectorize simple loops
> > involving calls to `__builtin_prefetch'.  A simple example of such
> > a loop is given below:
> >
> > void foo(double * restrict a, double * restrict b, int n){
> >   int i;
> >   for(i=0; i<n; ++i){
> >     a[i] = a[i] + b[i];
> >     __builtin_prefetch(&(b[i+8]));
> >   }
> > }
> >
> > The failure stems from two issues:
> >
> > 1. Given that it is typically not possible to fully reason about a
> >    function call due to the possibility of side effects, the
> >    autovectorizer does not attempt to vectorize loops which make such
> >    calls.
> >
> >    Given the memory reference passed to `__builtin_prefetch', in the
> >    absence of assurances about its effect on the passed memory
> >    location the compiler deems the function unsafe to vectorize,
> >    marking it as clobbering memory in `vect_find_stmt_data_reference'.
> >    This leads to the failure in autovectorization.
> >
> > 2. Notwithstanding the above issue, though the prefetch statement
> >    would be classed as `vect_unused_in_scope', the loop invariant that
> >    is used in the address of the prefetch is the scalar loop's and not
> >    the vector loop's IV.  That is, it still uses `i' and not `vec_iv'
> >    because the instruction wasn't vectorized, causing DCE to think the
> >    value is live, such that we now have both the vector and scalar loop
> >    invariant actively used in the loop.
> >
> > This patch addresses both of these:
> >
> > 1.
> >    About the issue regarding the memory clobber, a data prefetch does
> >    not generate faults if its address argument is invalid and does not
> >    write to memory.  Therefore, it does not alter the internal state
> >    of the program or its control flow under any circumstance.  As
> >    such, it is reasonable that the function be marked as not affecting
> >    memory contents.
> >
> >    To achieve this, we add the necessary logic to
> >    `get_references_in_stmt' to ensure that builtin functions are
> >    given the same treatment as internal functions.  If the gimple
> >    call is to a builtin function and its function code is
> >    `BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
> >
> > 2. Finding precedence in the way clobber statements are handled,
> >    whereby the vectorizer drops these from both the scalar and
> >    vectorized versions of a given loop, we choose to drop prefetch
> >    hints in a similar fashion.  This seems appropriate given how
> >    software prefetch hints are typically ignored by processors across
> >    architectures, as they seldom lead to performance gains over their
> >    hardware counterparts.
> >
> > 	PR target/114061
> >
> > gcc/ChangeLog:
> >
> > 	* tree-data-ref.cc (get_references_in_stmt): Set
> > 	`clobbers_memory' to false for __builtin_prefetch.
> > 	* tree-vect-loop.cc (vect_transform_loop): Drop all
> > 	__builtin_prefetch calls from loops.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.dg/vect/vect-prefetch-drop.c: New test.
> > ---
> >  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++++++++++++++
> >  gcc/tree-data-ref.cc                           |  9 +++++++++
> >  gcc/tree-vect-loop.cc                          |  7 ++++++-
> >  3 files changed, 29 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> > b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> > new file mode 100644
> > index 00000000000..57723a8c972
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile { target { aarch64*-*-* } } } */
> > +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sve -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
> > +
>
> See the review about two-way dotprod for comments on this.
> However this specific test does not need to check for any assembly
> instructions.
>
> You're going from being unable to vectorize a function, to being able to
> vectorize it.
>
> So the `vectorized 1 loops` check is sufficient, then this will work for all
> targets.
> This requires a check on vect_double (see
> gcc/testsuite/lib/target-supports.exp).
>
> I'd also change the loop to just use int, as more targets will support
> vectorizing those (and of course add a vect_int check instead).
>
> > +void foo(double * restrict a, double * restrict b, int n){
> > +  int i;
> > +  for(i=0; i<n; ++i){
> > +    a[i] = a[i] + b[i];
> > +    __builtin_prefetch(&(b[i+8]));
> > +  }
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "prfm" } } */
> > +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m, z\[0-9\]+.d, z\[0-9\]+.d" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
> > index f37734b5340..47bfec0f922 100644
> > --- a/gcc/tree-data-ref.cc
> > +++ b/gcc/tree-data-ref.cc
> > @@ -5843,6 +5843,15 @@ get_references_in_stmt (gimple *stmt, vec<data_ref_loc, va_heap> *references)
> >  	  clobbers_memory = true;
> >  	  break;
> >  	}
> > +
also avoid extra vertical space here

> > +      else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
> > +	{
> > +	  enum built_in_function fn_type = DECL_FUNCTION_CODE
> > (TREE_OPERAND (gimple_call_fn (stmt), 0));
>
>   auto fn_type = DECL_FUNCTION_CODE (gimple_call_fndecl (stmt));
>
> > +	  if (fn_type == BUILT_IN_PREFETCH)
> > +	    clobbers_memory = false;
> > +	  else
> > +	    clobbers_memory = true;
>
>   clobbers_memory = fn_type != BUILT_IN_PREFETCH;

Or simplify it by doing

  else if (gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
    clobbers_memory = false;

> > +	}
> >        else
> >  	clobbers_memory = true;
> >      }
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 361aec06488..65e8b421d80 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -12069,13 +12069,18 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> >  	   !gsi_end_p (si);)
> >  	{
> >  	  stmt = gsi_stmt (si);
> > -	  /* During vectorization remove existing clobber stmts.  */
> > +	  /* During vectorization remove existing clobber stmts and
> > +	     prefetches.  */
> >  	  if (gimple_clobber_p (stmt))
> >  	    {
> >  	      unlink_stmt_vdef (stmt);
> >  	      gsi_remove (&si, true);
> >  	      release_defs (stmt);
> >  	    }
> > +	  else if (gimple_call_builtin_p (stmt) &&
>
> && goes to the next line
>
> > +		   DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt),
> > +		   0)) == BUILT_IN_PREFETCH)
> > +	    gsi_remove (&si, true);
>
> Both unlink_stmt_vdef and release_defs are safe to call in this case: since
> the function doesn't have a vdef it'll just exit early, and since it doesn't
> have an LHS release_defs should be a no-op.
>
> So how about just
>
>   if (gimple_clobber_p (stmt)
>       || DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)) == BUILT_IN_PREFETCH)

But gimple_call_fndecl might be NULL.  You can use
|| gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH) though.
> Or maybe even
>
>   if (gimple_clobber_p (stmt)
>       || fndecl_built_in_p (gimple_call_fndecl (stmt), BUILT_IN_PREFETCH))
>
> Thanks,
> Tamar
>
> >  	  else
> >  	    {
> >  	      /* Ignore vector stmts created in the outer loop.  */
> > --
> > 2.34.1
>