Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-02-23 Thread Andrew Stubbs

On 10/02/2023 09:11, Jakub Jelinek wrote:

I've tried to fix the -flto thing and I can't figure out how. The problem
seems to be that there are two dump files from the two compiler invocations
and it scans the wrong one. Aarch64 has the same problem.


Two dumps are because it is in a dg-do run test.
I think it would be better to separate it, have for all cases one
test with defaulted dg-do (in vect.exp that is either dg-do run or dg-do
compile:
# If the target system supports vector instructions, the default action
# for a test is 'run', otherwise it's 'compile'.
) without the dg-final and then another one with the same TYPE which would
be forcibly dg-do compile with dg-final and
dg-additional-options "-ffat-lto-objects", then you get a single dump only.


If I change the testcase to "dg-do compile" then it does indeed only 
produce one dump, but it's still the wrong one.


The command it runs is this (I removed some noise):

  x86_64-none-linux-gnu-gcc vect-simd-clone-16.c -flto -ffat-lto-objects \
  -msse2 -ftree-vectorize -fno-tree-loop-distribute-patterns \
  -fno-vect-cost-model -fno-common -O2 \
  -fdump-tree-vect-details -fopenmp-simd -mavx

With "-S" (dg-do compile) I get

  vect-simd-clone-16.c.172t.vect

Otherwise (dg-do run) I get

  a-vect-simd-clone-16.c.172t.vect
  a.ltrans0.ltrans.172t.vect

The "ltrans0" dump has the "foo.simdclone" output that we're looking 
for, but dejagnu appears to be scanning the other, which does not. The 
filenames vary between the two commands, but the contents is identical.



+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target 
x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target 
amdgcn-*-* } } } */


And scan-tree-dump-times " = foo.simdclone" 2 "optimized"; I'd think that
should be the right number for all of x86_64, amdgcn and aarch64.  And
please don't forget about i?86-*-* too.


I've switched the pattern and changed to using the "vect" dump (instead of
"optimized") so that the later transformations don't mess up the counts.
However there are still other reasons why the count varies. It might be that
those can be turned off by options somehow, but probably testing those cases
is valuable too. The values are 2, 3, or 4, now, instead of 18, so that's an
improvement.


But still varries between the architectures, so it is an extra maintainance
nightmare.


+/* TODO: aarch64 */


For aarch64, one would need to include it in 
check_effective_target_vect_simd_clones
first...


I've done so and tested it, but that's not included in the patch because
there were other testcases that started reporting fails. None of the new
testcases fail for Aarch64.


Sure, that would be for a separate patch.

Anyway, if you want, commit the patch as is and tweak the testcases if
possible incrementally.


I will do so now. It would be nice to fix the testcase oddities, but I 
don't know how.


I wrote the above yesterday, but apparently the email didn't send ... 
since then some bugs have been reported. I'll try to investigate today, 
although I think Richi has a fix already.


Thanks

Andrew


Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-02-10 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 06, 2023 at 12:20:33PM +, Andrew Stubbs wrote:
> > > +/* Ensure the the in-branch simd clones are used on targets that support
> > > +   them.  These counts include all call and definitions.  */
> > > +
> > > +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
> > 
> > Drop lines line above.
> 
> I don't want to drop the comment because I get so frustrated by testcases
> that fail when something changes and it's not obvious what the original
> author was actually trying to test.
> 
> I've tried to fix the -flto thing and I can't figure out how. The problem
> seems to be that there are two dump files from the two compiler invocations
> and it scans the wrong one. Aarch64 has the same problem.

Two dumps are because it is in a dg-do run test.
I think it would be better to separate it, have for all cases one
test with defaulted dg-do (in vect.exp that is either dg-do run or dg-do
compile:
# If the target system supports vector instructions, the default action
# for a test is 'run', otherwise it's 'compile'.
) without the dg-final and then another one with the same TYPE which would
be forcibly dg-do compile with dg-final and
dg-additional-options "-ffat-lto-objects", then you get a single dump only.

> > > +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target 
> > > x86_64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target 
> > > amdgcn-*-* } } } */
> > 
> > And scan-tree-dump-times " = foo.simdclone" 2 "optimized"; I'd think that
> > should be the right number for all of x86_64, amdgcn and aarch64.  And
> > please don't forget about i?86-*-* too.
> 
> I've switched the pattern and changed to using the "vect" dump (instead of
> "optimized") so that the later transformations don't mess up the counts.
> However there are still other reasons why the count varies. It might be that
> those can be turned off by options somehow, but probably testing those cases
> is valuable too. The values are 2, 3, or 4, now, instead of 18, so that's an
> improvement.

But still varries between the architectures, so it is an extra maintainance
nightmare.

> > > +/* TODO: aarch64 */
> > 
> > For aarch64, one would need to include it in 
> > check_effective_target_vect_simd_clones
> > first...
> 
> I've done so and tested it, but that's not included in the patch because
> there were other testcases that started reporting fails. None of the new
> testcases fail for Aarch64.

Sure, that would be for a separate patch.

Anyway, if you want, commit the patch as is and tweak the testcases if
possible incrementally.

Jakub



Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-01-06 Thread Andrew Stubbs

Here's a new version of the patch.

On 01/12/2022 14:16, Jakub Jelinek wrote:

+void __attribute__((noinline))


You should use noipa attribute instead of noinline on callers
which aren't declare simd (on declare simd it would prevent cloning
which is essential for the declare simd behavior), so that you don't
get surprises e.g. from extra ipa cp etc.


Fixed.


+/* Ensure the the in-branch simd clones are used on targets that support
+   them.  These counts include all call and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */


Drop lines line above.


I don't want to drop the comment because I get so frustrated by 
testcases that fail when something changes and it's not obvious what the 
original author was actually trying to test.


I've tried to fix the -flto thing and I can't figure out how. The 
problem seems to be that there are two dump files from the two compiler 
invocations and it scans the wrong one. Aarch64 has the same problem.



+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target 
x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target 
amdgcn-*-* } } } */


And scan-tree-dump-times " = foo.simdclone" 2 "optimized"; I'd think that
should be the right number for all of x86_64, amdgcn and aarch64.  And
please don't forget about i?86-*-* too.


I've switched the pattern and changed to using the "vect" dump (instead 
of "optimized") so that the later transformations don't mess up the 
counts. However there are still other reasons why the count varies. It 
might be that those can be turned off by options somehow, but probably 
testing those cases is valuable too. The values are 2, 3, or 4, now, 
instead of 18, so that's an improvement.





+/* TODO: aarch64 */


For aarch64, one would need to include it in 
check_effective_target_vect_simd_clones
first...


I've done so and tested it, but that's not included in the patch because 
there were other testcases that started reporting fails. None of the new 
testcases fail for Aarch64.


OK now?

Andrewvect: inbranch SIMD clones

There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).

This patch add supports for a sub-set of possible cases (those using
mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
so there should be no regressions.

The sub-set of support should cover all cases needed by amdgcn, at present.

gcc/ChangeLog:

* internal-fn.cc (expand_MASK_CALL): New.
* internal-fn.def (MASK_CALL): New.
* internal-fn.h (expand_MASK_CALL): New prototype.
* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
for mask arguments also.
* tree-if-conv.cc: Include cgraph.h.
(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
(predicate_statements): Convert functions to IFN_MASK_CALL.
* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
IFN_MASK_CALL as a SIMD function call.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
IFN_MASK_CALL as an inbranch SIMD function call.
Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-clone-16.c: New test.
* gcc.dg/vect/vect-simd-clone-16b.c: New test.
* gcc.dg/vect/vect-simd-clone-16c.c: New test.
* gcc.dg/vect/vect-simd-clone-16d.c: New test.
* gcc.dg/vect/vect-simd-clone-16e.c: New test.
* gcc.dg/vect/vect-simd-clone-16f.c: New test.
* gcc.dg/vect/vect-simd-clone-17.c: New test.
* gcc.dg/vect/vect-simd-clone-17b.c: New test.
* gcc.dg/vect/vect-simd-clone-17c.c: New test.
* gcc.dg/vect/vect-simd-clone-17d.c: New test.
* gcc.dg/vect/vect-simd-clone-17e.c: New test.
* gcc.dg/vect/vect-simd-clone-17f.c: New test.
* gcc.dg/vect/vect-simd-clone-18.c: New test.
* gcc.dg/vect/vect-simd-clone-18b.c: New test.
* gcc.dg/vect/vect-simd-clone-18c.c: New test.
* gcc.dg/vect/vect-simd-clone-18d.c: New test.
* gcc.dg/vect/vect-simd-clone-18e.c: New test.
* gcc.dg/vect/vect-simd-clone-18f.c: New test.

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9471f543191..d9e11bfc62a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4527,3 +4527,10 @@ void
 expand_ASSUME (internal_fn, gcall *)
 {
 }
+
+void
+expand_MASK_CALL (internal_fn, gcall *)
+{
+  /* This IFN should only exist between ifcvt and vect passes.  */
+  gcc_unreachable ();
+}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 61516dab66d..301c3780659 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -466,6 +466,9 @@ DEF_INTERNAL_FN (TRAP, ECF_CONST | ECF_LEAF | ECF_NORETURN
 DEF_INTERNAL_FN (ASSUME, ECF_CONST | ECF_LEAF | ECF_NOTHROW
 | ECF_LOOPING_CONST_OR_PURE, NULL)
 
+/* For if-conversion 

Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-12-01 Thread Jakub Jelinek via Gcc-patches
On Thu, Dec 01, 2022 at 01:35:38PM +, Andrew Stubbs wrote:
> > Maybe better add -ffat-lto-objects to dg-additional-options and drop
> > the dg-skip-if (if it works with that, for all similar tests)?
> 
> The tests are already run with -ffat-lto-objects and the test still fails
> (pattern found zero times). I don't know why.
> 
> Aside from that, I've made all the other changes you requested.

Ah, I see what's going on.  You match simdclone, which isn't matched just in
the calls (I bet that is what you actually should/want count), but also twice
per each simd clone definition (and if somebody has say path to gcc
tree with simdclone in the name could match even more times).

Thus, I think:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
> @@ -0,0 +1,89 @@
> +/* { dg-require-effective-target vect_simd_clones } */
> +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +/* Test that simd inbranch clones work correctly.  */
> +
> +#ifndef TYPE
> +#define TYPE int
> +#endif
> +
> +/* A simple function that will be cloned.  */
> +#pragma omp declare simd
> +TYPE __attribute__((noinline))
> +foo (TYPE a)
> +{
> +  return a + 1;
> +}
> +
> +/* Check that "inbranch" clones are called correctly.  */
> +
> +void __attribute__((noinline))

You should use noipa attribute instead of noinline on callers
which aren't declare simd (on declare simd it would prevent cloning
which is essential for the declare simd behavior), so that you don't
get surprises e.g. from extra ipa cp etc.

> +masked (TYPE * __restrict a, TYPE * __restrict b, int size)
> +{
> +  #pragma omp simd
> +  for (int i = 0; i < size; i++)
> +b[i] = a[i]<1 ? foo(a[i]) : a[i];
> +}
> +
> +/* Check that "inbranch" works when there might be unrolling.  */
> +
> +void __attribute__((noinline))

So here too.
> +masked_fixed (TYPE * __restrict a, TYPE * __restrict b)

> +/* Ensure the the in-branch simd clones are used on targets that support
> +   them.  These counts include all call and definitions.  */
> +
> +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */

Drop lines line above.

> +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target 
> x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target 
> amdgcn-*-* } } } */

And scan-tree-dump-times " = foo.simdclone" 2 "optimized"; I'd think that
should be the right number for all of x86_64, amdgcn and aarch64.  And
please don't forget about i?86-*-* too.

> +/* TODO: aarch64 */

For aarch64, one would need to include it in 
check_effective_target_vect_simd_clones
first...

Otherwise LGTM.

Jakub



Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-12-01 Thread Andrew Stubbs

On 30/11/2022 15:37, Jakub Jelinek wrote:

On Wed, Nov 30, 2022 at 03:17:30PM +, Andrew Stubbs wrote:

--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */

...

+/* Ensure the the in-branch simd clones are used on targets that support
+   them.  These counts include all call and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */


Maybe better add -ffat-lto-objects to dg-additional-options and drop
the dg-skip-if (if it works with that, for all similar tests)?


The tests are already run with -ffat-lto-objects and the test still 
fails (pattern found zero times). I don't know why.


Aside from that, I've made all the other changes you requested.

OK now?

Andrewvect: inbranch SIMD clones

There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).

This patch add supports for a sub-set of possible cases (those using
mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
so there should be no regressions.

The sub-set of support should cover all cases needed by amdgcn, at present.

gcc/ChangeLog:

* internal-fn.cc (expand_MASK_CALL): New.
* internal-fn.def (MASK_CALL): New.
* internal-fn.h (expand_MASK_CALL): New prototype.
* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
for mask arguments also.
* tree-if-conv.cc: Include cgraph.h.
(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
(predicate_statements): Convert functions to IFN_MASK_CALL.
* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
IFN_MASK_CALL as a SIMD function call.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
IFN_MASK_CALL as an inbranch SIMD function call.
Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-clone-16.c: New test.
* gcc.dg/vect/vect-simd-clone-16b.c: New test.
* gcc.dg/vect/vect-simd-clone-16c.c: New test.
* gcc.dg/vect/vect-simd-clone-16d.c: New test.
* gcc.dg/vect/vect-simd-clone-16e.c: New test.
* gcc.dg/vect/vect-simd-clone-16f.c: New test.
* gcc.dg/vect/vect-simd-clone-17.c: New test.
* gcc.dg/vect/vect-simd-clone-17b.c: New test.
* gcc.dg/vect/vect-simd-clone-17c.c: New test.
* gcc.dg/vect/vect-simd-clone-17d.c: New test.
* gcc.dg/vect/vect-simd-clone-17e.c: New test.
* gcc.dg/vect/vect-simd-clone-17f.c: New test.
* gcc.dg/vect/vect-simd-clone-18.c: New test.
* gcc.dg/vect/vect-simd-clone-18b.c: New test.
* gcc.dg/vect/vect-simd-clone-18c.c: New test.
* gcc.dg/vect/vect-simd-clone-18d.c: New test.
* gcc.dg/vect/vect-simd-clone-18e.c: New test.
* gcc.dg/vect/vect-simd-clone-18f.c: New test.

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9471f543191..d9e11bfc62a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4527,3 +4527,10 @@ void
 expand_ASSUME (internal_fn, gcall *)
 {
 }
+
+void
+expand_MASK_CALL (internal_fn, gcall *)
+{
+  /* This IFN should only exist between ifcvt and vect passes.  */
+  gcc_unreachable ();
+}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 61516dab66d..301c3780659 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -466,6 +466,9 @@ DEF_INTERNAL_FN (TRAP, ECF_CONST | ECF_LEAF | ECF_NORETURN
 DEF_INTERNAL_FN (ASSUME, ECF_CONST | ECF_LEAF | ECF_NOTHROW
 | ECF_LOOPING_CONST_OR_PURE, NULL)
 
+/* For if-conversion of inbranch SIMD clones.  */
+DEF_INTERNAL_FN (MASK_CALL, ECF_NOVOPS, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 21b1ce43df6..ced92c041bb 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -244,6 +244,7 @@ extern void expand_SHUFFLEVECTOR (internal_fn, gcall *);
 extern void expand_SPACESHIP (internal_fn, gcall *);
 extern void expand_TRAP (internal_fn, gcall *);
 extern void expand_ASSUME (internal_fn, gcall *);
+extern void expand_MASK_CALL (internal_fn, gcall *);
 
 extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
 
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 21d69aa8747..afb7d99747b 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -937,6 +937,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
}
   sc->args[i].orig_type = base_type;
   sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
+  sc->args[i].vector_type = adj.type;
 }
 
   if (node->definition)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c 

Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-11-30 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 30, 2022 at 03:17:30PM +, Andrew Stubbs wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
> @@ -0,0 +1,89 @@
> +/* { dg-require-effective-target vect_simd_clones } */
> +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
...
> +/* Ensure the the in-branch simd clones are used on targets that support
> +   them.  These counts include all call and definitions.  */
> +
> +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */

Maybe better add -ffat-lto-objects to dg-additional-options and drop
the dg-skip-if (if it works with that, for all similar tests)?

> @@ -1063,7 +1064,8 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
> A statement is if-convertible if:
> - it is an if-convertible GIMPLE_ASSIGN,
> - it is a GIMPLE_LABEL or a GIMPLE_COND,
> -   - it is builtins call.  */
> +   - it is builtins call.

s/call\./call,/ above

> +   - it is a call to a function with a SIMD clone.  */
>  
>  static bool
>  if_convertible_stmt_p (gimple *stmt, vec refs)
> @@ -1083,13 +1085,23 @@ if_convertible_stmt_p (gimple *stmt, 
> vec refs)
>   tree fndecl = gimple_call_fndecl (stmt);
>   if (fndecl)
> {
> + /* We can vectorize some builtins and functions with SIMD
> +"inbranch" clones.  */
>   int flags = gimple_call_flags (stmt);
> + struct cgraph_node *node = cgraph_node::get (fndecl);
>   if ((flags & ECF_CONST)
>   && !(flags & ECF_LOOPING_CONST_OR_PURE)
> - /* We can only vectorize some builtins at the moment,
> -so restrict if-conversion to those.  */
>   && fndecl_built_in_p (fndecl))
> return true;
> + else if (node && node->simd_clones != NULL)

I don't see much value in the "else " above, the if branch returns
if condition is true, so just
if (node && node->simd_clones != NULL)
would do it.

> +   /* Ensure that at least one clone can be "inbranch".  */
> +   for (struct cgraph_node *n = node->simd_clones; n != NULL;
> +n = n->simdclone->next_clone)
> + if (n->simdclone->inbranch)
> +   {
> + need_to_predicate = true;
> + return true;
> +   }
> }
>   return false;
>}
> @@ -2603,6 +2615,29 @@ predicate_statements (loop_p loop)
> gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, ));
> update_stmt (stmt);
>   }
> +
> +   /* Convert functions that have a SIMD clone to IFN_MASK_CALL.  This
> +  will cause the vectorizer to match the "in branch" clone variants,
> +  and serves to build the mask vector in a natural way.  */
> +   gcall *call = dyn_cast  (gsi_stmt (gsi));
> +   if (call && !gimple_call_internal_p (call))
> + {
> +   tree orig_fn = gimple_call_fn (call);
> +   int orig_nargs = gimple_call_num_args (call);
> +   auto_vec args;
> +   args.safe_push (orig_fn);
> +   for (int i=0; i < orig_nargs; i++)

Formatting - int i = 0;

> + args.safe_push (gimple_call_arg (call, i));
> +   args.safe_push (cond);
> +
> +   /* Replace the call with a IFN_MASK_CALL that has the extra
> +  condition parameter. */
> +   gcall *new_call = gimple_build_call_internal_vec (IFN_MASK_CALL,
> + args);
> +   gimple_call_set_lhs (new_call, gimple_call_lhs (call));
> +   gsi_replace (, new_call, true);
> + }
> +
> lhs = gimple_get_lhs (gsi_stmt (gsi));
> if (lhs && TREE_CODE (lhs) == SSA_NAME)
>   ssa_names.add (lhs);
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3987,6 +3987,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>size_t i, nargs;
>tree lhs, rtype, ratype;
>vec *ret_ctor_elts = NULL;
> +  int arg_offset = 0;
>  
>/* Is STMT a vectorizable call?   */
>gcall *stmt = dyn_cast  (stmt_info->stmt);
> @@ -3994,6 +3995,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>  return false;
>  
>fndecl = gimple_call_fndecl (stmt);
> +  if (fndecl == NULL_TREE
> +  && gimple_call_internal_p (stmt)
> +  && gimple_call_internal_fn (stmt) == IFN_MASK_CALL)

Replace the above 2 lines with
  && gimple_call_internal_p (stmt, IFN_MASK_CALL))
?

> --- a/gcc/tree-vect-loop.cc   
>   
>   
> +++ b/gcc/tree-vect-loop.cc   
>   
>   
> @@ -2121,6 +2121,15 @@ 

Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-11-30 Thread Andrew Stubbs

On 09/09/2022 15:31, Jakub Jelinek wrote:

--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, 
vec refs)
tree fndecl = gimple_call_fndecl (stmt);
if (fndecl)
  {
+   /* We can vectorize some builtins and functions with SIMD
+  clones.  */
int flags = gimple_call_flags (stmt);
+   struct cgraph_node *node = cgraph_node::get (fndecl);
if ((flags & ECF_CONST)
&& !(flags & ECF_LOOPING_CONST_OR_PURE)
-   /* We can only vectorize some builtins at the moment,
-  so restrict if-conversion to those.  */
&& fndecl_built_in_p (fndecl))
  return true;
+   else if (node && node->simd_clones != NULL)
+ {
+   need_to_predicate = true;


I think it would be worth it to check that at least one of the
node->simd_clones clones has ->inbranch set, because if all calls
are declare simd notinbranch, then predicating the loop will be just a
wasted effort.


+   return true;
+ }
  }
return false;
}
@@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop)
  gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, ));
  update_stmt (stmt);
}
+
+ /* Add a predicate parameter to functions that have a SIMD clone.
+This will cause the vectorizer to match the "in branch" clone
+variants because they also have the extra parameter, and serves
+to build the mask vector in a natural way.  */
+ gcall *call = dyn_cast  (gsi_stmt (gsi));
+ if (call && !gimple_call_internal_p (call))
+   {
+ tree orig_fndecl = gimple_call_fndecl (call);
+ int orig_nargs = gimple_call_num_args (call);
+ auto_vec args;
+ for (int i=0; i < orig_nargs; i++)
+   args.safe_push (gimple_call_arg (call, i));
+ args.safe_push (cond);
+
+ /* Replace the call with a new one that has the extra
+parameter.  The FUNCTION_DECL remains unchanged so that
+the vectorizer can find the SIMD clones.  This call will
+either be deleted or replaced at that time, so the
+mismatch is short-lived and we can live with it.  */
+ gcall *new_call = gimple_build_call_vec (orig_fndecl, args);
+ gimple_call_set_lhs (new_call, gimple_call_lhs (call));
+ gsi_replace (, new_call, true);


I think this is way too dangerous to represent conditional calls that way,
there is nothing to distinguish those from non-conditional calls.
I think I'd prefer (but please see what Richi thinks too) to represent
the conditional calls as a call to a new internal function, say
IFN_COND_CALL or IFN_MASK_CALL, which would have the arguments the original
call had, plus 2 extra ones first (or 3?), one that would be saved copy of
original gimple_call_fn (i.e. usually ), another one that would be the
condition (and dunno about whether we need also something to represent
gimple_call_fntype, or whether we simply should punt during ifcvt
on conditional calls where gimple_call_fntype is incompatible with
the function type of fndecl.  Another question is about
gimple_call_chain.  Punt or copy it over to the ifn and back.


The attached should resolve these issues.

OK for mainline?

Andrewvect: inbranch SIMD clones

There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).

This patch add supports for a sub-set of possible cases (those using
mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
so there should be no regressions.

The sub-set of support should cover all cases needed by amdgcn, at present.

gcc/ChangeLog:

* internal-fn.cc (expand_MASK_CALL): New.
* internal-fn.def (MASK_CALL): New.
* internal-fn.h (expand_MASK_CALL): New prototype.
* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
for mask arguments also.
* tree-if-conv.cc: Include cgraph.h.
(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
(predicate_statements): Convert functions to IFN_MASK_CALL.
* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
IFN_MASK_CALL as a SIMD function call.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
IFN_MASK_CALL as an inbranch SIMD function call.
Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-clone-16.c: New test.
* gcc.dg/vect/vect-simd-clone-16b.c: New test.
* gcc.dg/vect/vect-simd-clone-16c.c: New test.
* gcc.dg/vect/vect-simd-clone-16d.c: New test.
* gcc.dg/vect/vect-simd-clone-16e.c: New test.
* 

Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-09-14 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 14, 2022 at 08:09:08AM +, Richard Biener wrote:
> Are nested functions a thing for OpenMP?  But yes, punt on them
> for now.

For Fortran certainly because they are part of the language, for C
too because they are GNU extension.
But declare simd is mostly best effort, so we can at least for now punt.

> I agree that a conditional call should be explicit, but the above is
> only transitional between if-conversion and vectorization, right?
> Do we support indirect calls here?  As Jakub says one possibility
> is to do
> 
>  .IFN_COND/MASK_CALL (fn-addr, condition/mask, ...)
> 
> another would be
> 
>  fnptr = cond ? fn : _call;
>  (*fnptr) (...);
> 
> thus replace the called function with conditional "nop".  How
> to exactly represent that NOP probably isn't too important
> when it's transitional until vectorization only, even NULL
> might work there.  Of course having the function address
> in a computation might confuse parts of the vectorizer.  OTOH
> it allows to keep the original call (and the chain and any
> other info we'd have to preserve otherwise).

On the ifn one can preserve those too and the advantage is that
it would be just one command rather than 2, but I'm not opposed
to the other way either.

Jakub



Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-09-14 Thread Richard Biener via Gcc-patches
On Fri, 9 Sep 2022, Jakub Jelinek wrote:

> On Tue, Aug 09, 2022 at 02:23:50PM +0100, Andrew Stubbs wrote:
> > 
> > There has been support for generating "inbranch" SIMD clones for a long 
> > time,
> > but nothing actually uses them (as far as I can see).
> 
> Thanks for working on this.
> 
> Note, there is another case where the inbranch SIMD clones could be used
> and I even thought it is implemented, but apparently it isn't or it doesn't
> work:
> #ifndef TYPE
> #define TYPE int
> #endif
> 
> /* A simple function that will be cloned.  */
> #pragma omp declare simd inbranch
> TYPE __attribute__((noinline))
> foo (TYPE a)
> {
>   return a + 1;
> }
> 
> /* Check that "inbranch" clones are called correctly.  */
> 
> void __attribute__((noinline))
> masked (TYPE * __restrict a, TYPE * __restrict b, int size)
> {
>   #pragma omp simd
>   for (int i = 0; i < size; i++)
> b[i] = foo(a[i]);
> }
> 
> Here, IMHO we should use the inbranch clone for vectorization (better
> than not vectorizing it, worse than when we'd have a notinbranch clone)
> and just use mask of all ones.
> But sure, it can be done incrementally, just mentioning it for completeness.
> 
> > This patch add supports for a sub-set of possible cases (those using
> > mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
> > so there should be no regressions.
> > 
> > The sub-set of support should cover all cases needed by amdgcn, at present.
> > 
> > gcc/ChangeLog:
> > 
> > * omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
> > for mask arguments also.
> > * tree-if-conv.cc: Include cgraph.h.
> > (if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
> > (predicate_statements): Pass the predicate to SIMD functions.
> > * tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls
> > to clones with mask arguments, in some cases.
> > Generate the mask vector arguments.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.dg/vect/vect-simd-clone-16.c: New test.
> > * gcc.dg/vect/vect-simd-clone-16b.c: New test.
> > * gcc.dg/vect/vect-simd-clone-16c.c: New test.
> > * gcc.dg/vect/vect-simd-clone-16d.c: New test.
> > * gcc.dg/vect/vect-simd-clone-16e.c: New test.
> > * gcc.dg/vect/vect-simd-clone-16f.c: New test.
> > * gcc.dg/vect/vect-simd-clone-17.c: New test.
> > * gcc.dg/vect/vect-simd-clone-17b.c: New test.
> > * gcc.dg/vect/vect-simd-clone-17c.c: New test.
> > * gcc.dg/vect/vect-simd-clone-17d.c: New test.
> > * gcc.dg/vect/vect-simd-clone-17e.c: New test.
> > * gcc.dg/vect/vect-simd-clone-17f.c: New test.
> > * gcc.dg/vect/vect-simd-clone-18.c: New test.
> > * gcc.dg/vect/vect-simd-clone-18b.c: New test.
> > * gcc.dg/vect/vect-simd-clone-18c.c: New test.
> > * gcc.dg/vect/vect-simd-clone-18d.c: New test.
> > * gcc.dg/vect/vect-simd-clone-18e.c: New test.
> > * gcc.dg/vect/vect-simd-clone-18f.c: New test.
> 
> > --- a/gcc/tree-if-conv.cc
> > +++ b/gcc/tree-if-conv.cc
> > @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, 
> > vec refs)
> > tree fndecl = gimple_call_fndecl (stmt);
> > if (fndecl)
> >   {
> > +   /* We can vectorize some builtins and functions with SIMD
> > +  clones.  */
> > int flags = gimple_call_flags (stmt);
> > +   struct cgraph_node *node = cgraph_node::get (fndecl);
> > if ((flags & ECF_CONST)
> > && !(flags & ECF_LOOPING_CONST_OR_PURE)
> > -   /* We can only vectorize some builtins at the moment,
> > -  so restrict if-conversion to those.  */
> > && fndecl_built_in_p (fndecl))
> >   return true;
> > +   else if (node && node->simd_clones != NULL)
> > + {
> > +   need_to_predicate = true;
> 
> I think it would be worth it to check that at least one of the
> node->simd_clones clones has ->inbranch set, because if all calls
> are declare simd notinbranch, then predicating the loop will be just a
> wasted effort.
> 
> > +   return true;
> > + }
> >   }
> > return false;
> >}
> > @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop)
> >   gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, ));
> >   update_stmt (stmt);
> > }
> > +
> > + /* Add a predicate parameter to functions that have a SIMD clone.
> > +This will cause the vectorizer to match the "in branch" clone
> > +variants because they also have the extra parameter, and serves
> > +to build the mask vector in a natural way.  */
> > + gcall *call = dyn_cast  (gsi_stmt (gsi));
> > + if (call && !gimple_call_internal_p (call))
> > +   {
> > + tree orig_fndecl = gimple_call_fndecl (call);
> > + int orig_nargs = gimple_call_num_args (call);
> > + auto_vec args;
> > + for (int i=0; i < orig_nargs; i++)
> > +   args.safe_push (gimple_call_arg (call, 

Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-09-09 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 09, 2022 at 02:23:50PM +0100, Andrew Stubbs wrote:
> 
> There has been support for generating "inbranch" SIMD clones for a long time,
> but nothing actually uses them (as far as I can see).

Thanks for working on this.

Note, there is another case where the inbranch SIMD clones could be used
and I even thought it is implemented, but apparently it isn't or it doesn't
work:
#ifndef TYPE
#define TYPE int
#endif

/* A simple function that will be cloned.  */
#pragma omp declare simd inbranch
TYPE __attribute__((noinline))
foo (TYPE a)
{
  return a + 1;
}

/* Check that "inbranch" clones are called correctly.  */

void __attribute__((noinline))
masked (TYPE * __restrict a, TYPE * __restrict b, int size)
{
  #pragma omp simd
  for (int i = 0; i < size; i++)
b[i] = foo(a[i]);
}

Here, IMHO we should use the inbranch clone for vectorization (better
than not vectorizing it, worse than when we'd have a notinbranch clone)
and just use mask of all ones.
But sure, it can be done incrementally, just mentioning it for completeness.

> This patch add supports for a sub-set of possible cases (those using
> mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
> so there should be no regressions.
> 
> The sub-set of support should cover all cases needed by amdgcn, at present.
> 
> gcc/ChangeLog:
> 
>   * omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
>   for mask arguments also.
>   * tree-if-conv.cc: Include cgraph.h.
>   (if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
>   (predicate_statements): Pass the predicate to SIMD functions.
>   * tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls
>   to clones with mask arguments, in some cases.
>   Generate the mask vector arguments.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-simd-clone-16.c: New test.
>   * gcc.dg/vect/vect-simd-clone-16b.c: New test.
>   * gcc.dg/vect/vect-simd-clone-16c.c: New test.
>   * gcc.dg/vect/vect-simd-clone-16d.c: New test.
>   * gcc.dg/vect/vect-simd-clone-16e.c: New test.
>   * gcc.dg/vect/vect-simd-clone-16f.c: New test.
>   * gcc.dg/vect/vect-simd-clone-17.c: New test.
>   * gcc.dg/vect/vect-simd-clone-17b.c: New test.
>   * gcc.dg/vect/vect-simd-clone-17c.c: New test.
>   * gcc.dg/vect/vect-simd-clone-17d.c: New test.
>   * gcc.dg/vect/vect-simd-clone-17e.c: New test.
>   * gcc.dg/vect/vect-simd-clone-17f.c: New test.
>   * gcc.dg/vect/vect-simd-clone-18.c: New test.
>   * gcc.dg/vect/vect-simd-clone-18b.c: New test.
>   * gcc.dg/vect/vect-simd-clone-18c.c: New test.
>   * gcc.dg/vect/vect-simd-clone-18d.c: New test.
>   * gcc.dg/vect/vect-simd-clone-18e.c: New test.
>   * gcc.dg/vect/vect-simd-clone-18f.c: New test.

> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, 
> vec refs)
>   tree fndecl = gimple_call_fndecl (stmt);
>   if (fndecl)
> {
> + /* We can vectorize some builtins and functions with SIMD
> +clones.  */
>   int flags = gimple_call_flags (stmt);
> + struct cgraph_node *node = cgraph_node::get (fndecl);
>   if ((flags & ECF_CONST)
>   && !(flags & ECF_LOOPING_CONST_OR_PURE)
> - /* We can only vectorize some builtins at the moment,
> -so restrict if-conversion to those.  */
>   && fndecl_built_in_p (fndecl))
> return true;
> + else if (node && node->simd_clones != NULL)
> +   {
> + need_to_predicate = true;

I think it would be worth it to check that at least one of the
node->simd_clones clones has ->inbranch set, because if all calls
are declare simd notinbranch, then predicating the loop will be just a
wasted effort.

> + return true;
> +   }
> }
>   return false;
>}
> @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop)
> gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, ));
> update_stmt (stmt);
>   }
> +
> +   /* Add a predicate parameter to functions that have a SIMD clone.
> +  This will cause the vectorizer to match the "in branch" clone
> +  variants because they also have the extra parameter, and serves
> +  to build the mask vector in a natural way.  */
> +   gcall *call = dyn_cast  (gsi_stmt (gsi));
> +   if (call && !gimple_call_internal_p (call))
> + {
> +   tree orig_fndecl = gimple_call_fndecl (call);
> +   int orig_nargs = gimple_call_num_args (call);
> +   auto_vec args;
> +   for (int i=0; i < orig_nargs; i++)
> + args.safe_push (gimple_call_arg (call, i));
> +   args.safe_push (cond);
> +
> +   /* Replace the call with a new one that has the extra
> +  parameter.  The FUNCTION_DECL remains 

[PATCH 3/3] vect: inbranch SIMD clones

2022-08-09 Thread Andrew Stubbs

There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).

This patch add supports for a sub-set of possible cases (those using
mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
so there should be no regressions.

The sub-set of support should cover all cases needed by amdgcn, at present.

gcc/ChangeLog:

* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
for mask arguments also.
* tree-if-conv.cc: Include cgraph.h.
(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
(predicate_statements): Pass the predicate to SIMD functions.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls
to clones with mask arguments, in some cases.
Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-clone-16.c: New test.
* gcc.dg/vect/vect-simd-clone-16b.c: New test.
* gcc.dg/vect/vect-simd-clone-16c.c: New test.
* gcc.dg/vect/vect-simd-clone-16d.c: New test.
* gcc.dg/vect/vect-simd-clone-16e.c: New test.
* gcc.dg/vect/vect-simd-clone-16f.c: New test.
* gcc.dg/vect/vect-simd-clone-17.c: New test.
* gcc.dg/vect/vect-simd-clone-17b.c: New test.
* gcc.dg/vect/vect-simd-clone-17c.c: New test.
* gcc.dg/vect/vect-simd-clone-17d.c: New test.
* gcc.dg/vect/vect-simd-clone-17e.c: New test.
* gcc.dg/vect/vect-simd-clone-17f.c: New test.
* gcc.dg/vect/vect-simd-clone-18.c: New test.
* gcc.dg/vect/vect-simd-clone-18b.c: New test.
* gcc.dg/vect/vect-simd-clone-18c.c: New test.
* gcc.dg/vect/vect-simd-clone-18d.c: New test.
* gcc.dg/vect/vect-simd-clone-18e.c: New test.
* gcc.dg/vect/vect-simd-clone-18f.c: New test.
---
 gcc/omp-simd-clone.cc |   1 +
 .../gcc.dg/vect/vect-simd-clone-16.c  |  89 
 .../gcc.dg/vect/vect-simd-clone-16b.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-16c.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-16d.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-16e.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-16f.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17.c  |  89 
 .../gcc.dg/vect/vect-simd-clone-17b.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-17c.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17d.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17e.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-17f.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18.c  |  89 
 .../gcc.dg/vect/vect-simd-clone-18b.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-18c.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18d.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18e.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-18f.c |  16 +++
 gcc/tree-if-conv.cc   |  39 -
 gcc/tree-vect-stmts.cc| 134 ++
 21 files changed, 641 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 258d3c6377f..58e3dc8b2e9 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -716,6 +716,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	}
   sc->args[i].orig_type = base_type;
   sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
+  sc->args[i].vector_type = adj.type;
 }
 
   if (node->definition)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
new file mode 100644
index 000..ffaabb30d1e
--- /dev/null