On 09/11/15 16:35, Tom de Vries wrote:
Hi,
this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.
The patch series contains these patches:
1 Insert new exit block only when needed in
transform_to_exit_first_loop_alt
2 Make create_parallel_loop return void
3 Ignore reduction clause on kernels directive
4 Implement -foffload-alias
5 Add in_oacc_kernels_region in struct loop
6 Add pass_oacc_kernels
7 Add pass_dominator_oacc_kernels
8 Add pass_ch_oacc_kernels
9 Add pass_parallelize_loops_oacc_kernels
10 Add pass_oacc_kernels pass group in passes.def
11 Update testcases after adding kernels pass group
12 Handle acc loop directive
13 Add c-c++-common/goacc/kernels-*.c
14 Add gfortran.dg/goacc/kernels-*.f95
15 Add libgomp.oacc-c-c++-common/kernels-*.c
16 Add libgomp.oacc-fortran/kernels-*.f95
The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.
Bootstrapped and reg-tested on x86_64.
Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
I'll post the individual patches in reply to this message.
this patch addresses the problem that once the offloading region has
been split off from the original function, alias analysis can no longer
use information available in the original function that would allow it
to do a more precise analysis for the offloading function. [ At some
point we could use fipa-pta for that, as discussed in PR46032, but
that's not feasible now. ]
The basic idea behind the patch is that for typical usage, the base
pointers used in an offloaded region are non-aliasing. The patch works
by adding restrict to the types of the fields used to pass data to an
offloading region.
The patch implements a new option
-foffload-alias=<none|pointer|all>.
The option -foffload-alias=none instructs the compiler to assume that
object references and pointer dereferences in an offload region do not
alias.
The option -foffload-alias=pointer instructs the compiler to assume that
objects references in an offload region do not alias.
The option -foffload-alias=all instructs the compiler to make no
assumptions about aliasing in offload regions.
The default value is -foffload-alias=none.
Thanks,
- Tom
Implement -foffload-alias
2015-11-03 Tom de Vries <t...@codesourcery.com>
* common.opt (foffload-alias): New option.
* flag-types.h (enum offload_alias): New enum.
* omp-low.c (install_var_field): Handle flag_offload_alias.
* doc/invoke.texi (@item Code Generation Options): Add -foffload-alias.
(@item -foffload-alias): New item.
* c-c++-common/goacc/kernels-loop-offload-alias-none.c: New test.
* c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: New test.
---
gcc/common.opt | 16 ++++++
gcc/doc/invoke.texi | 11 ++++
gcc/flag-types.h | 7 +++
gcc/omp-low.c | 28 +++++++++-
.../goacc/kernels-loop-offload-alias-none.c | 61 ++++++++++++++++++++++
.../goacc/kernels-loop-offload-alias-ptr.c | 44 ++++++++++++++++
6 files changed, 165 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c
diff --git a/gcc/common.opt b/gcc/common.opt
index 961a1b6..7135b1a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1735,6 +1735,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
EnumValue
Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
+foffload-alias=
+Common Joined RejectNegative Enum(offload_alias) Var(flag_offload_alias) Init(OFFLOAD_ALIAS_NONE)
+-foffload-alias=[all|pointer|none] Assume non-aliasing in an offload region
+
+Enum
+Name(offload_alias) Type(enum offload_alias) UnknownError(unknown offload aliasing %qs)
+
+EnumValue
+Enum(offload_alias) String(all) Value(OFFLOAD_ALIAS_ALL)
+
+EnumValue
+Enum(offload_alias) String(pointer) Value(OFFLOAD_ALIAS_POINTER)
+
+EnumValue
+Enum(offload_alias) String(none) Value(OFFLOAD_ALIAS_NONE)
+
fomit-frame-pointer
Common Report Var(flag_omit_frame_pointer) Optimization
When possible do not generate stack frames.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2e5953b..6928efd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1143,6 +1143,7 @@ See S/390 and zSeries Options.
-finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
-finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{} @gol
-fno-common -fno-ident @gol
+-foffload-alias=@r{[}none@r{|}pointer@r{|}all@r{]} @gol
-fpcc-struct-return -fpic -fPIC -fpie -fPIE -fno-plt @gol
-fno-jump-tables @gol
-frecord-gcc-switches @gol
@@ -23852,6 +23853,16 @@ The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
using @option{-ftrapv} @option{-fwrapv} @option{-fno-wrapv} on the command-line
results in @option{-ftrapv} being effective.
+@item -foffload-alias=@r{[}none@r{|}pointer@r{|}all@r{]}
+@opindex -foffload-alias
+The option @option{-foffload-alias=none} instructs the compiler to assume that
+objects references and pointer dereferences in an offload region do not alias.
+The option @option{-foffload-alias=pointer} instruct the compiler to assume that
+objects references in an offload region do not alias. The option
+@option{-foffload-alias=all} instructs the compiler to make no assumptions about
+aliasing in offload regions. The default value is
+@option{-foffload-alias=none}.
+
@item -fexceptions
@opindex fexceptions
Enable exception handling. Generates extra code needed to propagate
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 6301cea..87b1677 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -293,5 +293,12 @@ enum gfc_convert
GFC_FLAG_CONVERT_LITTLE
};
+enum offload_alias
+{
+ OFFLOAD_ALIAS_ALL,
+ OFFLOAD_ALIAS_POINTER,
+ OFFLOAD_ALIAS_NONE
+};
+
#endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 45d1927..d052c13 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1371,6 +1371,14 @@ install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
tree field, type, sfield = NULL_TREE;
splay_tree_key key = (splay_tree_key) var;
+ /* We use flag_offload_alias only for the oacc kernels region for the
+ moment. */
+ bool offload_alias_p = is_oacc_kernels (ctx);
+ bool no_alias_var_p
+ = offload_alias_p && flag_offload_alias != OFFLOAD_ALIAS_ALL;
+ bool no_alias_ptr_p
+ = offload_alias_p && flag_offload_alias == OFFLOAD_ALIAS_NONE;
+
if ((mask & 8) != 0)
{
key = (splay_tree_key) &DECL_UID (var);
@@ -1387,10 +1395,26 @@ install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
if (mask & 4)
{
gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
- type = build_pointer_type (build_pointer_type (type));
+
+ type = build_pointer_type (type);
+ if (no_alias_var_p)
+ type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
+
+ type = build_pointer_type (type);
+ if (no_alias_var_p)
+ type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
}
else if (by_ref)
- type = build_pointer_type (type);
+ {
+ if (no_alias_ptr_p
+ && POINTER_TYPE_P (type))
+ type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
+
+ type = build_pointer_type (type);
+
+ if (no_alias_var_p)
+ type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
+ }
else if ((mask & 3) == 1 && is_reference (var))
type = TREE_TYPE (type);
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
new file mode 100644
index 0000000..79d8daa
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
@@ -0,0 +1,61 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+/* { dg-additional-options "-fdump-tree-alias-all" } */
+/* { dg-additional-options "-foffload-alias=none" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+static void
+foo (unsigned int *a, unsigned int *b, unsigned int *c)
+{
+ for (COUNTERTYPE i = 0; i < N; i++)
+ a[i] = i * 2;
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+ {
+ for (COUNTERTYPE ii = 0; ii < N; ii++)
+ c[ii] = a[ii] + b[ii];
+ }
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ if (c[i] != a[i] + b[i])
+ abort ();
+}
+
+int
+main (void)
+{
+ unsigned int *a;
+ unsigned int *b;
+ unsigned int *c;
+
+ a = (unsigned int *)malloc (N * sizeof (unsigned int));
+ b = (unsigned int *)malloc (N * sizeof (unsigned int));
+ c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+ foo (a, b, c);
+
+ free (a);
+ free (b);
+ free (c);
+
+ return 0;
+}
+
+/* Check that the loop has been split off into a function. */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
+
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 3 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 4" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 5" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 6" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 7" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 9 "alias" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c
new file mode 100644
index 0000000..de4f45a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c
@@ -0,0 +1,44 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+/* { dg-additional-options "-fdump-tree-alias-all" } */
+/* { dg-additional-options "-foffload-alias=pointer" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+unsigned int a[N];
+unsigned int b[N];
+unsigned int c[N];
+
+int
+main (void)
+{
+ for (COUNTERTYPE i = 0; i < N; i++)
+ a[i] = i * 2;
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+ {
+ for (COUNTERTYPE ii = 0; ii < N; ii++)
+ c[ii] = a[ii] + b[ii];
+ }
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ if (c[i] != a[i] + b[i])
+ abort ();
+
+ return 0;
+}
+
+/* Check that the loop has been split off into a function. */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 3 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 4" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 6 "alias" } } */
--
1.9.1