[OG14] Fortran/OpenMP: Support mapping of DT with allocatable components: disable 'generate_callback_wrapper' for nvptx target (was: [Patch][Stage 1] Fortran/OpenMP: Support mapping of DT with allocat

2024-07-03 Thread Thomas Schwinge
Hi Tobias!

I've compared test results for nvptx target for GCC 14 vs. the new OG14,
and ran into a number of unexpected regressions: thousands of compilation
PASS -> FAIL in the Fortran testsuite.  The few that I looked at were all
like:

ptxas /tmp/ccAMr7D9.o, line 63; error   : Illegal operand type to instruction 'st'
ptxas /tmp/ccAMr7D9.o, line 63; error   : Unknown symbol '%stack'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
compiler exited with status 1

Comparing '-fdump-tree-all' for 'gfortran.dg/pr37287-1.f90' (randomly
picked) for GCC 14 vs. OG14, already in 'pr37287-1.f90.005t.original' we
see:

--- [GCC 14]/pr37287-1.f90.005t.original  2024-07-03 12:45:08.369948469 +0200
+++ [OG14]/pr37287-1.f90.005t.original   2024-07-03 12:44:57.770072298 +0200
@@ -1,3 +1,21 @@
+__attribute__((fn spec (". r r r r ")))
+integer(kind=8) __callback___iso_c_binding_C_ptr (integer(kind=8) (*) (void *, void * & restrict, integer(kind=2), void (*) (void)) cb, void * token, void * this_ptr, integer(kind=2) flag)
+{
+  integer(kind=8) result;
+  void * * scalar;
+
+  result = 0;
+  if (flag == 1)
+    {
+      result = cb (token, &this_ptr, 64, 3, 0B);
+      return result;
+    }
+  L$1:;
+  scalar = (void * *) this_ptr;
+  return result;
+}
+
+
 __attribute__((fn spec (". . . ")))
 void __copy___iso_c_binding_C_ptr (void * & restrict src, void * & restrict dst)
 {

(In addition to the whole function '__callback___iso_c_binding_C_ptr',
also note that the 'L$1:' label and 'scalar' variable are dead here; but
that's likely unrelated to the issue at hand?)

This points to OG14 commit 92c3af3d4f82351c7133b6ee90e213a8a5a485db
"Fortran/OpenMP: Support mapping of DT with allocatable components":

On 2022-03-01T16:34:18+0100, Tobias Burnus wrote:
> this patch adds support for mapping something like
>type t
>  type(t2), allocatable :: a, b(:)
>  integer, allocatable :: c, d(:)
>end type t
>type(t), allocatable :: var, var2(:,:)
>
>!$omp target enter data map(var, var2)
>
> which does a deep walk of the components at runtime.
>
> [...]
>
> Issues: None known, but I am sure with experimenting,
> more can be found - [...]

Due to a number of other commits (at least textually) depending on this
one, this commit isn't easy to revert on OG14.

But: if I disable it for nvptx target as per the attached
"Fortran/OpenMP: Support mapping of DT with allocatable components: disable 
'generate_callback_wrapper' for nvptx target",
then we're back to good -- all GCC 14 vs. OG14 regressions resolved for
nvptx target.

By the way: it's possible that we've had the same misbehavior also on
OG13 and earlier, but just nobody ever tested that for nvptx target.

Note that also outside of OG14 (that is, in GCC 14 as well as GCC trunk),
we have a number of instances of:

ptxas /tmp/ccAMr7D9.o, line 63; error   : Illegal operand type to instruction 'st'
ptxas /tmp/ccAMr7D9.o, line 63; error   : Unknown symbol '%stack'

... all over the Fortran test suite (only).  My current theory therefore
is that there is some latent issue, which is just greatly exacerbated by
OG14 commit 92c3af3d4f82351c7133b6ee90e213a8a5a485db
"Fortran/OpenMP: Support mapping of DT with allocatable components" (or
some related change).

This could be the Fortran front end generating incorrect GIMPLE, or the
middle end or (more likely?) nvptx back end not correctly handling
something that only comes into existence via the Fortran front end.

Anyway: until we understand the underlying issue, OK to push the attached
"Fortran/OpenMP: Support mapping of DT with allocatable components: disable 
'generate_callback_wrapper' for nvptx target"
to devel/omp/gcc-14 branch?


Grüße
 Thomas


From 3fb9e4cabea736ace66ee197be1b13a978af10ac Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 3 Jul 2024 22:09:39 +0200
Subject: [PATCH] Fortran/OpenMP: Support mapping of DT with allocatable
 components: disable 'generate_callback_wrapper' for nvptx target

This is, obviously, not the final fix for this issue.

	gcc/fortran/
	* class.cc (generate_callback_wrapper) [GCC_NVPTX_H]: Disable.
---
 gcc/fortran/class.cc | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc
index 15aacd98fd8..2c062204e5a 100644
--- a/gcc/fortran/class.cc
+++ b/gcc/fortran/class.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gfortran.h"
 #include "constructor.h"
 #include "target-memory.h"
+#include "tm.h" //TODO
 
 /* Inserts a derived type component reference in a data reference chain.
 TS: base type of the ref chain so far, in which we will pick the c

nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Thomas Schwinge
Hi!

On 2023-01-20T22:16:00+0100, I wrote:
> On 2023-01-20T22:04:02+0100, I wrote:
>> We've been (t)asked to enable (portions of) GCC/Fortran I/O for nvptx
>> offloading, which means building a normal (non-'LIBGFOR_MINIMAL')
>> configuration of libgfortran.
>
> This is achieved by 'nvptx, libgfortran: Switch out of "minimal" mode',
> see attached, again based on WIP work by Andrew Stubbs.

I've recently slightly revised this, in particular:

> The OpenACC XFAILs: "[...] overflows the stack [...]"

... I now avoid by use of commit 0d25989d60d15866ef4737d66e02432f50717255
"nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment 
variable [PR97384, PR105274]".

The underlying issue remains...

> [...] unresolved at this point; see the discussion around
> "Handling of large stack objects in GPU code generation -- maybe transform 
> into heap allocation?",
> and my "nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold'"
> experimenting.  (The latter works to some extent, but also has other
> issues that I shall detail at some later point in time.)

(No progress.)


Pushed to trunk branch commit 3a4775d4403f2e88b589e88a9937cc1fd45a0e87
'nvptx, libgfortran: Switch out of "minimal" mode', see attached.

This, unsurprisingly, also greatly improves GCC/Fortran test results for
nvptx target.


Grüße
 Thomas


From 3a4775d4403f2e88b589e88a9937cc1fd45a0e87 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jun 2024 13:13:24 +0200
Subject: [PATCH] nvptx, libgfortran: Switch out of "minimal" mode

..., in order to enable (portions of) Fortran I/O, for example.

	libgfortran/
	* configure.ac: No longer set 'LIBGFOR_MINIMAL' for nvptx.
	* configure: Regenerate.
	libgomp/
	* libgomp.texi (nvptx): Update.
	* testsuite/libgomp.fortran/target-print-1-nvptx.f90: Remove.
	* testsuite/libgomp.fortran/target-print-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/error_stop-2-nvptx.f: New.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Adjust.
	* testsuite/libgomp.oacc-fortran/print-1-nvptx.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/print-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/stop-2-nvptx.f: New.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Adjust.

Co-authored-by: Andrew Stubbs 
---
 libgfortran/configure | 21 --
 libgfortran/configure.ac  | 17 +++-
 libgomp/libgomp.texi  | 10 +++--
 .../libgomp.fortran/target-print-1-nvptx.f90  | 11 -
 .../libgomp.fortran/target-print-1.f90|  3 --
 .../libgomp.oacc-fortran/error_stop-2-nvptx.f | 39 ++
 .../libgomp.oacc-fortran/error_stop-2.f   |  3 +-
 .../libgomp.oacc-fortran/print-1-nvptx.f90| 40 +++
 .../libgomp.oacc-fortran/print-1.f90  |  4 +-
 .../libgomp.oacc-fortran/stop-2-nvptx.f   | 36 +
 .../testsuite/libgomp.oacc-fortran/stop-2.f   |  3 +-
 11 files changed, 134 insertions(+), 53 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.fortran/target-print-1-nvptx.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/error_stop-2-nvptx.f
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/stop-2-nvptx.f

diff --git a/libgfortran/configure b/libgfortran/configure
index 774dd52fc95..11a1bc5f070 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -6207,17 +6207,12 @@ else
 fi
 
 
-# For GPU offloading, not everything in libfortran can be supported.
-# Currently, the only target that has this problem is nvptx.  The
-# following is a (partial) list of features that are unsupportable on
-# this particular target:
-# * Constructors
-# * alloca
-# * C library support for I/O, with printf as the one notable exception
-# * C library support for other features such as signal, environment
-#   variables, time functions
-
- if test "x${target_cpu}" = xnvptx; then
+# "Minimal" mode is for targets that cannot (yet) support all features of
+# libgfortran.  It avoids the need for working constructors, alloca, and C
+# library support for I/O, signals, environment variables, time functions, etc.
+# At present there are no targets that require this mode.
+
+ if false; then
   LIBGFOR_MINIMAL_TRUE=
   LIBGFOR_MINIMAL_FALSE='#'
 else
@@ -12852,7 +12847,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12855 "configure"
+#line 12850 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12958,7 +12953,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12961 "configure"
+#line 12956 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/libgfortran/configure.a

Re: nvptx, libgcc: Stub unwinding implementation

2024-06-06 Thread Thomas Schwinge
Hi!

On 2023-01-20T22:04:02+0100, I wrote:
> We've been (t)asked to enable (portions of) GCC/Fortran I/O for nvptx
> offloading, which means building a normal (non-'LIBGFOR_MINIMAL')
> configuration of libgfortran.  One prerequisite patch, based on WIP work
> by Andrew Stubbs, is: "nvptx, libgcc: Stub unwinding implementation"

Pushed to trunk branch commit a29c5852a606588175d11844db84da0881227100
"nvptx, libgcc: Stub unwinding implementation", see attached.


Grüße
 Thomas


From a29c5852a606588175d11844db84da0881227100 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jun 2024 13:11:04 +0200
Subject: [PATCH] nvptx, libgcc: Stub unwinding implementation

Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.

The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.

libgcc/ChangeLog:

	* config/nvptx/t-nvptx: Add unwind-nvptx.c.
	* config/nvptx/unwind-nvptx.c: New file.

Co-authored-by: Andrew Stubbs 
---
 libgcc/config/nvptx/t-nvptx|  3 ++-
 libgcc/config/nvptx/unwind-nvptx.c | 37 ++
 2 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/nvptx/unwind-nvptx.c

diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index 260ed6334db..1ff574c2982 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -1,6 +1,7 @@
 LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
 	$(srcdir)/config/nvptx/mgomp.c \
-	$(srcdir)/config/nvptx/atomic.c
+	$(srcdir)/config/nvptx/atomic.c \
+	$(srcdir)/config/nvptx/unwind-nvptx.c
 
 # Until we have libstdc++-v3/libsupc++ proper.
 LIB2ADD += $(srcdir)/c++-minimal/guard.c
diff --git a/libgcc/config/nvptx/unwind-nvptx.c b/libgcc/config/nvptx/unwind-nvptx.c
new file mode 100644
index 00000000000..d08ba266be1
--- /dev/null
+++ b/libgcc/config/nvptx/unwind-nvptx.c
@@ -0,0 +1,37 @@
+/* Stub unwinding implementation.
+
+   Copyright (C) 2019-2024 Free Software Foundation, Inc.
+   Contributed by Mentor Graphics
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "unwind.h"
+
+_Unwind_Reason_Code
+_Unwind_Backtrace(_Unwind_Trace_Fn trace, void * trace_argument)
+{
+  return 0;
+}
+
+_Unwind_Ptr
+_Unwind_GetIPInfo (struct _Unwind_Context *c, int *ip_before_insn)
+{
+  return 0;
+}
-- 
2.34.1



Re: [PATCH, OpenACC 2.7] struct/array reductions for Fortran

2024-03-18 Thread Thomas Schwinge
Hi Chung-Lin!

Thanks for your work here, which I'm beginning to look into (prerequisite
"[PATCH, OpenACC 2.7] Implement reductions for arrays and structs",
first, of course); it'll take me some time.


In non-offloading testing, I noticed for x86_64-pc-linux-gnu '-m32':

+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O0  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O1  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O2  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -Os  execution test

With optimizations enabled, it runs into 'STOP 4'.

Per '-Wextra':

[...]/libgomp.oacc-fortran/reduction-13.f90:40:6: Warning: Inequality 
comparison for REAL(4) at (1) [-Wcompare-reals]
[...]/libgomp.oacc-fortran/reduction-13.f90:63:6: Warning: Inequality 
comparison for REAL(4) at (1) [-Wcompare-reals]
[...]/libgomp.oacc-fortran/reduction-13.f90:64:6: Warning: Inequality 
comparison for REAL(8) at (1) [-Wcompare-reals]

Do we need to allow for some epsilon (generally in such test cases), or
is there another problem?
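
If the answer turns out to be "allow for some epsilon", a relative-tolerance comparison is one common shape for it. The following C sketch is illustrative only: 'nearly_equal', 'repeated_product' and the tolerance values are hypothetical names and numbers, not taken from the test case or the testsuite, and the sketch does not settle whether the '-m32' failure is a rounding difference or a real problem.

```c
#include <stdbool.h>

/* Hypothetical helper: absolute value without needing libm.  */
double
abs_d (double x)
{
  return x < 0 ? -x : x;
}

/* Hypothetical helper: true if A and B differ by at most REL_TOL,
   measured relative to the larger magnitude of the two.  */
bool
nearly_equal (double a, double b, double rel_tol)
{
  double mag_a = abs_d (a);
  double mag_b = abs_d (b);
  double scale = mag_a > mag_b ? mag_a : mag_b;
  return abs_d (a - b) <= rel_tol * scale;
}

/* Multiply 1 by FACTOR, N times, in single precision; this mirrors
   the 'v2%r = v2%r * 1.1' loop of the quoted test case.  */
float
repeated_product (float factor, int n)
{
  float r = 1.0f;
  for (int i = 0; i < n; i++)
    r *= factor;
  return r;
}
```

A check along the lines of 'if (!nearly_equal (v, a, 1e-5)) abort ();' would then tolerate differences in the last few bits while still catching a reduction that is wrong outright.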

For reference:

On 2024-02-08T22:47:13+0800, Chung-Lin Tang wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90
> @@ -0,0 +1,66 @@
> +! { dg-do run }
> +
> +! record type reductions
> +
> +program reduction_13
> +  implicit none
> +
> +  type t1
> + integer :: i
> + real :: r
> +  end type t1
> +
> +  type t2
> + real :: r
> + integer :: i
> + double precision :: d
> +  end type t2
> +
> +  integer, parameter :: n = 10, ng = 8, nw = 4, vl = 32
> +  integer :: i
> +  type(t1) :: v1, a1
> +  type (t2) :: v2, a2
> +
> +  v1%i = 0
> +  v1%r = 0
> +  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v1)
> +  !$acc loop reduction (+:v1)
> +  do i = 1, n
> + v1%i = v1%i + 1
> + v1%r = v1%r + 2
> +  end do
> +  !$acc end parallel
> +  a1%i = 0
> +  a1%r = 0
> +  do i = 1, n
> + a1%i = a1%i + 1
> + a1%r = a1%r + 2
> +  end do
> +  if (v1%i .ne. a1%i) STOP 1
> +  if (v1%r .ne. a1%r) STOP 2
> +
> +  v2%i = 1
> +  v2%r = 1
> +  v2%d = 1
> +  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v2)
> +  !$acc loop reduction (*:v2)
> +  do i = 1, n
> + v2%i = v2%i * 2
> + v2%r = v2%r * 1.1
> + v2%d = v2%d * 1.3
> +  end do
> +  !$acc end parallel
> +  a2%i = 1
> +  a2%r = 1
> +  a2%d = 1
> +  do i = 1, n
> + a2%i = a2%i * 2
> + a2%r = a2%r * 1.1
> + a2%d = a2%d * 1.3
> +  end do
> +
> +  if (v2%i .ne. a2%i) STOP 3
> +  if (v2%r .ne. a2%r) STOP 4
> +  if (v2%d .ne. a2%d) STOP 5
> +
> +end program reduction_13


Grüße
 Thomas


OpenACC 2.7: front-end support for readonly modifier: Add basic OpenACC 'declare' testing (was: [PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends)

2024-03-14 Thread Thomas Schwinge
 if (n->u.map_op == OMP_MAP_RELEASE
>> -   || n->u.map_op == OMP_MAP_DELETE)
>> +  else if (n->u.map.op == OMP_MAP_RELEASE
>> +   || n->u.map.op == OMP_MAP_DELETE)
>>  ;
>>else if (op == EXEC_OMP_TARGET_EXIT_DATA
>> || op == EXEC_OACC_EXIT_DATA)
>> @@ -4088,6 +4091,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
>> gfc_omp_clauses *clauses,
>>  }
>>if (n->u.present_modifier)
>>  OMP_CLAUSE_MOTION_PRESENT (node) = 1;
>> +  if (list == OMP_LIST_CACHE && n->u.map.readonly)
>> +OMP_CLAUSE__CACHE__READONLY (node) = 1;
>>omp_clauses = gfc_trans_add_clause (node, omp_clauses);
>>  }
>>break;
>> @@ -6561,7 +6566,7 @@ gfc_add_clause_implicitly (gfc_omp_clauses 
>> *clauses_out,
>>n2->where = n->where;
>>n2->sym = n->sym;
>>if (is_target)
>> -n2->u.map_op = OMP_MAP_TOFROM;
>> +n2->u.map.op = OMP_MAP_TOFROM;
>>if (tail)
>>  {
>>tail->next = n2;

>> diff --git a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90 
>> b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
>> new file mode 100644
>> index 00000000000..696ebd08321
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
>> @@ -0,0 +1,89 @@
>> +! { dg-additional-options "-fdump-tree-original" }
>> +
>> +subroutine foo (a, n)
>> +  integer :: n, a(:)
>> +  integer :: i, b(n), c(n)
>> +  !$acc parallel copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end parallel
>> +
>> +  !$acc kernels copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end kernels
>> +
>> +  !$acc serial copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end serial
>> +
>> +  !$acc data copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end data
>> +
>> +  !$acc enter data copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +
>> +end subroutine foo
>> +
>> +program main
>> +  integer :: g(32), h(32)
>> +  integer :: i, n = 32, a(32)
>> +  integer :: b(32), c(32)
>> +
>> +  !$acc declare copyin(readonly: g), copyin(h)
>> +
>> +  !$acc parallel copyin(readonly: a(:32), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end parallel
>> +
>> +  !$acc kernels copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end kernels
>> +
>> +  !$acc serial copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end serial
>> +
>> +  !$acc data copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end data
>> +
>> +  !$acc enter data copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +
>> +end program main
>> +
>> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel 
>> map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ 
>> map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
>> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel 
>> map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ 
>> map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
>> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels 
>> map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ 
>> map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
>> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels 
>> map

Re: [PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends

2024-03-13 Thread Thomas Schwinge
Hi Chung-Lin!

On 2024-03-07T17:02:02+0900, Chung-Lin Tang wrote:
> On 2023/10/26 6:43 PM, Thomas Schwinge wrote:
>>>>>> +++ b/gcc/tree.h
>>>>>> @@ -1813,6 +1813,14 @@ class auto_suppress_location_wrappers
>>>>>>   #define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
>>>>>> (OMP_CLAUSE_SUBCODE_CHECK (NODE, 
>>>>>> OMP_CLAUSE_MAP)->base.addressable_flag)
>>>>>>
>>>>>> +/* Nonzero if OpenACC 'readonly' modifier set, used for 'copyin'.  */
>>>>>> +#define OMP_CLAUSE_MAP_READONLY(NODE) \
>>>>>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
>>>>>> +
>>>>>> +/* Same as above, for use in OpenACC cache directives.  */
>>>>>> +#define OMP_CLAUSE__CACHE__READONLY(NODE) \
>>>>>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
>>>>> I'm not sure if these special accessor functions are actually useful, or
>>>>> we should just directly use 'TREE_READONLY' instead?  We're only using
>>>>> them in contexts where it's clear that the 'OMP_CLAUSE_SUBCODE_CHECK' is
>>>>> satisfied, for example.
>>>> I find directly using TREE_READONLY confusing.
>>>
>>> FWIW, I've changed to use TREE_NOTHROW instead, if it can give a better 
>>> sense of safety :P
>> 
>> I don't understand that, why not use 'TREE_READONLY'?
>> 
>>> I think there's a misunderstanding here anyways: we are not relying on a 
>>> DECL marked
>>> TREE_READONLY here. We merely need the OMP_CLAUSE_MAP to be marked as 
>>> OMP_CLAUSE_MAP_READONLY == 1.
>> 
>> Yes, I understand that.  My question was why we don't just use
>> 'TREE_READONLY (c)', where 'c' is the
>> 'OMP_CLAUSE_MAP'/'OMP_CLAUSE__CACHE_' clause (not its decl), and avoid
>> the indirection through
>> '#define OMP_CLAUSE_MAP_READONLY'/'#define OMP_CLAUSE__CACHE__READONLY',
>> given that we're only using them in contexts where it's clear that the
>> 'OMP_CLAUSE_SUBCODE_CHECK' is satisfied.  I don't have a strong
>> preference, though.
>
> After further re-testing using TREE_NOTHROW, I have reverted to using 
> TREE_READONLY

ACK, thanks.

> because TREE_NOTHROW clashes
> with OMP_CLAUSE_RELEASE_DESCRIPTOR (which doesn't use the OMP_CLAUSE_MAP_* 
> naming convention and is
> not documented in gcc/tree-core.h either, hmmm...)

Yeah, it's a mess...  The same bits of information spread over three
different places.

(One day I'll turn 'tree's into a proper C++ class hierarchy, with
accessor methods for such flags, statically checked at compile-time, and
thus documented in a single place.  Etc.)

> I have added the comment adjustments in gcc/tree-core.h for the new uses of 
> TREE_READONLY/readonly_flag.
>
> We basically all use OMP_CLAUSE_SUBCODE_CHECK macros for OpenMP clause 
> expressions exclusively,
> so I don't see a reason to diverge from that style (even when context is 
> clear).

ACK.
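
For readers not familiar with the style being agreed on here: the idea is that a per-clause accessor macro first checks the clause code and only then touches the shared flag bit. The following C sketch uses toy stand-ins; every 'toy_'/'TOY_' name is hypothetical, and GCC's real macros operate on 'tree' nodes rather than on this struct.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins, NOT GCC's real 'tree' types.  */
enum toy_clause_code { TOY_CLAUSE_MAP, TOY_CLAUSE_CACHE, TOY_CLAUSE_OTHER };

struct toy_clause
{
  enum toy_clause_code code;
  unsigned readonly_flag : 1;	/* One bit, reused per clause kind.  */
};

/* Return C if its code is EXPECTED; otherwise report and abort.  */
struct toy_clause *
toy_clause_check (struct toy_clause *c, enum toy_clause_code expected,
		  const char *file, int line)
{
  if (c->code != expected)
    {
      fprintf (stderr, "%s:%d: clause code mismatch\n", file, line);
      abort ();
    }
  return c;
}

#define TOY_CLAUSE_SUBCODE_CHECK(NODE, CODE) \
  toy_clause_check ((NODE), (CODE), __FILE__, __LINE__)

/* Nonzero if the toy 'readonly' modifier is set on a map clause.  */
#define TOY_CLAUSE_MAP_READONLY(NODE) \
  (TOY_CLAUSE_SUBCODE_CHECK (NODE, TOY_CLAUSE_MAP)->readonly_flag)

/* Same flag bit, but only valid on a cache clause.  */
#define TOY_CLAUSE_CACHE_READONLY(NODE) \
  (TOY_CLAUSE_SUBCODE_CHECK (NODE, TOY_CLAUSE_CACHE)->readonly_flag)
```

With this shape, 'TOY_CLAUSE_MAP_READONLY (c) = 1' documents at the use site which clause kind the bit belongs to, and applying it to a cache clause aborts instead of silently reusing the bit.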

> I have greatly expanded the test scan patterns to include 
> parallel/kernels/serial/data/enter data,
> as well as non-readonly copyin clause together with readonly.

Thanks.

> Also added simple 'declare' tests, but there is not anything to scan in the 
> 'tree-original' dump though.

Yeah, the current OpenACC 'declare' implementation is "special".

>>> --- a/gcc/fortran/openmp.cc
>>> +++ b/gcc/fortran/openmp.cc
>>> @@ -1197,7 +1197,7 @@ omp_inv_mask::omp_inv_mask (const omp_mask &m) : omp_mask (m)
>>>
>>>  static bool
>>>  gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
>>> -   bool allow_common, bool allow_derived)
>>> +   bool allow_common, bool allow_derived, bool 
>>> readonly = false)
>>>  {
>>>gfc_omp_namelist **head = NULL;
>>>if (gfc_match_omp_variable_list ("", list, allow_common, NULL, &head, true,
>>> @@ -1206,7 +1206,10 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, 
>>> gfc_omp_map_op map_op,
>>>  {
>>>gfc_omp_namelist *n;
>>>for (n = *head; n; n = n->next)
>>> - n->u.map_op = map_op;
>>> + {
>>> +   n->u.map.op = map_op;
>>> +   n->u.map.readonly = readonly;
>>> + }
>>>return true;
>>>  }
>> 
>> Didn't we conclude that "not doing it here is cleaner" (Tobias' words),
>> and instead do this "Similar to 'c_parser_omp_var_list_p

Re: [PATCH 1/8] OpenMP: lvalue parsing for map/to/from clauses (C++)

2024-01-09 Thread Thomas Schwinge
Hi Julian!

On 2024-01-07T16:04:37+0100, Tobias Burnus wrote:
> Am 05.01.24 um 13:23 schrieb Julian Brown:
>> Here's a rebased/retested version [...]

> LGTM - [...]

Got pushed as commit r14-7033-g1413af02d62182bc1e19698aaa4dae406f8f13bf
"OpenMP: lvalue parsing for map/to/from clauses (C++)".

Some (hopefully minor) tuning in the test cases is necessary; for
example, for x86_64-pc-linux-gnu '-m32' testing, I see a few FAILs:

+PASS: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[1\\] [len: x != 0 ? [0-9]+ : [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: [0-9]+\\]\\)"
+PASS: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[1\\] \\[len: x != 0 \\? [0-9]+ : [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: [0-9]+\\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+PASS: g++.dg/gomp/array-section-1.C  -std=c++98 (test for excess errors)

Etc.

+PASS: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[0\\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: 0\\]\\)"
+PASS: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[0\\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: 0\\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(40 - \\(sizetype\\) SAVE_EXPR 
\\) \\* [0-9]+\\]\\) map\\(firstprivate:arr1 \\[pointer assign, bias: 
\\(long int\\) \\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(40 - \\(sizetype\\) SAVE_EXPR 
\\) \\* [0-9]+\\]\\) map\\(firstprivate:arr1 \\[pointer assign, bias: 
\\(long int\\) \\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+PASS: g++.dg/gomp/array-section-2.C  -std=c++98 (test for excess errors)

Etc.

+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 15 (test for 
errors, line 14)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 16 (test for 
errors, line 14)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 17 (test for 
errors, line 14)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 22 (test for 
warnings, line 21)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 36 (test for 
errors, line 35)
+FAIL: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 37 (test for 
warnings, line 35)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 38 (test for 
errors, line 35)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 39 (test for 
errors, line 35)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 44 (test for 
warnings, line 43)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98 (test for excess 
errors)

Etc.


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-21 Thread Thomas Schwinge
Hi!

On 2023-12-13T21:52:29+0100, I wrote:
> On 2023-12-12T02:05:26+, "Zhu, Lipeng" wrote:
>> On 2023/12/12 1:45, H.J. Lu wrote:
>>> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng wrote:
>>> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>>> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>>> > > > This patch introduces an rwlock that replaces the mutex and
>>> > > > separates reads from writes on the unit_root tree and unit_cache,
>>> > > > to increase CPU efficiency. In the get_gfc_unit function, around
>>> > > > 30% of calls step into the insert_unit function; in most
>>> > > > instances, we can get the unit while reading the unit_cache or
>>> > > > unit_root tree. So splitting the read and write phases with an
>>> > > > rwlock is an approach to make it more parallel.
>>> > > >
>>> > > > BTW, the IPC metrics can gain around 9x in our test server with
>>> > > > 220 cores. The benchmark we used is
>>> > > > https://github.com/rwesson/NEAT
>
>>> > > Ok for trunk, thanks.
>
>>> > Thanks! Looking forward to landing to trunk.
>
>>> Pushed for you.

> I've just filed 
> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution 
> test timeouts".
> Would you be able to look into that?

See my update in there.


Grüße
 Thomas


Re: [PATCH v7 4/5] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic

2023-12-15 Thread Thomas Schwinge
Hi!

On 2023-12-14T15:26:38+0100, Tobias Burnus wrote:
> On 19.08.23 00:47, Julian Brown wrote:
>> This patch adds support for non-constant component offsets in "map"
>> clauses for OpenMP (and the equivalents for OpenACC) [...]

Should we eventually also add some OpenACC test cases?


> LGTM with:
>
> - inclusion of your follow-up fix for shared-memory systems (see email
> of August 21)

This was applied here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c

>> +/* { dg-output "(\n|\r|\r\n)" } */
>> +/* { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" } */
>> +/* { dg-shouldfail "" { offload_device_nonshared_as } } */

..., and here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c

>> +/* { dg-output "(\n|\r|\r\n)" } */
>> +/* { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" } */
>> +/* { dg-shouldfail "" { offload_device_nonshared_as } } */

..., but not here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90

>> +! { dg-output "(\n|\r|\r\n)" }
>> +! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" }
>> +! { dg-shouldfail "" { offload_device_nonshared_as } }

Pushed to master branch commit bc7546e32c5a942e240ef97776352d21105ef291
"In 'libgomp.fortran/map-subarray-5.f90', restrict 'dg-output's to 'target 
offload_device_nonshared_as'",
see attached.


Grüße
 Thomas


>From bc7546e32c5a942e240ef97776352d21105ef291 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 15 Dec 2023 13:05:24 +0100
Subject: [PATCH] In 'libgomp.fortran/map-subarray-5.f90', restrict
 'dg-output's to 'target offload_device_nonshared_as'

..., as in 'libgomp.c-c++-common/map-arrayofstruct-{2,3}.c'.

Minor fix-up for commit f5745dc1426bdb1a53ebaf7af758b2250ccbff02
"OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic".

	libgomp/
	* testsuite/libgomp.fortran/map-subarray-5.f90: Restrict
	'dg-output's to 'target offload_device_nonshared_as'.
---
 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
index e7cdf11e610..59ad01ab76b 100644
--- a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
+++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
@@ -49,6 +49,6 @@ end do
 
 end
 
-! { dg-output "(\n|\r|\r\n)" }
-! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" }
+! { dg-output "(\n|\r|\r\n)" { target offload_device_nonshared_as } }
+! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" { target offload_device_nonshared_as } }
 ! { dg-shouldfail "" { offload_device_nonshared_as } }
-- 
2.34.1



RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Thomas Schwinge
Hi Lipeng!

On 2023-12-14T02:28:22+, "Zhu, Lipeng"  wrote:
> On 2023/12/14 4:52, Thomas Schwinge wrote:
>> On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
>> > On 2023/12/12 1:45, H.J. Lu wrote:
>> >> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng wrote:
>> >> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>> >> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>> >> > > > This patch try to introduce the rwlock and split the read/write
>> >> > > > to unit_root tree and unit_cache with rwlock instead of the
>> >> > > > mutex to increase CPU efficiency. In the get_gfc_unit function,
>> >> > > > the percentage to step into the insert_unit function is around
>> >> > > > 30%, in most instances, we can get the unit in the phase of
>> >> > > > reading the unit_cache or unit_root tree. So split the
>> >> > > > read/write phase by rwlock would be an approach to make it more
>> >> > > > parallel.
>> >> > > >
>> >> > > > BTW, the IPC metrics can gain around 9x in our test server with
>> >> > > > 220 cores. The benchmark we used is
>> >> > > > https://github.com/rwesson/NEAT

>> I've just filed <https://gcc.gnu.org/PR113005>
>> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution
>> test timeouts".
>> Would you be able to look into that?

> Sure, I will look into that.
>
> BTW, I don't have a PowerPC machine at hand; would you mind granting me
> access to your test environment to help reproduce the issue?

That's unfortunately not possible: it's behind company VPN, restricted
access.  :-/ I'll later try to have at least a quick look where it's
hanging, or what it's doing.


Grüße
 Thomas


Re: Build breakage

2023-12-13 Thread Thomas Schwinge
Hi!

On 2023-12-13T20:27:44+, Jonathan Wakely  wrote:
> On Wed, 13 Dec 2023, 19:37 Thomas Schwinge,  wrote:
>> On 2023-12-13T11:15:54-0800, Jerry D via Gcc  wrote:
>> > I am getting this failure to build from clean trunk.
>>
>> This is due to commit r14-6499-g348874f0baac0f22c98ab11abbfa65fd172f6bdd
>> "libgomp: basic pinned memory on Linux", which supposedly was only tested
>> with '--disable-multilib' or so.  As Andrew's now on vacations --
>> conveniently ;-P -- I'll soon push a fix.
>>
>> (To restore your build, you may locally disable the 'gomp_debug' call, or
>> cast 'size' into '(long) size', for example.)
>>
>
> Wouldn't --disable-werror work too?

I suppose so, but that comes with re-'configure'ing, re-starting the
build from scratch, or otherwise manually fiddling with 'Makefile's etc.,
whereas after editing the source file as indicated, you may simply resume
'make'.


Grüße
 Thomas


RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-13 Thread Thomas Schwinge
Hi Lipeng!

On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
> On 2023/12/12 1:45, H.J. Lu wrote:
>> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:
>> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>> > > > This patch try to introduce the rwlock and split the read/write to
>> > > > unit_root tree and unit_cache with rwlock instead of the mutex to
>> > > > increase CPU efficiency. In the get_gfc_unit function, the
>> > > > percentage to step into the insert_unit function is around 30%, in
>> > > > most instances, we can get the unit in the phase of reading the
>> > > > unit_cache or unit_root tree. So split the read/write phase by
>> > > > rwlock would be an approach to make it more parallel.
>> > > >
>> > > > BTW, the IPC metrics can gain around 9x in our test server with
>> > > > 220 cores. The benchmark we used is
>> > > > https://github.com/rwesson/NEAT

>> > > Ok for trunk, thanks.

>> > Thanks! Looking forward to landing to trunk.

>> Pushed for you.

> Thanks for everyone's patience and help, really appreciate that!

Congratulations on your first contribution to GCC (as far as I can tell)!
:-)


I've just filed <https://gcc.gnu.org/PR113005>
"'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test
timeouts".
Would you be able to look into that?


Grüße
 Thomas


Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string mismatch (was: Build breakage)

2023-12-13 Thread Thomas Schwinge
Hi!

On 2023-12-13T20:36:40+0100, I wrote:
> On 2023-12-13T11:15:54-0800, Jerry D via Gcc  wrote:
>> I am getting this failure to build from clean trunk.
>
> This is due to commit r14-6499-g348874f0baac0f22c98ab11abbfa65fd172f6bdd
> "libgomp: basic pinned memory on Linux", which supposedly was only tested
> with '--disable-multilib' or so.  As Andrew's now on vacations --
> conveniently ;-P -- I'll soon push a fix.

Pushed to master branch commit 5445ff4a51fcee4d281f79b5f54b349290d0327d
"Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string 
mismatch",
see attached.


Grüße
 Thomas


>> In file included from ../../../../trunk/libgomp/config/linux/allocator.c:31:
>> ../../../../trunk/libgomp/config/linux/allocator.c: In function
>> ‘linux_memspace_alloc’:
>> ../../../../trunk/libgomp/config/linux/allocator.c:70:26: error: format
>> ‘%ld’ expects argument of type ‘long int’, but argument 3 has type
>> ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
>> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>>|  ^
>> 71 |   " memory (ulimit too low?)\n", size);
>>|  
>>|  |
>>|  size_t
>> {aka unsigned int}
>> ../../../../trunk/libgomp/libgomp.h:186:29: note: in definition of macro
>> ‘gomp_debug’
>>186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
>>| ^~~
>> ../../../../trunk/libgomp/config/linux/allocator.c:70:52: note: format
>> string is defined here
>> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>>|  ~~^
>>||
>>|long int
>>|  %d


>From 5445ff4a51fcee4d281f79b5f54b349290d0327d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 13 Dec 2023 17:48:11 +0100
Subject: [PATCH] Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld'
 format string mismatch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fix-up for commit 348874f0baac0f22c98ab11abbfa65fd172f6bdd
"libgomp: basic pinned memory on Linux", which may result in build failures
as follow, for example, for the '-m32' multilib of x86_64-pc-linux-gnu:

In file included from [...]/source-gcc/libgomp/config/linux/allocator.c:31:
[...]/source-gcc/libgomp/config/linux/allocator.c: In function ‘linux_memspace_alloc’:
[...]/source-gcc/libgomp/config/linux/allocator.c:70:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
   70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
  |  ^
   71 |   " memory (ulimit too low?)\n", size);
  |  
  |  |
  |  size_t {aka unsigned int}
[...]/source-gcc/libgomp/libgomp.h:186:29: note: in definition of macro ‘gomp_debug’
  186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
  | ^~~
[...]/source-gcc/libgomp/config/linux/allocator.c:70:52: note: format string is defined here
   70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
  |  ~~^
  ||
  |long int
  |  %d
cc1: all warnings being treated as errors
make[9]: *** [allocator.lo] Error 1
make[9]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/32/libgomp'
[...]

Fix this in the same way as used elsewhere in libgomp.

	libgomp/
	* config/linux/allocator.c (linux_memspace_alloc): Fix 'size_t'
	vs. '%ld' format string mismatch.
---
 libgomp/config/linux/allocator.c | 12 ++--
 1 file change

Re: Build breakage

2023-12-13 Thread Thomas Schwinge
Hi!

On 2023-12-13T11:15:54-0800, Jerry D via Gcc  wrote:
> I am getting this failure to build from clean trunk.

This is due to commit r14-6499-g348874f0baac0f22c98ab11abbfa65fd172f6bdd
"libgomp: basic pinned memory on Linux", which supposedly was only tested
with '--disable-multilib' or so.  As Andrew's now on vacations --
conveniently ;-P -- I'll soon push a fix.

(To restore your build, you may locally disable the 'gomp_debug' call, or
cast 'size' into '(long) size', for example.)
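
As a rough illustration of the mismatch and of two common ways to avoid it: the sketch below is not the libgomp code and not necessarily what the eventual fix does.  The helper names are hypothetical, and 'snprintf' stands in for 'gomp_debug'.  It shows the cast approach mentioned above (here with an unsigned type and '%lu') next to the C99 '%zu' conversion.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not libgomp code: formats a pin-failure message.  */
int
format_with_cast (char *buf, size_t buflen, size_t size)
{
  /* '%lu' expects 'unsigned long'; the cast makes the argument match on
     every multilib, whatever 'size_t' is underneath.  */
  return snprintf (buf, buflen, "failed to pin %lu bytes",
                   (unsigned long) size);
}

/* Hypothetical helper: same message via the 'size_t' conversion.  */
int
format_with_z (char *buf, size_t buflen, size_t size)
{
  /* C99 '%zu' is the conversion defined for 'size_t' itself.  */
  return snprintf (buf, buflen, "failed to pin %zu bytes", size);
}
```

Either form avoids the '-Werror=format=' failure for '-m32', where 'size_t' is 'unsigned int' rather than 'unsigned long'.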


Grüße
 Thomas


> In file included from ../../../../trunk/libgomp/config/linux/allocator.c:31:
> ../../../../trunk/libgomp/config/linux/allocator.c: In function
> ‘linux_memspace_alloc’:
> ../../../../trunk/libgomp/config/linux/allocator.c:70:26: error: format
> ‘%ld’ expects argument of type ‘long int’, but argument 3 has type
> ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>|  ^
> 71 |   " memory (ulimit too low?)\n", size);
>|  
>|  |
>|  size_t
> {aka unsigned int}
> ../../../../trunk/libgomp/libgomp.h:186:29: note: in definition of macro
> ‘gomp_debug’
>186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
>| ^~~
> ../../../../trunk/libgomp/config/linux/allocator.c:70:52: note: format
> string is defined here
> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>|  ~~^
>||
>|long int
>|  %d


Re: [Patch] OpenMP: Minor '!$omp allocators' cleanup - and still: Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables

2023-12-11 Thread Thomas Schwinge
Hi!

This issue would've been prevented if we'd actually use a distinct C++
data type for GCC types, checkable at compile time -- I'm thus CCing
Andrew MacLeod for amusement or crying, "one more for the list!".  ;-\
(See
<https://inbox.sourceware.org/1acd7994-2440-4092-897f-97f14d3fb...@redhat.com>
"[TTYPE] Strongly typed tree project. Original document circa 2017".)

On 2023-12-11T12:45:27+0100, Tobias Burnus  wrote:
> I included a minor cleanup patch [...]
>
> I intend to commit that patch as obvious, unless there are further comments.

> OpenMP: Minor '!$omp allocators' cleanup

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -8361,8 +8361,10 @@ gfc_omp_call_add_alloc (tree ptr)
>if (fn == NULL_TREE)
>  {
>fn = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
> +  tree att = build_tree_list (NULL_TREE, build_string (4, ". R "));
> +  att = tree_cons (get_identifier ("fn spec"), att, TYPE_ATTRIBUTES (fn));
> +  fn = build_type_attribute_variant (fn, att);
>fn = build_fn_decl ("GOMP_add_alloc", fn);
> -/* FIXME: attributes.  */
>  }
>return build_call_expr_loc (input_location, fn, 1, ptr);
>  }
> @@ -8380,7 +8382,9 @@ gfc_omp_call_is_alloc (tree ptr)
>fn = build_function_type_list (boolean_type_node, ptr_type_node,
>NULL_TREE);
>fn = build_fn_decl ("GOMP_is_alloc", fn);
> -/* FIXME: attributes.  */
> +  tree att = build_tree_list (NULL_TREE, build_string (4, ". R "));
> +  att = tree_cons (get_identifier ("fn spec"), att, TYPE_ATTRIBUTES (fn));
> +  fn = build_type_attribute_variant (fn, att);
>  }
>return build_call_expr_loc (input_location, fn, 1, ptr);
>  }

Pushed to master branch commit 453e0f45a49f425992bc47ff8909ed8affc29d2e
"Resolve ICE in 'gcc/fortran/trans-openmp.cc:gfc_omp_call_is_alloc'", see
attached.
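
To illustrate the point about distinct data types: the sketch below uses plain C structs rather than C++ classes, and every name in it ('type_ref', 'decl_ref', 'add_fn_spec', 'make_fn_decl') is hypothetical, not GCC API.  It only shows how separate handle types turn "attribute applied to a declaration instead of a type" into a compile-time error rather than a runtime tree check.

```c
#include <stdbool.h>

/* Hypothetical handles; GCC itself uses a single 'tree' type for both.  */
typedef struct { bool has_fn_spec; } type_ref;
typedef struct { const char *name; type_ref type; } decl_ref;

/* Accepts only a type handle.  */
static type_ref
add_fn_spec (type_ref t)
{
  t.has_fn_spec = true;
  return t;
}

/* Wraps an already-finished type into a declaration handle.  */
static decl_ref
make_fn_decl (const char *name, type_ref t)
{
  decl_ref d = { name, t };
  return d;
}

decl_ref
build_is_alloc_decl (void)
{
  type_ref fn_type = { false };
  fn_type = add_fn_spec (fn_type);                       /* type first ...  */
  decl_ref fn = make_fn_decl ("GOMP_is_alloc", fn_type); /* ... then decl.  */
  /* 'add_fn_spec (fn);' would be rejected by the compiler here, because
     'decl_ref' is not 'type_ref'.  */
  return fn;
}
```

The ordering in 'build_is_alloc_decl' mirrors the fix: attach the attribute while the handle still denotes a type, and build the declaration last.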


Grüße
 Thomas


>From 453e0f45a49f425992bc47ff8909ed8affc29d2e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 11 Dec 2023 22:52:54 +0100
Subject: [PATCH] Resolve ICE in
 'gcc/fortran/trans-openmp.cc:gfc_omp_call_is_alloc'

Fix-up for recent commit 2505a8b41d3b74a545755a278f3750a29c1340b6
"OpenMP: Minor '!$omp allocators' cleanup", which caused:

{+FAIL: gfortran.dg/gomp/allocate-5.f90   -O  (internal compiler error: tree check: expected class 'type', have 'declaration' (function_decl) in gfc_omp_call_is_alloc, at fortran/trans-openmp.cc:8386)+}
[-PASS:-]{+FAIL:+} gfortran.dg/gomp/allocate-5.f90   -O  (test for excess errors)

..., and similarly in 'libgomp.fortran/allocators-1.f90',
'libgomp.fortran/allocators-2.f90', 'libgomp.fortran/allocators-3.f90',
'libgomp.fortran/allocators-4.f90', 'libgomp.fortran/allocators-5.f90'.

	gcc/fortran/
	* trans-openmp.cc (gfc_omp_call_is_alloc): Resolve ICE.
---
 gcc/fortran/trans-openmp.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 95184920cf7..f7c73a5d273 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -8381,10 +8381,10 @@ gfc_omp_call_is_alloc (tree ptr)
 {
   fn = build_function_type_list (boolean_type_node, ptr_type_node,
  NULL_TREE);
-  fn = build_fn_decl ("GOMP_is_alloc", fn);
   tree att = build_tree_list (NULL_TREE, build_string (4, ". R "));
   att = tree_cons (get_identifier ("fn spec"), att, TYPE_ATTRIBUTES (fn));
   fn = build_type_attribute_variant (fn, att);
+  fn = build_fn_decl ("GOMP_is_alloc", fn);
 }
   return build_call_expr_loc (input_location, fn, 1, ptr);
 }
-- 
2.34.1



Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables

2023-12-09 Thread Thomas Schwinge
Hi Tobias!

On 2023-11-08T17:58:10+0100, Tobias Burnus  wrote:
> OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables

Nice work!

> This commit adds -fopenmp-allocators which enables support for
> 'omp allocators' and 'omp allocate' that are associated with a Fortran
> allocate-stmt. If such a construct is encountered, an error is shown,
> unless the -fopenmp-allocators flag is present.
>
> With -fopenmp -fopenmp-allocators, those constructs get turned into
> GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp)
> ensures deallocation and reallocation (via intrinsic assignments) are
> properly directed to GOMP_free/omp_realloc - while normal Fortran
> allocations are processed by free/realloc.
>
> In order to distinguish a 'malloc'ed from a 'GOMP_alloc'ed memory, the
> version field of the Fortran array descriptor is (mis)used: 0 indicates
> the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars,
> there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the
> pointer address to a splay_tree while GOMP_is_alloc(ptr) will return
> true if it was previously added but also removes it from the list.
>
> Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is now part of
> omp-builtins.def and libgomp gains the mentioned two new functions.
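
The "add, then query-and-remove" record keeping described in the quoted commit message can be sketched as below.  This is an assumption-laden illustration only: it uses a fixed-size array where the commit message says libgomp uses a splay tree, it does no locking, and 'track_add' / 'track_take' are hypothetical names, not the 'GOMP_add_alloc' / 'GOMP_is_alloc' implementations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size tracker; the quoted commit message says libgomp
   uses a splay tree instead.  */
#define MAX_TRACKED 64
static void *tracked[MAX_TRACKED];

/* Record PTR; returns false if PTR is NULL or no slot is free.  */
bool
track_add (void *ptr)
{
  if (ptr == NULL)
    return false;
  for (size_t i = 0; i < MAX_TRACKED; i++)
    if (tracked[i] == NULL)
      {
        tracked[i] = ptr;
        return true;
      }
  return false;
}

/* Return true if PTR was recorded, and forget it in that case: the caller
   is about to free or reallocate it, so the entry would become stale.  */
bool
track_take (void *ptr)
{
  if (ptr == NULL)
    return false;
  for (size_t i = 0; i < MAX_TRACKED; i++)
    if (tracked[i] == ptr)
      {
        tracked[i] = NULL;
        return true;
      }
  return false;
}
```

A second query for the same pointer returns false, which matches the "is always removed" behaviour stated in the comment before 'gfc_omp_call_is_alloc'.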

Minor comments:

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc

> +/* Add ptr for tracking as being allocated by GOMP_alloc. */
> +
> +tree
> +gfc_omp_call_add_alloc (tree ptr)
> +{
> +  static tree fn = NULL_TREE;
> +  if (fn == NULL_TREE)
> +{
> +  fn = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
> +  fn = build_fn_decl ("GOMP_add_alloc", fn);
> +/* FIXME: attributes.  */
> +}
> +  return build_call_expr_loc (input_location, fn, 1, ptr);
> +}
> +
> +/* Generated function returns true when it was tracked via GOMP_add_alloc and
> +   removes it from the tracking.  As called just before GOMP_free or omp_realloc
> +   the pointer is or might become invalid, thus, it is always removed. */
> +
> +tree
> +gfc_omp_call_is_alloc (tree ptr)
> +{
> +  static tree fn = NULL_TREE;
> +  if (fn == NULL_TREE)
> +{
> +  fn = build_function_type_list (boolean_type_node, ptr_type_node,
> +  NULL_TREE);
> +  fn = build_fn_decl ("GOMP_is_alloc", fn);
> +/* FIXME: attributes.  */
> +}
> +  return build_call_expr_loc (input_location, fn, 1, ptr);
> +}

Why not define 'GOMP_add_alloc', 'GOMP_is_alloc' via
'gcc/omp-builtins.def'?

> --- a/gcc/omp-builtins.def
> +++ b/gcc/omp-builtins.def
> @@ -467,6 +467,9 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WORKSHARE_TASK_REDUCTION_UNREGISTER,
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ALLOC,
> "GOMP_alloc", BT_FN_PTR_SIZE_SIZE_PTRMODE,
> ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST)
> +DEF_GOMP_BUILTIN (BUILT_IN_GOMP_REALLOC,
> +   "omp_realloc", BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE,
> +   ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_FREE,
> "GOMP_free", BT_FN_VOID_PTR_PTRMODE, ATTR_NOTHROW_LEAF_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning",

Should this either be 'BUILT_IN_OMP_REALLOC' ('OMP' instead of 'GOMP'),
or otherwise a 'GOMP_realloc' be added to 'libgomp/allocator.c', like for
'GOMP_alloc', 'GOMP_free'; 'ialias_call'ing the respective 'omp_[...]'
functions?  (Verbatim 'omp_realloc' also mentioned in the comment before
'gcc/fortran/trans-openmp.cc:gfc_omp_call_is_alloc'.)

> --- a/libgomp/allocator.c
> +++ b/libgomp/allocator.c

> +/* Add pointer as being alloced by GOMP_alloc.  */
> +void
> +GOMP_add_alloc (void *ptr)
> +{
> +  [...]
> +}
> +
> +/* Remove pointer, either called by FREE or by REALLOC,
> +   either of them can change the allocation status.  */
> +bool
> +GOMP_is_alloc (void *ptr)
> +{
> +  [...]
> +}

> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map

> +GOMP_5.1.2 {
> +  global:
> + GOMP_add_alloc;
> + GOMP_is_alloc;
> + [...]
> +} GOMP_5.1.1;

'GOMP_add_alloc', 'GOMP_is_alloc' should get prototyped in
'libgomp/libgomp_g.h'.


Grüße
 Thomas


GCN: Address undeclared 'NULL' usage in 'libgcc/config/gcn/gthr-gcn.h:__gthread_getspecific' (was: [PATCH 1/3] Create GCN-specific gthreads)

2023-11-03 Thread Thomas Schwinge
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/* AMD GCN does not support dynamic creation of threads.  There may be many
> +   hardware threads, but they're all created simultaneously at launch time.
> +
> +   This implementation is intended to provide mutexes for libgfortran, etc.
> +   It is not intended to provide a TLS implementation at this time,
> +   although that may be added later if needed.
> +
> +   __gthread_active_p returns "1" to ensure that mutexes are used, and that
> +   programs attempting to use emutls will fail with the appropriate abort.
> +   It is expected that the TLS tests will fail.  */
> +
> +#ifndef GCC_GTHR_GCN_H
> +#define GCC_GTHR_GCN_H
> +
> +#define __GTHREADS 1
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#ifdef _LIBOBJC
> +#error "Objective C is not supported on AMD GCN"
> +#else
> +
> +static inline int
> +__gthread_active_p (void)
> +{
> +  return 1;
> +}
> +
> +typedef int __gthread_key_t;
> +typedef int __gthread_once_t;
> +typedef int __gthread_mutex_t;
> +typedef int __gthread_recursive_mutex_t;
> +
> +#define __GTHREAD_ONCE_INIT 0
> +#define __GTHREAD_MUTEX_INIT 0
> +#define __GTHREAD_RECURSIVE_MUTEX_INIT 0
> +
> +static inline int
> +__gthread_once (__gthread_once_t *__once __attribute__((unused)),
> + void (*__func) (void) __attribute__((unused)))
> +{
> +  return 0;
> +}
> +
> +static inline int
> +__gthread_key_create (__gthread_key_t *__key __attribute__((unused)),
> +   void (*__dtor) (void *) __attribute__((unused)))
> +{
> +  /* Operation is not supported.  */
> +  return -1;
> +}
> +
> +static inline int
> +__gthread_key_delete (__gthread_key_t __key __attribute__ ((__unused__)))
> +{
> +  /* Operation is not supported.  */
> +  return -1;
> +}
> +
> +static inline void *
> +__gthread_getspecific (__gthread_key_t __key __attribute__((unused)))
> +{
> +  return NULL;
> +}
> +
> +static inline int
> +__gthread_setspecific (__gthread_key_t __key __attribute__((unused)),
> +const void *__ptr __attribute__((unused)))
> +{
> +  /* Operation is not supported.  */
> +  return -1;
> +}
> +
> +static inline int
> +__gthread_mutex_destroy (__gthread_mutex_t *__mutex __attribute__((unused)))
> +{
> +  return 0;
> +}
> +
> +static inline int
> +__gthread_recursive_mutex_destroy (__gthread_recursive_mutex_t *__mutex __attribute__((unused)))
> +{
> +  return 0;
> +}
> +
> +
> +static inline int
> +__gthread_mutex_lock (__gthread_mutex_t *__mutex)
> +{
> +  while (__sync_lock_test_and_set (__mutex, 1))
> +asm volatile ("s_sleep\t1" ::: "memory");
> +
> +  return 0;
> +}
> +
> +static inline int
> +__gthread_mutex_trylock (__gthread_mutex_t *__mutex)
> +{
> +  return __sync_lock_test_and_set (__mutex, 1);
> +}
> +
> +static inline int
> +__gthread_mutex_unlock (__gthread_mutex_t *__mutex)
> +{
> +  __sync_lock_release (__mutex);
> +
> +  return 0;
> +}
> +
> +static inline int
> +__gthread_recursive_mutex_lock (__gthread_recursive_mutex_t *__mutex __attribute__((unused)))
> +{
> +  /* Operation is not supported.  */
> +  return -1;
> +}
> +
> +static inline int
> +__gthread_recursive_mutex_trylock (__gthread_recursive_mutex_t *__mutex __attribute__((unused)))
> +{
> +  /* Operation is not supported.  */
> +  return -1;
> +}
> +
> +static inline int
> +__gthread_recursive_mutex_unlock (__gthread_recursive_mutex_t *__mutex __attribute__((unused)))
> +{
> +  /* Operation is not supported.  */
> +  return -1;
> +}
> +#endif /* _LIBOBJC */
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* ! GCC_GTHR_GCN_H */
> diff --git a/libgcc/configure b/libgcc/configure
> index b2914de0629..af910b62931 100644
> --- a/libgcc/configure
> +++ b/libgcc/configure
> @@ -5542,6 +5542,7 @@ tm_file="${tm_file_}"
>  case $target_thread_file in
>  aix) thread_header=config/rs6000/gthr-aix.h ;;
>  dce) thread_header=config/pa/gthr-dce.h ;;
> +gcn) thread_header=config/gcn/gthr-gcn.h ;;
>  lynx)thread_header=config/gthr-lynx

Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2023-10-27 Thread Thomas Schwinge
Hi!

Richard, as the original author of 'SSA_NAME_POINTS_TO_READONLY_MEMORY':
2018 commit 6214d5c7e7470bdd5ecbeae668c2522551bfebbc (Subversion r263958)
"Move const_parm trick to generic code"; 'gcc/tree.h':

/* Nonzero if this SSA_NAME is known to point to memory that may not
   be written to.  This is set for default defs of function parameters
   that have a corresponding r or R specification in the functions
   fn spec attribute.  This is used by alias analysis.  */
#define SSA_NAME_POINTS_TO_READONLY_MEMORY(NODE) \
SSA_NAME_CHECK (NODE)->base.deprecated_flag

..., may I ask you to please help review the following patch
(full-quoted)?

For context: this patch here ("second patch") depends on a first patch:

"[PATCH, OpenACC 2.7] readonly modifier support in front-ends".  That one
is still under review/rework; so you're not able to apply this second
patch here.

In a nutshell: a 'readonly' modifier has been added to the OpenACC
'copyin' clause (copy host to device memory, don't copy back at end of
region):

| If the optional 'readonly' modifier appears, then the implementation may assume that the data
| referenced by _var-list_ is never written to within the applicable region.

That is, for example (untested):

#pragma acc routine
void escape(int *);

int x[32] = [...];
#pragma acc parallel copyin(readonly: x)
{
  int a1 = x[3];
  escape(x);
  int a2 = x[3]; // Per 'readonly', don't need to reload 'x[3]' here.
  //x[22] = 0; // Invalid -- but no diagnostic mandated.
}

What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
flag.

The actual optimization then is done in this second patch.  Chung-Lin
found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
I don't have much experience with most of the following generic code, so
would appreciate a helping hand, whether that conceptually makes sense as
well as from the implementation point of view:

On 2023-07-25T23:52:06+0800, Chung-Lin Tang via Gcc-patches wrote:
> On 2023/7/11 2:33 AM, Chung-Lin Tang via Gcc-patches wrote:
>> As we discussed earlier, the work for actually linking this to middle-end
>> points-to analysis is a somewhat non-trivial issue. This first patch allows
>> the language feature to be used in OpenACC directives first (with no effect 
>> for now).
>> The middle-end changes are probably going to be a later patch.
>
> This second patch tries to link the readonly modifier to points-to analysis.
>
> There already exists SSA_NAME_POINTS_TO_READONLY_MEMORY and its support in the
> alias oracle routines in tree-ssa-alias.cc, so basically what this patch does is
> try to make the variables holding the array section base pointers have this
> flag set.
>
> There is also another flag, OMP_CLAUSE_MAP_POINTS_TO_READONLY, set by front-ends
> on the associated pointer clauses if OMP_CLAUSE_MAP_READONLY is set.
> Also a DECL_POINTS_TO_READONLY flag is set for VAR_DECLs when creating the tmp
> vars carrying these receiver references on the offloaded side. These
> eventually get translated to SSA_NAME_POINTS_TO_READONLY_MEMORY.


> This still doesn't always work as expected in terms of optimization:
> struct pointer fields and Fortran arrays (kind of like C structs) which have
> several accesses to create the pointer access on the receive/offloaded side,
> and SRA appears to not work on these sequences, so gets in the way of much
> redundancy elimination.

Do I understand correctly that this is left as future work?  Please add the test
cases you have, XFAILed in some reasonable way.


> Currently have one testcase where we can demonstrate 'readonly' can avoid
> a clobber by function call.

:-)


> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -14258,6 +14258,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
>   OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
>else
>   OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
> +  if (OMP_CLAUSE_MAP_READONLY (c))
> + OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
>OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
>if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
> && !c_mark_addressable (t))

> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -5872,6 +5872,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
>   }
> else
>   OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
> +   if (OMP_CLAUSE_MAP_READONLY (c))
> + OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
> OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
> if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
> && !cxx_mark_addressable (t))

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -2524,6 +2524,8 @@ 

Re: [PATCH] Fortran: Fix incompatible types between INTEGER(8) and TYPE(c_ptr)

2023-10-26 Thread Thomas Schwinge
Hi PA!

On 2023-10-26T18:28:07+0200, Paul-Antoine Arras  wrote:
> On 26/10/2023 18:16, you wrote:
>> On 2023-10-26T13:24:04+0200, Paul-Antoine Arras wrote:
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_20.f90
>>> @@ -0,0 +1,57 @@
>>> +! { dg-do compile }
>>> +! { dg-additional-options "-fopenmp" }
>>> +[...]
>>
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_21.f90
>>> @@ -0,0 +1,57 @@
>>> +! { dg-do compile }
>>> +! { dg-additional-options "-fopenmp" }
>>> +[...]
>>
>> OpenMP is not universally supported across different GCC configurations,
>> so this will FAIL for some.  Therefore, please either guard with
>> effective target:
>>
>>  @item fopenmp
>>  Target supports OpenMP via @option{-fopenmp}.
>>
>
> Would the following be enough?
>
>> diff --git gcc/testsuite/gfortran.dg/c_ptr_tests_20.f90 gcc/testsuite/gfortran.dg/c_ptr_tests_20.f90
>> index 7dd510400f3..131603d3819 100644
>> --- gcc/testsuite/gfortran.dg/c_ptr_tests_20.f90
>> +++ gcc/testsuite/gfortran.dg/c_ptr_tests_20.f90
>> @@ -1,4 +1,5 @@
>>  ! { dg-do compile }
>> +! { dg-require-effective-target fopenmp }
>>  ! { dg-additional-options "-fopenmp" }
>>  !
>>  ! This failed to compile the declare variant directive due to the C_PTR
>> diff --git gcc/testsuite/gfortran.dg/c_ptr_tests_21.f90 gcc/testsuite/gfortran.dg/c_ptr_tests_21.f90
>> index 05ccb771eee..060d29d0275 100644
>> --- gcc/testsuite/gfortran.dg/c_ptr_tests_21.f90
>> +++ gcc/testsuite/gfortran.dg/c_ptr_tests_21.f90
>> @@ -1,4 +1,5 @@
>>  ! { dg-do compile }
>> +! { dg-require-effective-target fopenmp }
>>  ! { dg-additional-options "-fopenmp" }
>>  !
>>  ! Ensure that C_PTR and C_FUNPTR are reported as incompatible types in variant

Yes, that looks good to me -- you may push "as obvious".


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] Fortran: Fix incompatible types between INTEGER(8) and TYPE(c_ptr)

2023-10-26 Thread Thomas Schwinge
Hi!

On 2023-10-26T13:24:04+0200, Paul-Antoine Arras  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_20.f90
> @@ -0,0 +1,57 @@
> +! { dg-do compile }
> +! { dg-additional-options "-fopenmp" }
> +[...]

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_21.f90
> @@ -0,0 +1,57 @@
> +! { dg-do compile }
> +! { dg-additional-options "-fopenmp" }
> +[...]

OpenMP is not universally supported across different GCC configurations,
so this will FAIL for some.  Therefore, please either guard with
effective target:

@item fopenmp
Target supports OpenMP via @option{-fopenmp}.

..., or move into 'gcc/testsuite/gfortran.dg/gomp/' (may then remove
explicit 'dg-additional-options "-fopenmp"').

I don't know which variant makes more sense, here.
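
For illustration only, here is how the first variant (the effective-target
guard) might look in a C testcase.  The three directives are the ones named
above; the function 'sum_to' and the test body are made up for this sketch
and are not part of the patch under review:

```c
/* { dg-do compile } */
/* { dg-require-effective-target fopenmp } */
/* { dg-additional-options "-fopenmp" } */

/* Hypothetical test body: sum 1..n using an OpenMP reduction.
   Without OpenMP support the pragma is ignored and the loop runs serially,
   which gives the same result.  */
int
sum_to (int n)
{
  int sum = 0;
#pragma omp parallel for reduction(+:sum)
  for (int i = 1; i <= n; i++)
    sum += i;
  return sum;
}
```

With the second variant, the file would instead move into a 'gomp/' test
directory, and (as noted above) the explicit
'dg-additional-options "-fopenmp"' line could then be dropped.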


Grüße
 Thomas


Re: [PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends

2023-10-26 Thread Thomas Schwinge
Hi!

On 2023-08-07T21:58:27+0800, Chung-Lin Tang  wrote:
> here's the updated v2 of the readonly modifier front-end patch.

Thanks.


>>>> +++ b/gcc/c/c-parser.cc
>>>> @@ -14059,7 +14059,8 @@ c_parser_omp_variable_list (c_parser *parser,
>>>>
>>>>  static tree
>>>>  c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
>>>> -                              tree list, bool allow_deref = false)
>>>> +                              tree list, bool allow_deref = false,
>>>> +                              bool *readonly = NULL)
>>>> ...
>>> Instead of doing this in 'c_parser_omp_var_list_parens', I think it's
>>> clearer to have this special 'readonly :' parsing logic in the two places
>>> where it's used.

> On 2023/7/20 11:08 PM, Tobias Burnus wrote:
>> I concur. [...]
>
> Okay, I've changed the C/C++ parser parts to have the parsing logic directly
> added.

These parts now look good to me, with one remark for the C front end
changes, see below.


>>>> +++ b/gcc/fortran/gfortran.h
>>>> @@ -1360,7 +1360,11 @@ typedef struct gfc_omp_namelist
>>>>      {
>>>>        gfc_omp_reduction_op reduction_op;
>>>>        gfc_omp_depend_doacross_op depend_doacross_op;
>>>> -      gfc_omp_map_op map_op;
>>>> +      struct
>>>> +        {
>>>> +          ENUM_BITFIELD (gfc_omp_map_op) map_op:8;
>>>> +          bool readonly;
>>>> +        };
>>>>        gfc_expr *align;
>>>>        struct
>>>>          {
>>> [...] Thus, the above looks good to me.
>> I concur but I wonder whether it would be cleaner to name the struct;
>> this makes it also more obvious what belongs together in the union.
>>
>> Namely, naming the struct 'map' and then changing the 45 users from
>> 'u.map_op' to 'u.map.op' and the new 'u.readonly' to 'u.map.readonly'. –
>> this seems to be cleaner.
>
> I've adjusted 'u.map' to be a named struct now, and updated the references.

I like that, thanks.  (Tobias, to reduce the volume of this patch here,
please let us know if the 'map_op' -> 'map.op' mass-change should be done
separately and go into master branch already, instead of as part of this
patch.)
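
To illustrate the naming question, a minimal sketch with toy types.  All
'toy_*' names are hypothetical; the real 'gfc_omp_namelist' union has more
members, and the real code uses 'ENUM_BITFIELD' for the bit-field rather
than the plain 'unsigned int' assumed here:

```c
#include <stdbool.h>

/* Toy stand-in for gfc_omp_map_op; the real enum has many more values.  */
enum toy_map_op { TOY_MAP_ALLOC, TOY_MAP_TO, TOY_MAP_FROM, TOY_MAP_TOFROM };

struct toy_namelist
{
  union
  {
    int reduction_op;
    int depend_doacross_op;
    /* Named struct: users write 'u.map.op' and 'u.map.readonly', which
       makes it visible that the two fields belong together.  */
    struct
    {
      unsigned int op : 8;   /* holds a toy_map_op value */
      bool readonly;
    } map;
  } u;
};

/* Record the map operation together with its 'readonly' modifier.  */
void
toy_set_map (struct toy_namelist *n, enum toy_map_op op, bool readonly)
{
  n->u.map.op = op;
  n->u.map.readonly = readonly;
}

/* True for a read-only 'to' mapping.  */
bool
toy_is_readonly_to (const struct toy_namelist *n)
{
  return n->u.map.op == TOY_MAP_TO && n->u.map.readonly;
}
```

With an anonymous struct the same accesses would read 'u.op' and
'u.readonly', which compiles just as well but hides the grouping.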


>>>> + if (gfc_match ("readonly :") == MATCH_YES)
>>> I note this one does not have a space after ':' in 'gfc_match', but the
>>> one above in 'gfc_match_omp_clauses' does.  I don't know off-hand if that
>>> makes a difference in parsing -- probably not, as all of
>>> 'gcc/fortran/openmp.cc' generally doesn't seem to be very consistent
>>> about these two variants?
>> It *does* make a difference. And for obvious reasons. You don't want to
>> permit:
>>
>>!$acc kernels asynccopy(a)
>>
>> but require at least one space (or comma) between "async" and "copy".
>> (In fixed form Fortran, it would be fine - as would be
>> "!$acc k e rnelsasy nc co p y(a)".)
>>
>> A " " matches zero or more whitespaces, but with gfc_match_space you can
>> find out whether there was whitespace or not.

OK, I generally follow -- but does this rationale also apply in this case
here, concerning space after ':'?

> Okay, made sure both are 'gfc_match ("readonly : ")'. Thanks for catching
> that, didn't realize that space was significant.
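
As a rough illustration of the matching rule described above, here is a toy
matcher in C.  It is not 'gfc_match', only a sketch under the assumption
that a ' ' in the pattern consumes zero or more whitespace characters and
every other character must match literally; 'toy_match' is a hypothetical
name:

```c
#include <ctype.h>

/* Toy matcher, not gfc_match.  A ' ' in PATTERN matches zero or more
   whitespace characters in INPUT; any other character must match exactly.
   Returns the number of INPUT characters consumed, or -1 on mismatch.  */
int
toy_match (const char *pattern, const char *input)
{
  const char *start = input;
  for (; *pattern; pattern++)
    {
      if (*pattern == ' ')
        {
          /* Optional whitespace: skip as much as is there, possibly none.  */
          while (isspace ((unsigned char) *input))
            input++;
        }
      else if (*pattern == *input)
        input++;
      else
        return -1;
    }
  return (int) (input - start);
}
```

In this toy model, the pattern "readonly :" stops right after the ':',
whereas "readonly : " additionally consumes any blanks that follow, so the
trailing space only changes where the next matcher starts.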


>>>> +++ b/gcc/tree.h
>>>> @@ -1813,6 +1813,14 @@ class auto_suppress_location_wrappers
>>>>  #define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
>>>>    (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.addressable_flag)
>>>>
>>>> +/* Nonzero if OpenACC 'readonly' modifier set, used for 'copyin'.  */
>>>> +#define OMP_CLAUSE_MAP_READONLY(NODE) \
>>>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
>>>> +
>>>> +/* Same as above, for use in OpenACC cache directives.  */
>>>> +#define OMP_CLAUSE__CACHE__READONLY(NODE) \
>>>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
>>> I'm not sure if these special accessor functions are actually useful, or
>>> we should just directly use 'TREE_READONLY' instead?  We're only using
>>> them in contexts where it's clear that the 'OMP_CLAUSE_SUBCODE_CHECK' is
>>> satisfied, for example.
>> I find directly using TREE_READONLY confusing.
>
> FWIW, I've changed to use TREE_NOTHROW instead, if it can give a better
> sense of safety :P

I don't understand that, why not use 'TREE_READONLY'?

> I think there's a misunderstanding here anyways: we are not relying on a
> DECL marked TREE_READONLY here. We merely need the OMP_CLAUSE_MAP to be
> marked as OMP_CLAUSE_MAP_READONLY == 1.

Yes, I understand that.  My question was why we don't just use
'TREE_READONLY (c)', where 'c' is the
'OMP_CLAUSE_MAP'/'OMP_CLAUSE__CACHE_' clause (not its decl), and avoid
the indirection through
'#define OMP_CLAUSE_MAP_READONLY'/'#define OMP_CLAUSE__CACHE__READONLY',
given that we're only using them in contexts where it's clear that the
'OMP_CLAUSE_SUBCODE_CHECK' is satisfied.  I don't have a strong
preference, though.
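
For readers unfamiliar with the pattern, the trade-off can be sketched with
toy types.  All names here are hypothetical stand-ins, and
'toy_subcode_check' only mimics the idea of 'OMP_CLAUSE_SUBCODE_CHECK' with
a plain 'assert': the wrapper macro documents which clause kind the flag
belongs to and checks it, while the bare flag accessor does neither.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_clause_code { TOY_CLAUSE_MAP, TOY_CLAUSE_CACHE, TOY_CLAUSE_IF };

struct toy_clause
{
  enum toy_clause_code code;
  bool readonly_flag;   /* generic flag bit, meaning depends on 'code' */
};

/* Return C after checking that it has the expected clause code.  */
static inline struct toy_clause *
toy_subcode_check (struct toy_clause *c, enum toy_clause_code expected)
{
  assert (c->code == expected);
  return c;
}

/* Bare accessor: works on any clause, says nothing about its meaning.  */
#define TOY_READONLY(NODE) ((NODE)->readonly_flag)

/* Checked accessor: same bit, but only valid for 'map' clauses.  */
#define TOY_CLAUSE_MAP_READONLY(NODE) \
  TOY_READONLY (toy_subcode_check ((NODE), TOY_CLAUSE_MAP))
```

Both forms are usable as lvalues, for example
'TOY_CLAUSE_MAP_READONLY (c) = true;', so the choice is mostly about
readability and the extra check.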

Either way, you still need to document this:

| Also, for the new use for OMP clauses, update 'gcc/tree.h:TREE_READONLY',
| and in 

Minor fixes for OpenACC/Fortran 'self' clause for compute constructs (was: [PATCH, OpenACC 2.7] Implement self clause for compute constructs)

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-10-25T10:57:06+0200, I wrote:
> With minor textual conflicts resolved, I've pushed this to master branch
> in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
> "OpenACC 2.7: Implement self clause for compute constructs", see
> attached.
>
>
> I'll then apply/submit a number of follow-on commits.

Regarding the Fortran front end changes:

> From 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a Mon Sep 17 00:00:00 2001
> From: Chung-Lin Tang 
> Date: Tue, 13 Jun 2023 08:44:31 -0700
> Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs

> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -1546,6 +1546,7 @@ typedef struct gfc_omp_clauses
>gfc_omp_namelist *lists[OMP_LIST_NUM];
>struct gfc_expr *if_expr;
>struct gfc_expr *if_exprs[OMP_IF_LAST];
> +  struct gfc_expr *self_expr;
>struct gfc_expr *final_expr;
>struct gfc_expr *num_threads;
>struct gfc_expr *chunk_size;

..., this needs to be handled in a few more places, I think...

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc

> @@ -6615,6 +6631,8 @@ gfc_split_omp_clauses (gfc_code *code,
> /* And this is copied to all.  */
> clausesa[GFC_OMP_SPLIT_TARGET].if_expr
>   = code->ext.omp_clauses->if_expr;
> +   clausesa[GFC_OMP_SPLIT_TARGET].self_expr
> + = code->ext.omp_clauses->self_expr;
> clausesa[GFC_OMP_SPLIT_TARGET].nowait
>   = code->ext.omp_clauses->nowait;
>   }

..., but this change isn't necessary: that function is for OpenMP only,
and generally doesn't (have to) care about OpenACC-only clauses.

OK to push the attached
"Minor fixes for OpenACC/Fortran 'self' clause for compute constructs",
or is anything more needed?
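
The general pattern behind these fix-ups, sketched with toy types
(hypothetical 'toy_*' names, plain 'malloc'/'free' instead of the
'gfc_expr' machinery): once a struct gains a new owned pointer field, the
free routine and the dump routine need matching entries.

```c
#include <stdio.h>
#include <stdlib.h>

struct toy_expr { int value; };

struct toy_clauses
{
  struct toy_expr *if_expr;
  struct toy_expr *self_expr;   /* the newly added field */
  struct toy_expr *final_expr;
};

/* Allocate an expression node; returns NULL if allocation fails.  */
struct toy_expr *
toy_new_expr (int value)
{
  struct toy_expr *e = malloc (sizeof *e);
  if (e)
    e->value = value;
  return e;
}

/* Free all owned expressions; without the 'self_expr' line it would leak.  */
void
toy_free_clauses (struct toy_clauses *c)
{
  free (c->if_expr);
  free (c->self_expr);
  free (c->final_expr);
}

/* Print each clause that is present; returns how many were printed.  */
int
toy_show_clauses (const struct toy_clauses *c, FILE *out)
{
  int shown = 0;
  if (c->if_expr)
    shown += fprintf (out, " IF(%d)", c->if_expr->value) > 0;
  if (c->self_expr)
    shown += fprintf (out, " SELF(%d)", c->self_expr->value) > 0;
  if (c->final_expr)
    shown += fprintf (out, " FINAL(%d)", c->final_expr->value) > 0;
  return shown;
}
```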


Also, I've filed <https://gcc.gnu.org/PR111938>
"Missing OpenACC/Fortran handling in 'gcc/fortran/frontend-passes.c'",
which applies generally, not just to the OpenACC 'self' clause on compute
constructs.


Grüße
 Thomas


From 943de6d5f1498aabfc343bf5e9dd6c2b63fc55ed Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 20 Oct 2023 15:49:35 +0200
Subject: [PATCH] Minor fixes for OpenACC/Fortran 'self' clause for compute
 constructs

... to fix up recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs".

	gcc/fortran/
	* dump-parse-tree.cc (show_omp_clauses): Handle 'self_expr'.
	* openmp.cc (gfc_free_omp_clauses): Likewise.
	* trans-openmp.cc (gfc_split_omp_clauses): Don't handle 'self_expr'.
---
 gcc/fortran/dump-parse-tree.cc | 6 ++
 gcc/fortran/openmp.cc  | 1 +
 gcc/fortran/trans-openmp.cc| 2 --
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index cc4846e5d74..26391df46e6 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1614,6 +1614,12 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
   show_expr (omp_clauses->if_exprs[i]);
   fputc (')', dumpfile);
 }
+  if (omp_clauses->self_expr)
+{
+  fputs (" SELF(", dumpfile);
+  show_expr (omp_clauses->self_expr);
+  fputc (')', dumpfile);
+}
   if (omp_clauses->final_expr)
 {
   fputs (" FINAL(", dumpfile);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 2e2e23d567b..5e3cd0570bb 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -163,6 +163,7 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->if_expr);
   for (i = 0; i < OMP_IF_LAST; i++)
 gfc_free_expr (c->if_exprs[i]);
+  gfc_free_expr (c->self_expr);
   gfc_free_expr (c->final_expr);
   gfc_free_expr (c->num_threads);
   gfc_free_expr (c->chunk_size);
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 82bbc41b388..00782ee1716 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -6631,8 +6631,6 @@ gfc_split_omp_clauses (gfc_code *code,
 	  /* And this is copied to all.  */
 	  clausesa[GFC_OMP_SPLIT_TARGET].if_expr
 	= code->ext.omp_clauses->if_expr;
-	  clausesa[GFC_OMP_SPLIT_TARGET].self_expr
-	= code->ext.omp_clauses->self_expr;
 	  clausesa[GFC_OMP_SPLIT_TARGET].nowait
 	= code->ext.omp_clauses->nowait;
 	}
-- 
2.34.1



Handle OpenACC 'self' clause for compute constructs in OpenACC 'kernels' decomposition (was: Extend test suite coverage for OpenACC 'self' clause for compute constructs (was: [PATCH, OpenACC 2.7] Impl

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-10-25T11:29:52+0200, I wrote:
> On 2023-10-25T10:57:06+0200, I wrote:
>> With minor textual conflicts resolved, I've pushed this to master branch
>> in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
>> "OpenACC 2.7: Implement self clause for compute constructs", see
>> attached.
>>
>>
>> I'll then apply/submit a number of follow-on commits.
>
>> From 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a Mon Sep 17 00:00:00 2001
>> From: Chung-Lin Tang 
>> Date: Tue, 13 Jun 2023 08:44:31 -0700
>> Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs
>
>>  .../c-c++-common/goacc/self-clause-1.c|  22 +
>>  .../c-c++-common/goacc/self-clause-2.c|  17 +
>>  gcc/testsuite/gfortran.dg/goacc/self.f95  |  53 +
>
>>  .../libgomp.oacc-c-c++-common/self-1.c| 962 ++
>
> I found that insufficient, and added some more.  Pushed to
> master branch commit 047841a68ebf5f991e842961f9e54f3c10b94f2c
> "Extend test suite coverage for OpenACC 'self' clause for compute constructs",
> see attached.  This is mostly just adapting and cross-linking some
> existing 'if' clause test cases.  (..., which turned up a problem when
> the 'self' clause is used with OpenACC 'kernels'.)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/self-1.f90
> @@ -0,0 +1,996 @@
> +! OpenACC 'self' clause.
> +
> +! This is 'if-1.f90' with 'self(!cond)' instead of 'if(cond)' on compute
> +! constructs.
> +! ..., with the exception of certain 'kernels' constructs.

..., which I've then fixed up per master branch
commit 7b2ae64b68132c1c643cb34d58cd5eab6f9de652
"Handle OpenACC 'self' clause for compute constructs in OpenACC 'kernels' decomposition",
see attached.


Grüße
 Thomas


From 7b2ae64b68132c1c643cb34d58cd5eab6f9de652 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 23 Oct 2023 15:28:30 +0200
Subject: [PATCH] Handle OpenACC 'self' clause for compute constructs in
 OpenACC 'kernels' decomposition

... to fix up recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs" for that case.

	gcc/
	* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
	Handle 'OMP_CLAUSE_SELF' like 'OMP_CLAUSE_IF'.
	* omp-expand.cc (expand_omp_target): Handle 'OMP_CLAUSE_SELF' for
	'GF_OMP_TARGET_KIND_OACC_DATA_KERNELS'.
	gcc/testsuite/
	* c-c++-common/goacc/self-clause-2.c: Verify
	'--param=openacc-kernels=decompose'.
	* gfortran.dg/goacc/kernels-tree.f95: Adjust.
	libgomp/
	* oacc-parallel.c (GOACC_data_start): Handle
	'GOACC_FLAG_LOCAL_DEVICE'.
	(GOACC_parallel_keyed): Simplify accordingly.
	* testsuite/libgomp.oacc-fortran/self-1.f90: Adjust.
---
 gcc/omp-expand.cc   | 14 --
 gcc/omp-oacc-kernels-decompose.cc   | 15 ---
 .../c-c++-common/goacc/self-clause-2.c  |  6 ++
 .../gfortran.dg/goacc/kernels-tree.f95  |  2 +-
 libgomp/oacc-parallel.c | 17 +
 .../testsuite/libgomp.oacc-fortran/self-1.f90   | 15 +++
 6 files changed, 39 insertions(+), 30 deletions(-)

diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 8576b938102..5c6a7f2e381 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -10334,9 +10334,19 @@ expand_omp_target (struct omp_region *region)
 
   if ((c = omp_find_clause (clauses, OMP_CLAUSE_SELF)) != NULL_TREE)
 {
-  gcc_assert (is_gimple_omp_oacc (entry_stmt) && offloaded);
+  gcc_assert ((is_gimple_omp_oacc (entry_stmt) && offloaded)
+		  || (gimple_omp_target_kind (entry_stmt)
+		  == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS));
 
-  edge e = split_block_after_labels (new_bb);
+  edge e;
+  if (offloaded)
+	e = split_block_after_labels (new_bb);
+  else
+	{
+	  gsi = gsi_last_nondebug_bb (new_bb);
+	  gsi_prev (&gsi);
+	  e = split_block (new_bb, gsi_stmt (gsi));
+	}
   basic_block cond_bb = e->src;
   new_bb = e->dest;
   remove_edge (e);
diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index ffc0a8f813e..dfbb34935d0 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -1519,17 +1519,18 @@ omp_oacc_kernels_decompose_1 (gimple *kernels_stmt)
 	  break;
 	}
 	}
-  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF
+	   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SELF)
 	{
-	  /* If there is an 'if' clause, it must be duplicated t

Extend test suite coverage for OpenACC 'self' clause for compute constructs (was: [PATCH, OpenACC 2.7] Implement self clause for compute constructs)

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-10-25T10:57:06+0200, I wrote:
> With minor textual conflicts resolved, I've pushed this to master branch
> in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
> "OpenACC 2.7: Implement self clause for compute constructs", see
> attached.
>
>
> I'll then apply/submit a number of follow-on commits.

> From 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a Mon Sep 17 00:00:00 2001
> From: Chung-Lin Tang 
> Date: Tue, 13 Jun 2023 08:44:31 -0700
> Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs

>  .../c-c++-common/goacc/self-clause-1.c|  22 +
>  .../c-c++-common/goacc/self-clause-2.c|  17 +
>  gcc/testsuite/gfortran.dg/goacc/self.f95  |  53 +

>  .../libgomp.oacc-c-c++-common/self-1.c| 962 ++

I found that insufficient, and added some more.  Pushed to
master branch commit 047841a68ebf5f991e842961f9e54f3c10b94f2c
"Extend test suite coverage for OpenACC 'self' clause for compute constructs",
see attached.  This is mostly just adapting and cross-linking some
existing 'if' clause test cases.  (..., which turned up a problem when
the 'self' clause is used with OpenACC 'kernels'.)
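
As a reminder of the relationship these adapted test cases rely on, here is
a small C sketch.  The function names are made up; it assumes a compiler
that accepts the OpenACC 2.7 'self' clause when OpenACC is enabled (the
pragmas are simply ignored otherwise).  Per the description of the 'self'
clause commit, 'self(cond)' currently behaves like 'if(!cond)':

```c
/* Offload only when ON_DEVICE is nonzero, expressed with 'if'.  */
void
scale_if (float *a, int n, float f, int on_device)
{
#pragma acc parallel copy(a[0:n]) if(on_device)
  for (int i = 0; i < n; i++)
    a[i] *= f;
}

/* Same intent expressed with 'self': run on the local device (currently
   host fallback) when ON_DEVICE is zero.  */
void
scale_self (float *a, int n, float f, int on_device)
{
#pragma acc parallel copy(a[0:n]) self(!on_device)
  for (int i = 0; i < n; i++)
    a[i] *= f;
}
```

Both functions are expected to leave 'a' with the same contents; they
differ only in where the region is allowed to execute.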


Grüße
 Thomas


From 047841a68ebf5f991e842961f9e54f3c10b94f2c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 23 Oct 2023 14:53:29 +0200
Subject: [PATCH] Extend test suite coverage for OpenACC 'self' clause for
 compute constructs

... on top of what was provided in recent
commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs".

	gcc/testsuite/
	* c-c++-common/goacc/if-clause-2.c: Enhance.
	* c-c++-common/goacc/self-clause-1.c: Likewise.
	* c-c++-common/goacc/self-clause-2.c: Likewise.
	* gfortran.dg/goacc/if.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/self.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Enhance.
	* testsuite/libgomp.oacc-c-c++-common/self-1.c: Likewise.
	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-self-1.c: New.
	* testsuite/libgomp.oacc-fortran/self-1.f90: Likewise.
---
 .../c-c++-common/goacc/if-clause-2.c  |   2 +
 .../c-c++-common/goacc/self-clause-1.c|   6 +
 .../c-c++-common/goacc/self-clause-2.c|  20 +
 gcc/testsuite/gfortran.dg/goacc/if.f95|  10 +-
 .../gfortran.dg/goacc/kernels-tree.f95|   5 +-
 .../gfortran.dg/goacc/parallel-tree.f95   |   3 +-
 gcc/testsuite/gfortran.dg/goacc/self.f95  |   8 +
 .../libgomp.oacc-c-c++-common/if-1.c  |   4 +
 .../libgomp.oacc-c-c++-common/if-self-1.c |  36 +
 .../libgomp.oacc-c-c++-common/self-1.c|   5 +
 .../testsuite/libgomp.oacc-fortran/if-1.f90   |   4 +
 .../testsuite/libgomp.oacc-fortran/self-1.f90 | 996 ++
 12 files changed, 1094 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/if-self-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/self-1.f90

diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index a48072509e1..71475521758 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -1,3 +1,5 @@
+/* See also 'self-clause-2.c'.  */
+
 /* { dg-additional-options "-fdump-tree-gimple" } */
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
{ dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/self-clause-1.c b/gcc/testsuite/c-c++-common/goacc/self-clause-1.c
index fe892bea210..28de3dc0584 100644
--- a/gcc/testsuite/c-c++-common/goacc/self-clause-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/self-clause-1.c
@@ -5,6 +5,8 @@ f (int b)
 {
   struct { int i; } *p;
 
+#pragma acc parallel self(0) self(b) /* { dg-error "too many 'self' clauses" } */
+  ;
 #pragma acc parallel self self(b) /* { dg-error "too many 'self' clauses" } */
   ;
 #pragma acc parallel self(*p)
@@ -12,6 +14,8 @@ f (int b)
  { dg-error {could not convert '\* p' from 'f\(int\)::<unnamed struct>' to 'bool'} {} { target c++ } .-2 } */
   ;
 
+#pragma acc kernels self(0) self(b) /* { dg-error "too many 'self' clauses" } */
+  ;
 #pragma acc kernels self self(b) /* { dg-error "too many 'self' clauses" } */
   ;
 #pragma acc kernels self(*p)
@@ -19,6 +23,8 @@ f (int b)
  { dg-error {could not convert '\* p' from 'f\(int\)::<unnamed struct>' to 'bool'} {} { target c++ } .-2 } */
   ;
 
+#pragma 

Disentangle handling of OpenACC 'host', 'self' pragma tokens (was: [PATCH, OpenACC 2.7] Implement self clause for compute constructs)

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-10-25T10:57:06+0200, I wrote:
> With minor textual conflicts resolved, I've pushed this to master branch
> in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
> "OpenACC 2.7: Implement self clause for compute constructs", see
> attached.
>
>
> I'll then apply/submit a number of follow-on commits.

I found this:

> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc

>  static tree
>  c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
> -const char *where, bool finish_p = true)
> +const char *where, bool finish_p = true,
> +bool compute_p = false)
>  {
>tree clauses = NULL;
>bool first = true;
> @@ -18064,7 +18100,18 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
>   c_parser_consume_token (parser);
>
>here = c_parser_peek_token (parser)->location;
> -  c_kind = c_parser_omp_clause_name (parser);
> +
> +  /* For OpenACC compute directives */
> +  if (compute_p
> +   && c_parser_next_token_is (parser, CPP_NAME)
> +   && !strcmp (IDENTIFIER_POINTER (c_parser_peek_token (parser)->value),
> +   "self"))
> + {
> +   c_kind = PRAGMA_OACC_CLAUSE_SELF;
> +   c_parser_consume_token (parser);
> + }
> +  else
> + c_kind = c_parser_omp_clause_name (parser);

..., and the similar code in the C++ and (to a lesser extent) Fortran front
ends, a bit twisted, so I've pushed to master branch
commit c92509d9fd98e02d17ab1610f696c88f606dcdf4
"Disentangle handling of OpenACC 'host', 'self' pragma tokens", see
attached.
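
A minimal sketch of the resulting structure, with toy enums instead of the
real 'pragma_omp_clause' values (all 'toy_*' names are hypothetical): the
name lookup reports 'self' and 'host' as distinct kinds, and the meaning of
'self' is decided from the directive being parsed.

```c
#include <string.h>

enum toy_clause { TOY_CLAUSE_NONE, TOY_CLAUSE_HOST, TOY_CLAUSE_SELF };
enum toy_directive { TOY_DIR_UPDATE, TOY_DIR_COMPUTE };
enum toy_action
{
  TOY_ACT_ERROR,              /* clause not valid on this directive */
  TOY_ACT_UPDATE_HOST_COPY,   /* 'update': refresh the host copy */
  TOY_ACT_COND_LOCAL_DEVICE   /* compute: conditionally run locally */
};

/* Map a clause name to its kind, keeping 'self' and 'host' apart.  */
enum toy_clause
toy_clause_name (const char *p)
{
  if (!strcmp ("host", p))
    return TOY_CLAUSE_HOST;
  if (!strcmp ("self", p))
    return TOY_CLAUSE_SELF;
  return TOY_CLAUSE_NONE;
}

/* Decide what a clause means for the directive currently being parsed.  */
enum toy_action
toy_handle_clause (enum toy_directive dir, enum toy_clause c)
{
  switch (c)
    {
    case TOY_CLAUSE_HOST:
      /* 'host' is a synonym for 'self', but only on 'update'.  */
      return dir == TOY_DIR_UPDATE ? TOY_ACT_UPDATE_HOST_COPY : TOY_ACT_ERROR;
    case TOY_CLAUSE_SELF:
      return dir == TOY_DIR_UPDATE
             ? TOY_ACT_UPDATE_HOST_COPY : TOY_ACT_COND_LOCAL_DEVICE;
    default:
      return TOY_ACT_ERROR;
    }
}
```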


Grüße
 Thomas


From c92509d9fd98e02d17ab1610f696c88f606dcdf4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 20 Oct 2023 14:47:58 +0200
Subject: [PATCH] Disentangle handling of OpenACC 'host', 'self' pragma tokens

'gcc/c-family/c-pragma.h:pragma_omp_clause' already defines
'PRAGMA_OACC_CLAUSE_SELF', but it has no longer been used for the 'update'
directive's 'self' clause as of 2018
commit 829c6349e96c5bfa8603aaef8858b38e237a2f33 (Subversion r261813)
"Update OpenACC data clause semantics to the 2.5 behavior".  That one instead
mapped the 'self' pragma token to the 'host' one (same semantics).  That means
that we're later not able to tell whether originally we had seen 'self' or
'host', which was OK as long as only the 'update' directive had a 'self'
clause.  However, as of recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs", also OpenACC
compute constructs may have a 'self' clause -- with different semantics.  That
means, we need to know which OpenACC directive we're parsing clauses for, which
can be done in a simpler way than in that commit, similar to how the OpenMP
'to' clause is handled.

While at that, clarify that (already in OpenACC 2.0a)
"The 'host' clause is a synonym for the 'self' clause." -- not the other way
round.

	gcc/c/
	* c-parser.cc (c_parser_omp_clause_name): Return
	'PRAGMA_OACC_CLAUSE_SELF' for "self".
	(c_parser_oacc_data_clause, OACC_UPDATE_CLAUSE_MASK): Adjust.
	(c_parser_oacc_all_clauses): Remove 'bool compute_p' formal
	parameter, and instead locally determine whether we're called for
	an OpenACC compute construct or OpenACC 'update' directive.
	(c_parser_oacc_compute): Adjust.
	gcc/cp/
	* parser.cc (cp_parser_omp_clause_name): Return
	'PRAGMA_OACC_CLAUSE_SELF' for "self".
	(cp_parser_oacc_data_clause, OACC_UPDATE_CLAUSE_MASK): Adjust.
	(cp_parser_oacc_all_clauses): Remove 'bool compute_p' formal
	parameter, and instead locally determine whether we're called for
	an OpenACC compute construct or OpenACC 'update' directive.
	(cp_parser_oacc_compute): Adjust.
	gcc/fortran/
	* openmp.cc (omp_mask2): Split 'OMP_CLAUSE_HOST_SELF' into
	'OMP_CLAUSE_SELF', 'OMP_CLAUSE_HOST'.
	(gfc_match_omp_clauses, OACC_UPDATE_CLAUSES): Adjust.
---
 gcc/c/c-parser.cc | 38 +-
 gcc/cp/parser.cc  | 39 +--
 gcc/fortran/openmp.cc | 27 ++-
 3 files changed, 48 insertions(+), 56 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index a82f5afeff7..5213a57a1ec 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -14061,8 +14061,8 @@ c_parser_omp_clause_name (c_parser *parser)
 	result = PRAGMA_OMP_CLAUSE_SCHEDULE;
 	  else if (!strcmp ("sections", p))
 	result = PRAGMA_OMP_CLAUSE_SECTIONS;
-	  else if (!strcmp ("self", p)) /* "self&q

Re: [PATCH, OpenACC 2.7] Implement self clause for compute constructs

2023-10-25 Thread Thomas Schwinge
Hi!

On 2023-06-13T23:52:25+0800, Chung-Lin Tang via Gcc-patches wrote:
> This patch implements the compiler side for the 'self' clause for compute
> constructs: parallel, kernels, and serial.
>
> As you know, the actual "local device" device type for libgomp is not yet
> implemented, so the libgomp side is basically just a simple duplicate of
> what host-fallback is doing,

Thanks, and ACK.

> though everything else should be completed by this patch.

What also is missing is allowing nested OpenACC compute constructs, which
GCC currently rejects.  (Just removing the nesting restriction isn't
sufficient, I think: will also have to think about explicit/implicit data
(and other?) clauses in nested compute constructs, for example, so this
doesn't seem entirely trivial to implement.)  I'm fine to defer that item
until actual multicore CPU "device" support emerges (for avoidance of
doubt: we're not currently working on that).

> Tested on powerpc64le-linux/nvptx, x86_64-linux/amdgcn tests pending.
> Is this okay for trunk?

With minor textual conflicts resolved, I've pushed this to master branch
in commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs", see
attached.


I'll then apply/submit a number of follow-on commits.


Grüße
 Thomas


From 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 13 Jun 2023 08:44:31 -0700
Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs

This patch implements the 'self' clause for compute constructs: parallel,
kernels, and serial. This clause conditionally uses the local device
(the host multi-core CPU) as the executing device of the compute region.

The actual implementation of the "local device" device type inside libgomp
(presumably using pthreads) is not yet completed, so the libgomp side is
still implemented exactly the same as host-fallback mode (so as of now,
it essentially behaves like the 'if' clause with the condition inverted).

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
	(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
	parameter, add parsing of self clause when compute_p is true.
	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSE_MASK): Likewise.
	(OACC_SERIAL_CLAUSE_MASK): Likewise.
	(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
	set compute_p argument to true.
	* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_oacc_compute_clause_self): New function.
	(cp_parser_oacc_all_clauses): Add new 'bool compute_p = false'
	parameter, add parsing of self clause when compute_p is true.
	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSE_MASK): Likewise.
	(OACC_SERIAL_CLAUSE_MASK): Likewise.
	(cp_parser_oacc_compute): Adjust call to cp_parser_oacc_all_clauses to
	set compute_p argument to true.
	* pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case.
	* semantics.cc (finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged
	with OMP_CLAUSE_IF case.

gcc/fortran/ChangeLog:

	* gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field.
	* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF.
	(gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF.
	(OACC_KERNELS_CLAUSES): Likewise.
	(OACC_SERIAL_CLAUSES): Likewise.
	(resolve_omp_clauses): Add handling for omp_clauses->self_expr.
	* trans-openmp.cc (gfc_trans_omp_clauses): Add handling of
	clauses->self_expr and building of OMP_CLAUSE_SELF tree clause.
	(gfc_split_omp_clauses): Add handling of self_expr field copy.

gcc/ChangeLog:

	* gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code.
	* omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF
	case.
	(convert_local_omp_clauses): Likewise.
	* tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case.
	* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry.
	(omp_clause_code_name): Likewise.
	* tree.h (OMP_CLAUSE_SELF_EXPR): New macro.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/self-clause-1.c: New test.
	* c-c++-common/goacc/self-clause-2.c: New test.
	* gfortran.dg/goacc/self.f95: New test.

include/ChangeLog:

	* gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value.

libgomp/ChangeLog:

	* oacc-parallel.c (GOACC_parallel_keyed): Add code to handle
	

Re: [Patch] Fortran: Support OpenMP's 'allocate' directive for stack vars

2023-10-18 Thread Thomas Schwinge
Hi Tobias!

On 2023-10-13T15:29:52+0200, Tobias Burnus  wrote:
> => Updated patch attached

When cherry-picking this commit 2d3dbf0eff668bed5f5f168b3cafd8590c54
"Fortran: Support OpenMP's 'allocate' directive for stack vars" on top of
slightly older GCC sources (mentioning that just in case that's
relevant), in a configuration with offloading enabled (only), I see:

+FAIL: gfortran.dg/gomp/allocate-13.f90   -O  (internal compiler error: tree code 'statement_list' is not supported in LTO streams)
+FAIL: gfortran.dg/gomp/allocate-13.f90   -O  (test for excess errors)

during IPA pass: modref
[...]/gcc/testsuite/gfortran.dg/gomp/allocate-13.f90:10:3: internal compiler error: tree code 'statement_list' is not supported in LTO streams
0x13374fd lto_write_tree
[...]/gcc/lto-streamer-out.cc:561
0x13374fd lto_output_tree_1
[...]/gcc/lto-streamer-out.cc:599
0x133f55b DFS::DFS(output_block*, tree_node*, bool, bool, bool)
[...]/gcc/lto-streamer-out.cc:899
0x1340287 lto_output_tree(output_block*, tree_node*, bool, bool)
[...]/gcc/lto-streamer-out.cc:1865
0x134197a output_function
[...]/gcc/lto-streamer-out.cc:2436
0x134197a lto_output()
[...]/gcc/lto-streamer-out.cc:2807
0x13d0551 write_lto
[...]/gcc/passes.cc:2774
0x13d0551 ipa_write_summaries_1
[...]/gcc/passes.cc:2838
0x13d0551 ipa_write_summaries()
[...]/gcc/passes.cc:2894
0x1002f2c ipa_passes
[...]/gcc/cgraphunit.cc:2251
0x1002f2c symbol_table::compile()
[...]/gcc/cgraphunit.cc:2331
0x10056b7 symbol_table::compile()
[...]/gcc/cgraphunit.cc:2311
0x10056b7 symbol_table::finalize_compilation_unit()
[...]/gcc/cgraphunit.cc:2583

Similarly:

+FAIL: libgomp.fortran/allocate-6.f90   -O  (internal compiler error: tree code 'statement_list' is not supported in LTO streams)

+FAIL: libgomp.fortran/allocate-7.f90   -O  (internal compiler error: tree code 'statement_list' is not supported in LTO streams)


Grüße
 Thomas


> Fortran: Support OpenMP's 'allocate' directive for stack vars
>
> gcc/fortran/ChangeLog:
>
>   * gfortran.h (ext_attr_t): Add omp_allocate flag.
>   * match.cc (gfc_free_omp_namelist): Avoid deleting same
>   u2.allocator multiple times now that a sequence can use
>   the same one.
>   * openmp.cc (gfc_match_omp_clauses, gfc_match_omp_allocate): Use
>   same allocator expr multiple times.
>   (is_predefined_allocator): Make static.
>   (gfc_resolve_omp_allocate): Update/extend restriction checks;
>   remove sorry message.
>   (resolve_omp_clauses): Reject coarrays in allocate/allocators
>   directive.
>   * parse.cc (check_omp_allocate_stmt): Permit procedure pointers
>   here (rejected later) for less misleading diagnostic.
>   * trans-array.cc (gfc_trans_auto_array_allocation): Propagate
>   size for GOMP_alloc and location to which it should be added to.
>   * trans-decl.cc (gfc_trans_deferred_vars): Handle 'omp allocate'
>   for stack variables; sorry for static variables/common blocks.
>   * trans-openmp.cc (gfc_trans_omp_clauses): Evaluate 'allocate'
>   clause's allocator only once; fix adding expressions to the
>   block.
>   (gfc_trans_omp_single): Pass a block to gfc_trans_omp_clauses.
>
> gcc/ChangeLog:
>
>   * gimplify.cc (gimplify_bind_expr): Handle Fortran's
>   'omp allocate' for stack variables.
>
> libgomp/ChangeLog:
>
>   * libgomp.texi (OpenMP Impl. Status): Mention that Fortran now
>   supports the allocate directive for stack variables.
>   * testsuite/libgomp.fortran/allocate-5.f90: New test.
>   * testsuite/libgomp.fortran/allocate-6.f90: New test.
>   * testsuite/libgomp.fortran/allocate-7.f90: New test.
>   * testsuite/libgomp.fortran/allocate-8.f90: New test.
>
> gcc/testsuite/ChangeLog:
>
>   * c-c++-common/gomp/allocate-14.c: Fix directive name.
>   * c-c++-common/gomp/allocate-15.c: Likewise.
>   * c-c++-common/gomp/allocate-9.c: Fix comment typo.
>   * gfortran.dg/gomp/allocate-4.f90: Remove sorry dg-error.
>   * gfortran.dg/gomp/allocate-7.f90: Likewise.
>   * gfortran.dg/gomp/allocate-10.f90: New test.
>   * gfortran.dg/gomp/allocate-11.f90: New test.
>   * gfortran.dg/gomp/allocate-12.f90: New test.
>   * gfortran.dg/gomp/allocate-13.f90: New test.
>   * gfortran.dg/gomp/allocate-14.f90: New test.
>   * gfortran.dg/gomp/allocate-15.f90: New test.
>   * gfortran.dg/gomp/allocate-8.f90: New test.
>   * gfortran.dg/gomp/allocate-9.f90: New test.
>
>  gcc/fortran/gfortran.h   |   1 +
>  gcc/fortran/match.cc |   9 +-
>  gcc/fortran/openmp.cc|  62 +++-
>  gcc/fortran/parse.cc |   8 +-
>  gcc/fortran/trans-array.cc 

Re: [PATCH 1/5] OpenMP, NVPTX: memcpy[23]D bias correction

2023-09-26 Thread Thomas Schwinge
Hi Julian!

On 2023-09-06T02:34:30-0700, Julian Brown  wrote:
> This patch works around behaviour of the 2D and 3D memcpy operations in
> the CUDA driver runtime.  Particularly in Fortran, the "base pointer"
> of an array (used for either source or destination of a host/device copy)
> may lie outside of data that is actually stored on the device.  The fix
> is to make sure that we use the first element of data to be transferred
> instead, and adjust parameters accordingly.
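
To make the adjustment described above concrete, here is a minimal stand-alone C sketch (not the libgomp code): the offsets are folded into the base pointer, so that it names the first element actually transferred, and are then zeroed.  The 'struct copy_desc' type and the 'fold_offsets' and 'first_byte' helpers are hypothetical names for illustration only; the real patch operates on the 'data.src*' and 'data.dst*' fields shown in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor, loosely modelled on the offset/pitch fields
   used in the patch; this is NOT the CUDA Driver API type.  */
struct copy_desc
{
  const char *base;   /* Base pointer of the array.  */
  size_t x_in_bytes;  /* Byte offset within a row.  */
  size_t y;           /* Row offset.  */
  size_t z;           /* Plane offset (0 for a 2D copy).  */
  size_t pitch;       /* Bytes per row.  */
  size_t height;      /* Rows per plane.  */
};

/* Address of the first transferred byte for the current base/offsets.  */
static const char *
first_byte (const struct copy_desc *d)
{
  return d->base + (d->z * d->height + d->y) * d->pitch + d->x_in_bytes;
}

/* Fold all offsets into the base pointer, then zero the offsets.
   The set of bytes described by the descriptor does not change.  */
static void
fold_offsets (struct copy_desc *d)
{
  assert (d->pitch != 0);  /* Sanity check for this sketch only.  */
  d->base = first_byte (d);
  d->x_in_bytes = 0;
  d->y = 0;
  d->z = 0;
}
```

With offsets x = 2, y = 1, z = 1, a pitch of 8 and a height of 3, the base moves by (1 * 3 + 1) * 8 + 2 = 34 bytes, and 'first_byte' returns the same address before and after folding.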

Do you (a) have a stand-alone test case for this (that is, not depending
on your other pending patches, so that this could go in directly --
together with the before-FAIL test case)?  Do you (b) know if this is a
bug in our use of the CUDA Driver API or rather in CUDA itself?  If the
latter, have you reported this to Nvidia?

(I didn't quickly understand

"cuMemcpy2D" etc.)


Grüße
 Thomas


> 2023-09-05  Julian Brown  
>
> libgomp/
>   * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d): Adjust parameters to
>   avoid out-of-bounds array checks in CUDA runtime.
>   (GOMP_OFFLOAD_memcpy3d): Likewise.
> ---
>  libgomp/plugin/plugin-nvptx.c | 67 +++
>  1 file changed, 67 insertions(+)
>
> diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
> index 00d4241ae02b..cefe288a8aab 100644
> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -1827,6 +1827,35 @@ GOMP_OFFLOAD_memcpy2d (int dst_ord, int src_ord, size_t dim1_size,
>data.srcXInBytes = src_offset1_size;
>data.srcY = src_offset0_len;
>
> +  if (data.srcXInBytes != 0 || data.srcY != 0)
> +{
> +  /* Adjust origin to the actual array data, else the CUDA 2D memory
> +  copy API calls below may fail to validate source/dest pointers
> +  correctly (especially for Fortran where the "virtual origin" of an
> +  array is often outside the stored data).  */
> +  if (src_ord == -1)
> + data.srcHost = (const void *) ((const char *) data.srcHost
> +   + data.srcY * data.srcPitch
> +   + data.srcXInBytes);
> +  else
> + data.srcDevice += data.srcY * data.srcPitch + data.srcXInBytes;
> +  data.srcXInBytes = 0;
> +  data.srcY = 0;
> +}
> +
> +  if (data.dstXInBytes != 0 || data.dstY != 0)
> +{
> +  /* As above.  */
> +  if (dst_ord == -1)
> + data.dstHost = (void *) ((char *) data.dstHost
> +  + data.dstY * data.dstPitch
> +  + data.dstXInBytes);
> +  else
> + data.dstDevice += data.dstY * data.dstPitch + data.dstXInBytes;
> +  data.dstXInBytes = 0;
> +  data.dstY = 0;
> +}
> +
>CUresult res = CUDA_CALL_NOCHECK (cuMemcpy2D, &data);
>if (res == CUDA_ERROR_INVALID_VALUE)
>  /* If pitch > CU_DEVICE_ATTRIBUTE_MAX_PITCH or for device-to-device
> @@ -1895,6 +1924,44 @@ GOMP_OFFLOAD_memcpy3d (int dst_ord, int src_ord, size_t dim2_size,
>data.srcY = src_offset1_len;
>data.srcZ = src_offset0_len;
>
> +  if (data.srcXInBytes != 0 || data.srcY != 0 || data.srcZ != 0)
> +{
> +  /* Adjust origin to the actual array data, else the CUDA 3D memory
> +  copy API call below may fail to validate source/dest pointers
> +  correctly (especially for Fortran where the "virtual origin" of an
> +  array is often outside the stored data).  */
> +  if (src_ord == -1)
> + data.srcHost
> +   = (const void *) ((const char *) data.srcHost
> + + (data.srcZ * data.srcHeight + data.srcY)
> +   * data.srcPitch
> + + data.srcXInBytes);
> +  else
> + data.srcDevice
> +   += (data.srcZ * data.srcHeight + data.srcY) * data.srcPitch
> +  + data.srcXInBytes;
> +  data.srcXInBytes = 0;
> +  data.srcY = 0;
> +  data.srcZ = 0;
> +}
> +
> +  if (data.dstXInBytes != 0 || data.dstY != 0 || data.dstZ != 0)
> +{
> +  /* As above.  */
> +  if (dst_ord == -1)
> + data.dstHost = (void *) ((char *) data.dstHost
> +  + (data.dstZ * data.dstHeight + data.dstY)
> +* data.dstPitch
> +  + data.dstXInBytes);
> +  else
> + data.dstDevice
> +   += (data.dstZ * data.dstHeight + data.dstY) * data.dstPitch
> +  + data.dstXInBytes;
> +  data.dstXInBytes = 0;
> +  data.dstY = 0;
> +  data.dstZ = 0;
> +}
> +
>CUDA_CALL (cuMemcpy3D, &data);
>return true;
>  }
> --
> 2.41.0
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [committed] - Re: [patch] OpenMP/Fortran: Non-rectangular loops with constant steps other than 1 or -1 [PR107424]

2023-07-19 Thread Thomas Schwinge
Hi Tobias!

On 2023-07-19T10:26:12+0200, Tobias Burnus  wrote:
> Now committed as Rev. r14-2634-g85da0b40538fb0

On devel/omp/gcc-13 branch, the corresponding
commit b003e6511754dce475f7f5b0c5cd887a177e41b3
"OpenMP/Fortran: Non-rectangular loops with constant steps other than 1 or -1 [PR107424]"
introduces a regression:

PASS: libgomp.fortran/loop-transforms/unroll-2.f90   -O0  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/loop-transforms/unroll-2.f90   -O0  execution test

Etc.

spawn [open ...]
   4
   8
  10
  11

Program aborted. Backtrace:
#0  0x400f9c in test
at [...]/libgomp.fortran/loop-transforms/unroll-2.f90:85
#1  0x400fd3 in main
at [...]/libgomp.fortran/loop-transforms/unroll-2.f90:59


Grüße
 Thomas


> Changes:
>
> * I missed updating another 'sorry' (msg wording change) - now fixed;
> I also added it to the sorry-testcase file non-rectangular-loop-5.f90.
>
> * I decided to retire the PR as several issues have been fixed and the
> original title did not fit any more. The remaining issue is now tracked
> in PR110735 (i.e. handling step != const, both the generic and possibly
> a simpler special case).
>
> * I added a link to the PR to libgomp.texi such that one can find out
> what is only partially supported for Fortran.
>
> Thanks,
>
> Tobias
>
> PS: Otherwise, the following still applies:
>
> On 18.07.23 14:11, Tobias Burnus wrote:
>> Comments regarding the validity of the Fortran assumptions are welcome!
>>
>> This patch now uses a 'simple' loop for OpenMP loops with
>> a constant loop-step size. Before, it only did so for step = ±1.
>> (Otherwise, a count variable is used from which the original
>> loop index variable is calculated.)
>>
>> For details, see the attached patch or
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107424#c12
>> (comment 12 + 14 plus the email linked in comment 12).
>>
>> Comments? Remarks? If there are none, I will relatively soonish
>> commit the attached patch to mainline, only.

> commit 85da0b40538fb0d17d89de1e7905984668e3dfef
> Author: Tobias Burnus 
> Date:   Wed Jul 19 10:18:49 2023 +0200
>
> OpenMP/Fortran: Non-rectangular loops with constant steps other than 1 or -1 [PR107424]
>
> Before this commit, gfortran produced with OpenMP for 'do i = 1,10,2'
> the code
>   for (count.0 = 0; count.0 < 5; count.0 = count.0 + 1)
> i = count.0 * 2 + 1;
>
> While such an inner loop can be collapsed, a non-rectangular loop could not.
> With this commit and for all constant loop steps, a simple loop such
> as 'for (i = 1; i <= 10; i = i + 2)' is created. (Before only for the
> constant steps of 1 and -1.)
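
The two loop shapes can be compared side by side in a small C sketch.  This only illustrates the shapes described above and is not what gfortran emits; the helper names 'sum_with_count' and 'sum_simple' are hypothetical, and only positive constant steps with start <= end are handled.

```c
#include <assert.h>

/* Count-based shape: a separate counter drives the loop and the
   original index variable is derived from it in every iteration.  */
static int
sum_with_count (int start, int end, int step)
{
  assert (step > 0 && start <= end);
  /* Fortran iteration count for a DO loop: (m2 - m1 + m3) / m3.  */
  int n_iter = (end - start + step) / step;
  int sum = 0;
  for (int count = 0; count < n_iter; count = count + 1)
    {
      int i = count * step + start;
      sum += i;
    }
  return sum;
}

/* Simple shape: the index variable itself is the loop variable, and
   the sign of the constant step selects the '<=' comparison.  */
static int
sum_simple (int start, int end, int step)
{
  assert (step > 0 && start <= end);
  int sum = 0;
  for (int i = start; i <= end; i = i + step)
    sum += i;
  return sum;
}
```

For 'do i = 1,10,2' both helpers visit i = 1, 3, 5, 7, 9; the count-based one does so through five values of the counter.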
>
> The constant step makes it possible to know the direction (increasing/decreasing)
> that is required for the loop condition.
>
> The new code is only valid if one assumes no overflow of the loop variable.
> However, the Fortran standard can be read as requiring that this be ensured by
> the user. Namely, the Fortran standard requires (F2023, 10.1.5.2.4):
> "The execution of any numeric operation whose result is not defined by
> the arithmetic used by the processor is prohibited."
>
> And, for DO loops, F2023's "11.1.7.4.3 The execution cycle" has the
> following: The number of loop iterations is handled by an iteration count,
> which would permit code like 'do i = huge(i)-5, huge(i),4'. However,
> in step (3), this count is not only decremented by one but also:
>   "... The DO variable, if any, is incremented by the value of the
>   incrementation parameter m3."
> And for the example above, 'i' would be 'huge(i)+3' in the last
> execution cycle, which exceeds the largest model number and should
> render the example as invalid.
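
The 'huge(i)+3' arithmetic can be checked with a small C sketch, under the assumptions that INT_MAX stands in for huge(i) and that 'long long' is wider than 'int' so the computation itself cannot overflow.  'final_do_value' is a hypothetical helper, not gfortran code; it uses the standard's iteration count (m2 - m1 + m3) / m3 for a positive step.

```c
#include <assert.h>
#include <limits.h>

/* Value the DO variable holds after the last increment of
   'do i = m1, m2, m3' with m3 > 0, computed in a wider type.  */
static long long
final_do_value (int m1, int m2, int m3)
{
  assert (m3 > 0 && m1 <= m2);
  long long count = ((long long) m2 - m1 + m3) / m3;
  return (long long) m1 + count * m3;
}

/* Nonzero if that final value is not representable as an 'int'.  */
static int
final_increment_overflows (int m1, int m2, int m3)
{
  return final_do_value (m1, m2, m3) > INT_MAX;
}
```

For m1 = INT_MAX - 5, m2 = INT_MAX, m3 = 4 the iteration count is 9 / 4 = 2, so the DO variable ends at INT_MAX - 5 + 8 = INT_MAX + 3, matching the example in the commit message.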
>
> PR fortran/107424
>
> gcc/fortran/ChangeLog:
>
> * trans-openmp.cc (gfc_nonrect_loop_expr): Accept all
> constant loop steps.
> (gfc_trans_omp_do): Likewise; use sign to determine
> loop direction.
>
> libgomp/ChangeLog:
>
> * libgomp.texi (Impl. Status 5.0): Add link to new PR110735.
> * testsuite/libgomp.fortran/non-rectangular-loop-1.f90: Enable
> commented tests.
> * testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: Remove
> test file; tests are in non-rectangular-loop-1.f90.
> * testsuite/libgomp.fortran/non-rectangular-loop-5.f90: Change
> testcase to use a non-constant step to retain the 'sorry' test.
> * testsuite/libgomp.fortran/non-rectangular-loop-6.f90: New test.
>
> gcc/testsuite/ChangeLog:
>
> * gfortran.dg/gomp/linear-2.f90: Update dump to remove
> the additional count variable.
> ---
>  gcc/fortran/trans-openmp.cc|  18 +-
>  gcc/testsuite/gfortran.dg/gomp/linear-2.f90|   4 +-
>  

Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'

2023-06-14 Thread Thomas Schwinge
Hi!

On 2023-06-13T13:11:38+0200, Tobias Burnus  wrote:
> On 13.06.23 12:42, Thomas Schwinge wrote:
>> On 2023-06-05T14:18:48+0200, I wrote:
>>> OK to push the attached
>>> "Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'"?
>>
>> Subject: [PATCH] Add
>>   'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'
>>
>>   gcc/testsuite/
>>   * gfortran.fortran-torture/execute/math.f90: Enhance for optional
>>   OpenACC, OpenMP 'target' usage.
>
> I think it is more readable with a linebreak here and with "OpenACC
> 'serial' and OpenMP ..." instead of "OpenACC, OpenMP".
>
> What I would like to see is a hint somewhere in the commit log that the
> libgomp files include the gfortran.fortran-torture file. I don't care
> whether you add the hint before the changelog items as free text – or in
> the bullet above (e.g. "as it is included in libgomp/testsuite") – or
> after "New." in the following bullet list.
>
>>   libgomp/
>>   * testsuite/libgomp.fortran/fortran-torture_execute_math.f90: New.
>>   * testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90:
>>   Likewise.
>
>> ---
>>   .../gfortran.fortran-torture/execute/math.f90 | 23 +--
>>   .../fortran-torture_execute_math.f90  |  4 
>>   .../fortran-torture_execute_math.f90  |  5 
>>   3 files changed, 30 insertions(+), 2 deletions(-)
>>   create mode 100644 libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
>>   create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90
>>
>> diff --git a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90 b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
>> index 17cc78f7a10..e71f669304f 100644
>> --- a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
>> +++ b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
>> @@ -1,9 +1,14 @@
>>   ! Program to test mathematical intrinsics
>> +
>> +! See also 'libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90'; thus the '!$omp' directives.
>> +! See also 'libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90'; thus the '!$acc' directives.
>
> Likewise here: it is not completely obvious that this file is 'include'd
> by the other testcases.
>
> Maybe add a line "! This file is also included in:" and remove the "See
> also" or some creative variant of it.
>
> Minor remark: The OpenMP part is OK, but strict reading of the spec
> requires an 'omp declare target' if a subroutine is in a different
> compilation unit. And according to the glossary, that's the case here. In
> practice, it also works without as it is in the same translation unit.
> (compilation unit = for C/C++: translation unit, for Fortran:
> subprogram). I think the HPE/Cray compiler will complain, but maybe only
> when used with modules and not with subroutine subprograms. (As many
> compilers write a .mod file for modules, a late change of attributes can
> be more problematic.)
>
> Otherwise LGTM.

Thanks for the review.  I've now pushed
commit e76af2162c7b768ef0a913d485c51a80b08a1020
"Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'", see
attached.

> PS: I assume that you have checked it both with an in-build-tree and
> an in-install-tree testsuite run.

I happened to have (..., but don't think it'd make a relevant difference
here?)


Grüße
 Thomas


From e76af2162c7b768ef0a913d485c51a80b08a1020 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 2 Jun 2023 23:11:00 +0200
Subject: [PATCH] Add
 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'

..., via 'include'ing the existing 'gfortran.fortran-torture/execute/math.f90',
which therefore is enhanced for optional OpenACC 'serial', OpenMP 'target'
usage.

	gcc/testsuite/
	* gfortran.fortran-torture/execute/math.f90: Enhance for optional
	OpenACC 'serial', OpenMP 'target' usage.
	libgomp/
	* testsuite/libgomp.fortran/fortran-torture_execute_math.f90: New.
	* testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90:
	Likewise.
---
 .../gfortran.fortran-torture/execute/math.f90 | 24 +--
 .../fortran-torture_execute_math.f90  |  4 
 .../fortran-torture_execute_math.f90  |  5 
 3 files changed, 31 insertions(

[ping] driver: Forward '-lgfortran', '-lm' to offloading compilation

2023-06-13 Thread Thomas Schwinge
Hi!

On 2023-06-05T14:25:18+0200, I wrote:
> OK to push the attached
> "driver: Forward '-lgfortran', '-lm' to offloading compilation"?
> (We didn't have a PR open for that, or did we?)

Ping.


Grüße
 Thomas


From 5d3cb866cad3bbcf47c5e66825e5710e86cc017e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 5 Jun 2023 11:26:37 +0200
Subject: [PATCH] driver: Forward '-lgfortran', '-lm' to offloading compilation

..., so that users don't manually need to specify
'-foffload-options=-lgfortran', '-foffload-options=-lm' in addition to
'-lgfortran', '-lm' (specified manually, or implicitly by the driver).
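
The mkoffload side of this rewrites the forwarded '-l_GCC_gfortran' and '-l_GCC_m' markers back into ordinary '-l' options in place.  A stand-alone sketch of just that string manipulation follows; the function name 'elide_gcc_marker' and its return value are hypothetical, while the copy loop mirrors the one in the diff.

```c
#include <assert.h>
#include <string.h>

/* If ARG is exactly "-l_GCC_gfortran" or "-l_GCC_m", drop the "_GCC_"
   part in place and return 1; otherwise leave ARG alone and return 0.  */
static int
elide_gcc_marker (char *arg)
{
  assert (arg != NULL);
  if (strcmp (arg, "-l_GCC_gfortran") != 0
      && strcmp (arg, "-l_GCC_m") != 0)
    return 0;

  /* Copy the tail (including the terminating NUL) down over "_GCC_".  */
  size_t i_dst = strlen ("-l");
  size_t i_src = strlen ("-l_GCC_");
  char c;
  do
    c = arg[i_dst++] = arg[i_src++];
  while (c != '\0');
  return 1;
}
```

The destination index always trails the source index, so the overlapping copy is safe, and the result is never longer than the input: "-l_GCC_gfortran" becomes "-lgfortran" within the same buffer.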

	gcc/
	* gcc.cc (driver_handle_option): Forward host '-lgfortran', '-lm'
	to offloading compilation.
	* config/gcn/mkoffload.cc (main): Adjust.
	* config/nvptx/mkoffload.cc (main): Likewise.
	* doc/invoke.texi (foffload-options): Update example.
	libgomp/
	* testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Don't
	set.
	* testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags):
	Likewise.
	* testsuite/libgomp.c/simd-math-1.c: Remove
	'-foffload-options=-lm'.
	* testsuite/libgomp.fortran/fortran-torture_execute_math.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90:
	Likewise.
---
 gcc/config/gcn/mkoffload.cc   | 12 
 gcc/config/nvptx/mkoffload.cc | 12 
 gcc/doc/invoke.texi   |  5 +-
 gcc/gcc.cc| 56 +++
 libgomp/testsuite/libgomp.c/simd-math-1.c |  1 -
 .../fortran-torture_execute_math.f90  |  1 -
 libgomp/testsuite/libgomp.fortran/fortran.exp |  2 -
 .../fortran-torture_execute_math.f90  |  1 -
 .../libgomp.oacc-fortran/fortran.exp  |  2 -
 9 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 988c12318fd..8b608bf024e 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -946,6 +946,18 @@ main (int argc, char **argv)
   else if (startswith (argv[i], STR))
 	gcn_stack_size = atoi (argv[i] + strlen (STR));
 #undef STR
+  /* Translate host into offloading libraries.  */
+  else if (strcmp (argv[i], "-l_GCC_gfortran") == 0
+	   || strcmp (argv[i], "-l_GCC_m") == 0)
+	{
+	  /* Elide '_GCC_'.  */
+	  size_t i_dst = strlen ("-l");
+	  size_t i_src = strlen ("-l_GCC_");
+	  char c;
+	  do
+	c = argv[i][i_dst++] = argv[i][i_src++];
+	  while (c != '\0');
+	}
 }
 
   if (!(fopenacc ^ fopenmp))
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 6cdea45cffe..aaea9fb320d 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -649,6 +649,18 @@ main (int argc, char **argv)
   else if (strcmp (argv[i], "-dumpbase") == 0
 	   && i + 1 < argc)
 	dumppfx = argv[++i];
+  /* Translate host into offloading libraries.  */
+  else if (strcmp (argv[i], "-l_GCC_gfortran") == 0
+	   || strcmp (argv[i], "-l_GCC_m") == 0)
+	{
+	  /* Elide '_GCC_'.  */
+	  size_t i_dst = strlen ("-l");
+	  size_t i_src = strlen ("-l_GCC_");
+	  char c;
+	  do
+	c = argv[i][i_dst++] = argv[i][i_src++];
+	  while (c != '\0');
+	}
 }
   if (!(fopenacc ^ fopenmp))
 fatal_error (input_location, "either %<-fopenacc%> or %<-fopenmp%> "
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d2d639c92d4..7b3a2a74459 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2716,9 +2716,8 @@ the @option{-foffload-options=@var{target-list}=@var{options}} form.  The
 Typical command lines are
 
 @smallexample
--foffload-options=-lgfortran -foffload-options=-lm
--foffload-options="-lgfortran -lm" -foffload-options=nvptx-none=-latomic
--foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm
+-foffload-options='-fno-math-errno -ffinite-math-only' -foffload-options=nvptx-none=-latomic
+-foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-O3
 @end smallexample
 
 @opindex fopenacc
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 2ccca00d603..15995206856 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -47,6 +47,9 @@ compilation is specified by a string called a "spec".  */
 #include "opts-jobserver.h"
 #include "common/common-target.h"
 
+#ifndef MATH_LIBRARY
+#define MATH_LIBRARY "m"
+#endif
 
 
 /* Manage the manipulation of env vars.
@@ -4117,6 +4120,48 @@ next_item:
 }
 }
 
+/* Forward certain options to offloading compilation.  */
+
+static void
+forward_offload_option (size_t op

[ping] Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'

2023-06-13 Thread Thomas Schwinge
Hi!

On 2023-06-05T14:18:48+0200, I wrote:
> OK to push the attached
> "Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'"?

Ping.


Grüße
 Thomas


From 0d5095d8cd2d68113890a39a7fdb649198e576c1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 2 Jun 2023 23:11:00 +0200
Subject: [PATCH] Add
 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'

	gcc/testsuite/
	* gfortran.fortran-torture/execute/math.f90: Enhance for optional
	OpenACC, OpenMP 'target' usage.
	libgomp/
	* testsuite/libgomp.fortran/fortran-torture_execute_math.f90: New.
	* testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90:
	Likewise.
---
 .../gfortran.fortran-torture/execute/math.f90 | 23 +--
 .../fortran-torture_execute_math.f90  |  4 
 .../fortran-torture_execute_math.f90  |  5 
 3 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90

diff --git a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90 b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
index 17cc78f7a10..e71f669304f 100644
--- a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
+++ b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
@@ -1,9 +1,14 @@
 ! Program to test mathematical intrinsics
+
+! See also 'libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90'; thus the '!$omp' directives.
+! See also 'libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90'; thus the '!$acc' directives.
+
 subroutine dotest (n, val4, val8, known)
implicit none
real(kind=4) val4, known
real(kind=8) val8
integer n
+   !$acc routine seq
 
if (abs (val4 - known) .gt. 0.001) STOP 1
if (abs (real (val8, kind=4) - known) .gt. 0.001) STOP 2
@@ -14,17 +19,20 @@ subroutine dotestc (n, val4, val8, known)
complex(kind=4) val4, known
complex(kind=8) val8
integer n
+   !$acc routine seq
+
if (abs (val4 - known) .gt. 0.001) STOP 3
if (abs (cmplx (val8, kind=4) - known) .gt. 0.001) STOP 4
 end subroutine
 
-program testmath
+subroutine testmath
implicit none
real(kind=4) r, two4, half4
real(kind=8) q, two8, half8
complex(kind=4) cr
complex(kind=8) cq
external dotest, dotestc
+   !$acc routine seq
 
two4 = 2.0
two8 = 2.0_8
@@ -96,5 +104,16 @@ program testmath
cq = log ((-1.0_8, -1.0_8))
call dotestc (21, cr, cq, (0.3466, -2.3562))
 
-end program
+end subroutine
 
+program main
+   implicit none
+   external testmath
+
+   !$acc serial
+   !$omp target
+   call testmath
+   !$acc end serial
+   !$omp end target
+
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90 b/libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
new file mode 100644
index 000..3348a0bb3ad
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
@@ -0,0 +1,4 @@
+! { dg-do run }
+! { dg-additional-options -foffload-options=-lm }
+
+include '../../../gcc/testsuite/gfortran.fortran-torture/execute/math.f90'
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90 b/libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90
new file mode 100644
index 000..1b2ac440762
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90
@@ -0,0 +1,5 @@
+! { dg-do run }
+!TODO { dg-prune-output {using 'vector_length \(32\)', ignoring 1} }
+! { dg-additional-options -foffload-options=-lm }
+
+include '../../../gcc/testsuite/gfortran.fortran-torture/execute/math.f90'
-- 
2.34.1



driver: Forward '-lgfortran', '-lm' to offloading compilation

2023-06-05 Thread Thomas Schwinge
Hi!

OK to push the attached
"driver: Forward '-lgfortran', '-lm' to offloading compilation"?
(We didn't have a PR open for that, or did we?)


Grüße
 Thomas


From 5d3cb866cad3bbcf47c5e66825e5710e86cc017e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 5 Jun 2023 11:26:37 +0200
Subject: [PATCH] driver: Forward '-lgfortran', '-lm' to offloading compilation

..., so that users don't manually need to specify
'-foffload-options=-lgfortran', '-foffload-options=-lm' in addition to
'-lgfortran', '-lm' (specified manually, or implicitly by the driver).

	gcc/
	* gcc.cc (driver_handle_option): Forward host '-lgfortran', '-lm'
	to offloading compilation.
	* config/gcn/mkoffload.cc (main): Adjust.
	* config/nvptx/mkoffload.cc (main): Likewise.
	* doc/invoke.texi (foffload-options): Update example.
	libgomp/
	* testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Don't
	set.
	* testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags):
	Likewise.
	* testsuite/libgomp.c/simd-math-1.c: Remove
	'-foffload-options=-lm'.
	* testsuite/libgomp.fortran/fortran-torture_execute_math.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90:
	Likewise.
---
 gcc/config/gcn/mkoffload.cc   | 12 
 gcc/config/nvptx/mkoffload.cc | 12 
 gcc/doc/invoke.texi   |  5 +-
 gcc/gcc.cc| 56 +++
 libgomp/testsuite/libgomp.c/simd-math-1.c |  1 -
 .../fortran-torture_execute_math.f90  |  1 -
 libgomp/testsuite/libgomp.fortran/fortran.exp |  2 -
 .../fortran-torture_execute_math.f90  |  1 -
 .../libgomp.oacc-fortran/fortran.exp  |  2 -
 9 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 988c12318fd..8b608bf024e 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -946,6 +946,18 @@ main (int argc, char **argv)
   else if (startswith (argv[i], STR))
 	gcn_stack_size = atoi (argv[i] + strlen (STR));
 #undef STR
+  /* Translate host into offloading libraries.  */
+  else if (strcmp (argv[i], "-l_GCC_gfortran") == 0
+	   || strcmp (argv[i], "-l_GCC_m") == 0)
+	{
+	  /* Elide '_GCC_'.  */
+	  size_t i_dst = strlen ("-l");
+	  size_t i_src = strlen ("-l_GCC_");
+	  char c;
+	  do
+	c = argv[i][i_dst++] = argv[i][i_src++];
+	  while (c != '\0');
+	}
 }
 
   if (!(fopenacc ^ fopenmp))
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 6cdea45cffe..aaea9fb320d 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -649,6 +649,18 @@ main (int argc, char **argv)
   else if (strcmp (argv[i], "-dumpbase") == 0
 	   && i + 1 < argc)
 	dumppfx = argv[++i];
+  /* Translate host into offloading libraries.  */
+  else if (strcmp (argv[i], "-l_GCC_gfortran") == 0
+	   || strcmp (argv[i], "-l_GCC_m") == 0)
+	{
+	  /* Elide '_GCC_'.  */
+	  size_t i_dst = strlen ("-l");
+	  size_t i_src = strlen ("-l_GCC_");
+	  char c;
+	  do
+	c = argv[i][i_dst++] = argv[i][i_src++];
+	  while (c != '\0');
+	}
 }
   if (!(fopenacc ^ fopenmp))
 fatal_error (input_location, "either %<-fopenacc%> or %<-fopenmp%> "
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d2d639c92d4..7b3a2a74459 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2716,9 +2716,8 @@ the @option{-foffload-options=@var{target-list}=@var{options}} form.  The
 Typical command lines are
 
 @smallexample
--foffload-options=-lgfortran -foffload-options=-lm
--foffload-options="-lgfortran -lm" -foffload-options=nvptx-none=-latomic
--foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm
+-foffload-options='-fno-math-errno -ffinite-math-only' -foffload-options=nvptx-none=-latomic
+-foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-O3
 @end smallexample
 
 @opindex fopenacc
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 2ccca00d603..15995206856 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -47,6 +47,9 @@ compilation is specified by a string called a "spec".  */
 #include "opts-jobserver.h"
 #include "common/common-target.h"
 
+#ifndef MATH_LIBRARY
+#define MATH_LIBRARY "m"
+#endif
 
 
 /* Manage the manipulation of env vars.
@@ -4117,6 +4120,48 @@ next_item:
 }
 }
 
+/* Forward certain options to offloading compilation.  */
+
+static void
+forward_offload_option (size_t opt_index, const char *arg, bool validated)
+{
+  switch (opt

Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'

2023-06-05 Thread Thomas Schwinge
Hi!

OK to push the attached
"Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'"?


Grüße
 Thomas


From 0d5095d8cd2d68113890a39a7fdb649198e576c1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 2 Jun 2023 23:11:00 +0200
Subject: [PATCH] Add
 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90'

	gcc/testsuite/
	* gfortran.fortran-torture/execute/math.f90: Enhance for optional
	OpenACC, OpenMP 'target' usage.
	libgomp/
	* testsuite/libgomp.fortran/fortran-torture_execute_math.f90: New.
	* testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90:
	Likewise.
---
 .../gfortran.fortran-torture/execute/math.f90 | 23 +--
 .../fortran-torture_execute_math.f90  |  4 
 .../fortran-torture_execute_math.f90  |  5 
 3 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90

diff --git a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90 b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
index 17cc78f7a10..e71f669304f 100644
--- a/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
+++ b/gcc/testsuite/gfortran.fortran-torture/execute/math.f90
@@ -1,9 +1,14 @@
 ! Program to test mathematical intrinsics
+
+! See also 'libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90'; thus the '!$omp' directives.
+! See also 'libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90'; thus the '!$acc' directives.
+
 subroutine dotest (n, val4, val8, known)
implicit none
real(kind=4) val4, known
real(kind=8) val8
integer n
+   !$acc routine seq
 
if (abs (val4 - known) .gt. 0.001) STOP 1
if (abs (real (val8, kind=4) - known) .gt. 0.001) STOP 2
@@ -14,17 +19,20 @@ subroutine dotestc (n, val4, val8, known)
complex(kind=4) val4, known
complex(kind=8) val8
integer n
+   !$acc routine seq
+
if (abs (val4 - known) .gt. 0.001) STOP 3
if (abs (cmplx (val8, kind=4) - known) .gt. 0.001) STOP 4
 end subroutine
 
-program testmath
+subroutine testmath
implicit none
real(kind=4) r, two4, half4
real(kind=8) q, two8, half8
complex(kind=4) cr
complex(kind=8) cq
external dotest, dotestc
+   !$acc routine seq
 
two4 = 2.0
two8 = 2.0_8
@@ -96,5 +104,16 @@ program testmath
cq = log ((-1.0_8, -1.0_8))
call dotestc (21, cr, cq, (0.3466, -2.3562))
 
-end program
+end subroutine
 
+program main
+   implicit none
+   external testmath
+
+   !$acc serial
+   !$omp target
+   call testmath
+   !$acc end serial
+   !$omp end target
+
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90 b/libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
new file mode 100644
index 000..3348a0bb3ad
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/fortran-torture_execute_math.f90
@@ -0,0 +1,4 @@
+! { dg-do run }
+! { dg-additional-options -foffload-options=-lm }
+
+include '../../../gcc/testsuite/gfortran.fortran-torture/execute/math.f90'
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90 b/libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90
new file mode 100644
index 000..1b2ac440762
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90
@@ -0,0 +1,5 @@
+! { dg-do run }
+!TODO { dg-prune-output {using 'vector_length \(32\)', ignoring 1} }
+! { dg-additional-options -foffload-options=-lm }
+
+include '../../../gcc/testsuite/gfortran.fortran-torture/execute/math.f90'
-- 
2.34.1



Re: [PATCH] OpenACC: Further attach/detach clause fixes for Fortran [PR109622]

2023-05-03 Thread Thomas Schwinge
Hi Julian!

On 2023-04-29T03:57:41-0700, Julian Brown  wrote:
> This patch moves several tests introduced by the following patch:
>
>   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616939.html
>
> into the proper location for OpenACC testing (thanks to Thomas for
> spotting my mistake!), and also fixes a few additional problems --
> missing diagnostics for non-pointer attaches, and a case where a pointer
> was incorrectly dereferenced. Tests are also adjusted for vector-length
> warnings on nvidia accelerators.
>
> Tested with offloading to nvptx. OK?

Thanks for looking into this.

I haven't reviewed the patch itself, but noticed one thing:

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/goacc/pr109622-5.f90
> @@ -0,0 +1,45 @@
> +! { dg-do compile }
> +
> +use openacc

[...]/gfortran.dg/goacc/pr109622-5.f90:3:5: Fatal Error: Cannot open module 
file 'openacc.mod' for reading at (1): No such file or directory

... for GCC build-tree testing.  Just remove the 'use openacc'; it's not
necessary here.


Grüße
 Thomas


> +implicit none
> +
> +type t
> +integer :: foo
> +character(len=8) :: bar
> +integer :: qux(5)
> +end type t
> +
> +type(t) :: var
> +
> +var%foo = 3
> +var%bar = "HELLOOMP"
> +var%qux = (/ 1, 2, 3, 4, 5 /)
> +
> +!$acc enter data copyin(var)
> +
> +!$acc enter data attach(var%foo)
> +! { dg-error "'attach' clause argument not pointer or allocatable" "" { 
> target *-*-* } .-1 }
> +!$acc enter data attach(var%bar)
> +! { dg-error "'attach' clause argument not pointer or allocatable" "" { 
> target *-*-* } .-1 }
> +!$acc enter data attach(var%qux)
> +! { dg-error "'attach' clause argument not pointer or allocatable" "" { 
> target *-*-* } .-1 }
> +
> +!$acc serial
> +var%foo = 5
> +var%bar = "GOODBYE!"
> +var%qux = (/ 6, 7, 8, 9, 10 /)
> +!$acc end serial
> +
> +!$acc exit data detach(var%qux)
> +! { dg-error "'detach' clause argument not pointer or allocatable" "" { 
> target *-*-* } .-1 }
> +!$acc exit data detach(var%bar)
> +! { dg-error "'detach' clause argument not pointer or allocatable" "" { 
> target *-*-* } .-1 }
> +!$acc exit data detach(var%foo)
> +! { dg-error "'detach' clause argument not pointer or allocatable" "" { 
> target *-*-* } .-1 }
> +
> +!$acc exit data copyout(var)
> +
> +if (var%foo.ne.5) stop 1
> +if (var%bar.ne."GOODBYE!") stop 2
> +
> +end


Re: [PATCH] OpenACC: Stand-alone attach/detach clause fixes for Fortran [PR109622]

2023-04-28 Thread Thomas Schwinge
Hi Julian!

On 2023-04-27T11:36:47-0700, Julian Brown  wrote:
> This patch fixes several cases where multiple attach or detach mapping
> nodes were being created for stand-alone attach or detach clauses
> in Fortran.  After the introduction of stricter checking later during
> compilation, these extra nodes could cause ICEs, as seen in the PR.
>
> The patch also fixes cases that "happened to work" previously where
> the user attaches/detaches a pointer to array using a descriptor, and
> (I think!) the "_data" field has offset zero, hence the same address as
> the descriptor as a whole.

Thanks for looking into this.

I haven't reviewed the patch itself, but noticed one thing:

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/pr109622-2.f90

> +!$acc enter data copyin(var)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/pr109622-3.f90

> +!$acc enter data copyin(var, tgt)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/pr109622.f90

> +!$acc enter data copyin(var, var2)

You'll want to move these into 'libgomp/testsuite/libgomp.oacc-fortran/'
to actually test them with '-fopenacc' instead of '-fopenmp'.  ;-)


Chalk up one for the idea that I once had, to have '-fopenacc',
'-fopenmp', '-fopenmp-simd' enable '-Wunknown-pragmas' by default.


Grüße
 Thomas


Re: [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive

2023-04-01 Thread Thomas Schwinge
Hi Frederik!

Thanks for including a good number of test cases with your code changes!

This new test case:

On 2023-03-24T16:30:39+0100, Frederik Harwath  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
> @@ -0,0 +1,52 @@
> +! { dg-additional-options "-fdump-tree-original" }
> +! { dg-do run }
> +
> +module test_functions
> +  contains
> +  integer function compute_sum() result(sum)
> +implicit none
> +
> +integer :: i,j
> +
> +!$omp do
> +do i = 1,10,3
> +   !$omp unroll full
> +   do j = 1,10,3
> +  sum = sum + 1
> +   end do
> +end do
> +  end function
> +
> +  integer function compute_sum2() result(sum)
> +implicit none
> +
> +integer :: i,j
> +
> +!$omp parallel do reduction(+:sum)
> +!$omp unroll partial(2)
> +do i = 1,10,3
> +   do j = 1,10,3
> +  sum = sum + 1
> +   end do
> +end do
> +  end function
> +end module test_functions
> +
> +program test
> +  use test_functions
> +  implicit none
> +
> +  integer :: result
> +
> +  result = compute_sum ()
> +  write (*,*) result
> +  if (result .ne. 16) then
> + call abort
> +  end if
> +
> +  result = compute_sum2 ()
> +  write (*,*) result
> +  if (result .ne. 16) then
> + call abort
> +  end if
> +end program

... I see FAIL for x86_64-pc-linux-gnu '-m32' (thus, host, not
offloading), '-O0' (only):

spawn [open ...]
  1437822992

Program aborted. Backtrace:
#0  0x8048df0 in ???
#1  0x8048ea6 in ???
#2  0x559a3af2 in ???
#3  0x8048bc0 in ???
FAIL: libgomp.fortran/loop-transforms/unroll-1.f90   -O0  execution test

All other variants PASS with:

spawn [open ...]
  16
  16

And similarly, this new test case:

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
> @@ -0,0 +1,33 @@
> +! { dg-options "-fno-openmp -fopenmp-simd" }
> +! { dg-additional-options "-fdump-tree-original" }
> +! { dg-do run }
> +
> +module test_functions
> +  contains
> +  integer function compute_sum() result(sum)
> +implicit none
> +
> +integer :: i,j
> +
> +!$omp simd
> +do i = 1,10,3
> +   !$omp unroll full
> +   do j = 1,10,3
> +  sum = sum + 1
> +   end do
> +end do
> +  end function compute_sum
> +end module test_functions
> +
> +program test
> +  use test_functions
> +  implicit none
> +
> +  integer :: result
> +
> +  result = compute_sum ()
> +  write (*,*) result
> +  if (result .ne. 16) then
> + call abort
> +  end if
> +end program

... I see FAIL for x86_64-pc-linux-gnu '-m32' (thus, host, not
offloading), '-O0' (only):

spawn [open ...]
  41

Program aborted. Backtrace:
#0  0x8048c35 in ???
#1  0x8048c72 in ???
#2  0x55977af2 in ???
#3  0x8048a60 in ???
FAIL: libgomp.fortran/loop-transforms/unroll-simd-1.f90   -O0  execution test

All other variants PASS with:

spawn [open ...]
  16


Grüße
 Thomas




Enable 'gfortran.dg/weak-2.f90' for nvptx target (was: Support for WEAK attribute, part 2)

2023-03-28 Thread Thomas Schwinge
Hi!

On 2023-02-24T07:16:51+0200, Rimvydas Jasinskas via Fortran 
 wrote:
> From 5b83226c714b17780334b5bad9b17c2266af8232 Mon Sep 17 00:00:00 2001
> From: Rimvydas Jasinskas 
> Date: Fri, 24 Feb 2023 04:41:00 +
> Subject: Fortran: Add support for WEAK attribute for variables
>
>  Add the rest of the weak-*.f90 testcases.

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/weak-2.f90
> @@ -0,0 +1,26 @@
> +! { dg-do compile }
> +! { dg-require-weak "" }
> +! { dg-skip-if "" { x86_64-*-mingw* } }
> +! { dg-skip-if "" { nvptx-*-* } }
> +[...]

Pushed to master branch commit b3c5933ee726004e4e47291d422dfe7ac3345062
"Enable 'gfortran.dg/weak-2.f90' for nvptx target", see attached.


I'm sorry I've not yet been able to look into the other items discussed
in this thread.


Grüße
 Thomas


From b3c5933ee726004e4e47291d422dfe7ac3345062 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 28 Mar 2023 22:26:30 +0200
Subject: [PATCH] Enable 'gfortran.dg/weak-2.f90' for nvptx target

Follow-up to commit bcbeebc498126c50d73809ec8a4bd0bff27ee97b
"Fortran: Add support for WEAK attribute for variables".

	gcc/testsuite/
	* gfortran.dg/weak-2.f90: Enable for nvptx target.
---
 gcc/testsuite/gfortran.dg/weak-2.f90 | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/weak-2.f90 b/gcc/testsuite/gfortran.dg/weak-2.f90
index 3e0e877e903..ab273a13b6c 100644
--- a/gcc/testsuite/gfortran.dg/weak-2.f90
+++ b/gcc/testsuite/gfortran.dg/weak-2.f90
@@ -1,10 +1,10 @@
 ! { dg-do compile }
 ! { dg-require-weak "" }
 ! { dg-skip-if "" { x86_64-*-mingw* } }
-! { dg-skip-if "" { nvptx-*-* } }
 
 ! 1.
-! { dg-final { scan-assembler "weak\[^ \t\]*\[ \t\]_?__foo_MOD_abc" } }
+! { dg-final { scan-assembler "weak\[^ \t\]*\[ \t\]_?__foo_MOD_abc" { target { ! nvptx-*-* } } } }
+! { dg-final { scan-assembler-times "\\.weak \\.global \\.align 4 \\.u32 __foo_MOD_abc" 1 { target nvptx-*-* } } }
 module foo
 implicit none
 !GCC$ ATTRIBUTES weak :: abc
@@ -12,14 +12,16 @@ real :: abc(7)
 end module
 
 ! 2.
-! { dg-final { scan-assembler "weak\[^ \t\]*\[ \t\]_?impl1" } }
+! { dg-final { scan-assembler "weak\[^ \t\]*\[ \t\]_?impl1" { target { ! nvptx-*-* } } } }
+! { dg-final { scan-assembler-times "\\.weak \\.func \\(\\.param\\.u32 %value_out\\) impl1" 2 { target nvptx-*-* } } }
 integer function impl1()
 implicit none
 !GCC$ ATTRIBUTES weak :: impl1
 end function
 
 ! 3.
-! { dg-final { scan-assembler "weak\[^ \t\]*\[ \t\]_?bar__" } }
+! { dg-final { scan-assembler "weak\[^ \t\]*\[ \t\]_?bar__" { target { ! nvptx-*-* } } } }
+! { dg-final { scan-assembler-times "\\.weak \\.func \\(\\.param\\.u32 %value_out\\) bar__" 2 { target nvptx-*-* } } }
 integer function impl2() bind(c,name='bar__')
 implicit none
 !GCC$ ATTRIBUTES weak :: impl2
-- 
2.25.1



Add caveat/safeguard to OpenMP: Handle descriptors in target's firstprivate [PR104949] (was: [Patch] OpenMP: Handle descriptors in target's firstprivate [PR104949])

2023-03-24 Thread Thomas Schwinge
>> +  str = "abcde" ! work around for PR fortran/91544
>> +  do i = 1, omp_get_num_devices() + 1
>> +!$omp target firstprivate(x)
>> +  if (allocated(x)) error stop
>> +!$omp end target
>> +if (allocated(x)) error stop
>> +  end do
>> +
>> +  do i = 1, omp_get_num_devices() + 1
>> +!$omp target firstprivate(x, i)
>> +  if (allocated(x)) error stop
>> +  ! no reallocation, just malloced + assignment
>> +  x = [character(len=2+i) :: str,"fhji","klmno"]
>> +  if (len(x) /= 2+i) error stop
>> +  if (any (x /= [character(len=2+i) :: str,"fhji","klmno"])) error stop
>> +  ! This leaks memory!
>> +  ! deallocate(x)
>> +!$omp end target
>> +if (allocated(x)) error stop
>> +  end do
>> +
>> +  x = [character(len=4) :: "ABCDE","FHJI","KLMNO"]
>> +
>> +  do i = 1, omp_get_num_devices() + 1
>> +!$omp target firstprivate(x, i)
>> +  if (i <= 0) error stop
>> +  if (.not.allocated(x)) error stop
>> +  if (size(x) /= 3) error stop
>> +  if (lbound(x,1) /= 1) error stop
>> +  if (len(x) /= 4) error stop
>> +  if (any (x /= [character(len=4) :: "ABCDE","FHJI","KLMNO"])) error 
>> stop
>> +  !! Reallocation runs into the issue PR fortran/105538
>> +  !!
>> +  !!x = [character(len=2+i) :: str,"fhji","klmno"]
>> +  !!if (len(x) /= 2+i) error stop
>> +  !!if (any (x /= [character(len=2+i) :: str,"fhji","klmno"])) error 
>> stop
>> +  !! This leaks memory!
>> +  !! deallocate(x)
>> +  ! Just assign:
>> +  x = [character(len=4) :: "abcde","fhji","klmno"]
>> +  if (any (x /= [character(len=4) :: "abcde","fhji","klmno"])) error 
>> stop
>> +!$omp end target
>> +if (.not.allocated(x)) error stop
>> +if (lbound(x,1) /= 1) error stop
>> +if (size(x) /= 3) error stop
>> +if (len(x) /= 4) error stop
>> +if (any (x /= [character(len=4) :: "ABCDE","FHJI","KLMNO"])) error stop
>> +  end do
>> +  deallocate(x)
>> +end
>> +end module m
>> +
>> +use m
>> +call one
>> +call two
>> +end
>
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.fortran/target-firstprivate-3.f90
>> @@ -0,0 +1,24 @@
>> +implicit none
>> +  integer, allocatable :: x(:)
>> +  x = [1,2,3,4]
>> +  call foo(x)
>> +  if (any (x /= [1,2,3,4])) error stop
>> +  call foo()
>> +contains
>> +subroutine foo(c)
>> +  integer, allocatable, optional :: c(:)
>> +  logical :: is_present
>> +  is_present = present (c)
>> +  !$omp target firstprivate(c)
>> +if (is_present) then
>> +  if (.not. allocated(c)) error stop
>> +  if (any (c /= [1,2,3,4])) error stop
>> +  c = [99,88,77,66]
>> +  if (any (c /= [99,88,77,66])) error stop
>> +end if
>> +  !$omp end target
>> +  if (is_present) then
>> +if (any (c /= [1,2,3,4])) error stop
>> +  end if
>> +end
>> +end
>
>
> Grüße
>  Thomas


From e8fec6998b656dac02d4bc6c69b35a0fb5611e87 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 23 Mar 2023 12:32:35 +0100
Subject: [PATCH] Add caveat/safeguard to OpenMP: Handle descriptors in
 target's firstprivate [PR104949]

Follow-up to commit 49d1a2f91325fa8cc011149e27e5093a988b3a49
"OpenMP: Handle descriptors in target's firstprivate [PR104949]".

	PR fortran/104949
	libgomp/
	* target.c (gomp_map_vars_internal) : Add
	caveat/safeguard.
---
 libgomp/target.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libgomp/target.c b/libgomp/target.c
index 90b4204133a..b30c6a50c7e 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1396,6 +1396,11 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		  {
 		uintptr_t target = (uintptr_t) hostaddrs[i];
 		void *devptr = *(void**) hostaddrs[i+1] + sizes[i+1];
+		/* Per
+		   <https://inbox.sourceware.org/gcc-patches/87o7pe12ke@euler.schwinge.homeip.net>
+		   "OpenMP: Handle descriptors in target's firstprivate [PR104949]"
+		   this probably needs revision for 'aq' usage.  */
+		assert (!aq);
 		gomp_copy_host2dev (devicep, aq, devptr, &target,
 	sizeof (void *), false, cbufp);
 		++i;
-- 
2.25.1



Re: [PATCH][stage1] Remove conditionals around free()

2023-03-08 Thread Thomas Schwinge
Hi Bernhard!

On 2023-03-01T22:28:56+0100, Bernhard Reutner-Fischer via Gcc-patches 
 wrote:
> // POSIX: free(NULL) is perfectly valid
> // quote: If ptr is a null pointer, no action shall occur.
> @ rule1 @
> expression e;
> @@
>
> - if (e != NULL)
> -  { free(e); }
> + free (e);

Nice, Coccinelle/spatch!  (Another very interesting tool that I so far
had no chance to actually use.)
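
As a minimal sketch of why the guard is redundant: ISO C and POSIX both
specify that 'free' on a null pointer does nothing, so a possibly-null
buffer can be released unconditionally.  The names 'struct holder',
'release_buffer' and 'demo_free_null' are hypothetical and not taken from
the GCC sources.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container with an optional heap buffer.  */
struct holder
{
  char *dir;   /* may be NULL */
};

/* Release the buffer unconditionally: free (NULL) is a no-op.  */
static void
release_buffer (struct holder *h)
{
  free (h->dir);
  h->dir = NULL;
}

/* Returns 1 if the null and the non-null case are both handled.  */
static int
demo_free_null (void)
{
  struct holder empty = { NULL };
  struct holder full = { malloc (16) };

  if (full.dir == NULL)
    return 0;
  strcpy (full.dir, "tmp");

  release_buffer (&empty);   /* pointer was NULL all along */
  release_buffer (&full);    /* pointer was valid */
  release_buffer (&full);    /* now NULL; still fine */

  return empty.dir == NULL && full.dir == NULL;
}
```

Calling 'demo_free_null ()' from a test driver exercises both paths
without any 'if (ptr != NULL)' guard.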

> # find ./ \( -name "*.[ch]" -o -name "*.cpp" \) -a \( ! -path 
> "./gcc/testsuite/*" -a ! -path "./gcc/contrib/*" \) -exec spatch --sp-file 
> ~/coccinelle/free-without-if-null.0.cocci --in-place

Also include '*.cc' if you'd like to find some more in 'gcc/' (and
possibly elsewhere, too) than just the following lonely one.  ;-)

> --- a/gcc/ada/rtinit.c
> +++ b/gcc/ada/rtinit.c
> @@ -481,8 +481,7 @@ __gnat_runtime_initialize (int install_handler)
>
>FindClose (hDir);
>
> -  if (dir != NULL)
> -free (dir);
> +  free (dir);


Grüße
 Thomas


Re: [Patch] OpenMP: Handle descriptors in target's firstprivate [PR104949]

2023-02-28 Thread Thomas Schwinge
Hi!

I'm currently reviewing 'gomp_copy_host2dev', 'ephemeral' in a different
context, and a question came up here;
commit r13-706-g49d1a2f91325fa8cc011149e27e5093a988b3a49
"OpenMP: Handle descriptors in target's firstprivate [PR104949]":

On 2022-05-11T19:33:00+0200, Tobias Burnus  wrote:
> this patch handles (for target regions)
>firstprivate(array_descriptor)
> by not only firstprivatizing the descriptor but also the data
> it points to. This is done by turning it in omp-low.cc the clause
> into
>firstprivate(descr) firstprivate(descr.data)
> and then attaching the latter to the former. That works by
> adding an 'attach' after the last firstprivate (and checking
> for it in libgomp). The attached-to device address for a
> previous (here: the first) firstprivate is obtained by returning
> the device address inside the hostaddrs[i] alias omp_arr array,
> i.e. the compiler generates:
>omp_arr.1 =   /* firstprivate */
>omp_arr.2 = descr.data;  /* firstprivate */
>omp_arr.3 = _arr.1;  /* attach; bias:  */
> and libgomp then knows that the device address is in the
> pointer.
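
The scheme described above can be sketched in plain C, with 'malloc'
standing in for device memory.  This is only an illustration under
simplifying assumptions: 'struct descr', 'deep_firstprivate' and
'demo_deep_firstprivate' are hypothetical names, and the real descriptor
layout and the libgomp mapping code are considerably more involved.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much simplified array descriptor.  */
struct descr
{
  int *data;
  size_t n;
};

/* Emulate firstprivate(descr) firstprivate(descr.data) plus the attach:
   copy the descriptor, copy the payload, then make the descriptor copy
   point at the payload copy.  */
static struct descr *
deep_firstprivate (const struct descr *host)
{
  struct descr *dev = malloc (sizeof *dev);
  if (dev == NULL)
    return NULL;
  memcpy (dev, host, sizeof *dev);              /* firstprivate(descr) */
  dev->data = NULL;

  if (host->data != NULL && host->n > 0)
    {
      int *payload = malloc (host->n * sizeof *payload);
      if (payload == NULL)
        {
          free (dev);
          return NULL;
        }
      /* firstprivate(descr.data) */
      memcpy (payload, host->data, host->n * sizeof *payload);
      dev->data = payload;                      /* attach */
    }
  return dev;
}

/* Returns 1 if writes to the private copy leave the host data alone.  */
static int
demo_deep_firstprivate (void)
{
  int host_data[4] = { 1, 2, 3, 4 };
  struct descr host = { host_data, 4 };
  struct descr *dev = deep_firstprivate (&host);
  int ok;

  if (dev == NULL)
    return 0;
  dev->data[0] = 99;
  ok = host_data[0] == 1 && dev->data[0] == 99 && dev->data != host.data;
  free (dev->data);
  free (dev);
  return ok;
}
```

Without the final pointer fix-up, the private descriptor would still
reference the host payload, which is the situation the attach step avoids.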

> Note: The code is not active for OpenACC. The existing code uses, e.g.,
> 'goto oacc_firstprivate' – thus, the new code would be
> partially active. I went for making it completely inactive for OpenACC
> by adding one '!is_gimple_omp_oacc'.

ACK.

> I bet that a deep copy would be
> also useful for OpenACC, but I have neither checked what the current
> code does nor what the OpenACC spec says about this.

Instead of adding corresponding handling to the OpenACC 'firstprivate'
special code paths later on, I suggest that we first address known issues
with OpenACC 'firstprivate' -- which probably may largely be achieved by
in fact removing those 'goto oacc_firstprivate's and other special code
paths?  For example, see 
"OpenACC 'firstprivate' clause: initial value".

That means, the following code currently isn't active for OpenACC, and
given that OpenMP 'target' doesn't do asynchronous device execution
(meaning: not in the way/implementation of OpenACC 'async'), it thus
doesn't care about the 'ephemeral' argument to 'gomp_copy_host2dev', but
still, for correctness (and once that code gets used for OpenACC):

> OpenMP: Handle descriptors in target's firstprivate [PR104949]
>
> For allocatable/pointer arrays, a firstprivate to a device
> not only needs to privatize the descriptor but also the actual
> data. This is implemented as:
>   firstprivate(x) firstprivate(x.data) attach(x [bias: )
> where the address of x in device memory is saved in hostaddrs[i]
> by libgomp and the middle end actually passes hostaddrs[i]' to
> attach.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1350,7 +1350,24 @@ gomp_map_vars_internal (struct gomp_device_descr 
> *devicep,
>   gomp_copy_host2dev (devicep, aq,
>   (void *) (tgt->tgt_start + tgt_size),
>   (void *) hostaddrs[i], len, false, cbufp);

Here, passing 'ephemeral <- false' is correct, as 'h <- hostaddrs[i]'
points to non-ephemeral data.

> + /* Save device address in hostaddr to permit latter availablity
> +when doing a deep-firstprivate with pointer attach.  */
> + hostaddrs[i] = (void *) (tgt->tgt_start + tgt_size);

Here, we modify 'hostaddrs[i]' (itself -- *not* the data that the
original 'hostaddrs[i]' points to), so the above 'gomp_copy_host2dev'
with 'ephemeral <- false' is still correct, right?

>   tgt_size += len;
> +
> + /* If followed by GOMP_MAP_ATTACH, pointer assign this
> +firstprivate to hostaddrs[i+1], which is assumed to contain a
> +device address.  */
> + if (i + 1 < mapnum
> + && (GOMP_MAP_ATTACH
> + == (typemask & get_kind (short_mapkind, kinds, i+1
> +   {
> + uintptr_t target = (uintptr_t) hostaddrs[i];
> + void *devptr = *(void**) hostaddrs[i+1] + sizes[i+1];
> + gomp_copy_host2dev (devicep, aq, devptr, &target,
> + sizeof (void *), false, cbufp);

However, 'h <- &target' here points to data in the local frame
('target'), which potentially goes out of scope before an asynchronous
'gomp_copy_host2dev' has completed.  Thus, don't we have to pass here
'ephemeral <- true' instead of 'ephemeral <- false'?  Or, actually
instead of '&target', pass '&hostaddrs[i]', which then again points to
non-ephemeral data?  Is the latter safe to do, or are we potentially
further down the line modifying the data that '&hostaddrs[i]' points to?
(I got a bit lost in the use of 'hostaddrs[i]' here.)
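
To illustrate the concern, here is a small stand-alone C sketch of a
deferred copy.  It is not libgomp's implementation: 'struct pending_copy',
'enqueue_copy', 'complete_copy' and 'demo_copy' are hypothetical names, and
the assumption is merely that an "ephemeral" source has to be snapshotted
at enqueue time, whereas a non-ephemeral source is read only when the copy
is eventually carried out.  Overwriting the local variable stands in for
its frame being reused.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical deferred copy request; not libgomp's data structure.  */
struct pending_copy
{
  void *dst;
  const void *src;
  size_t len;
  void *staged;   /* private snapshot, only for ephemeral sources */
};

/* Queue a copy.  For an ephemeral source, snapshot the bytes right now;
   otherwise just remember the source pointer.  */
static int
enqueue_copy (struct pending_copy *p, void *dst, const void *src,
              size_t len, int ephemeral)
{
  p->dst = dst;
  p->src = src;
  p->len = len;
  p->staged = NULL;
  if (ephemeral)
    {
      p->staged = malloc (len);
      if (p->staged == NULL)
        return -1;
      memcpy (p->staged, src, len);
      p->src = p->staged;
    }
  return 0;
}

/* Carry out the copy later, as an asynchronous queue would.  */
static void
complete_copy (struct pending_copy *p)
{
  memcpy (p->dst, p->src, p->len);
  free (p->staged);
  p->staged = NULL;
}

/* Returns what ends up in the destination.  */
static unsigned long
demo_copy (int ephemeral)
{
  unsigned long dev = 0;
  unsigned long target = 0x1234;
  struct pending_copy p;

  if (enqueue_copy (&p, &dev, &target, sizeof target, ephemeral) != 0)
    return 0;
  target = 0xdead;   /* stands in for the local frame being reused */
  complete_copy (&p);
  return dev;
}
```

With the snapshot ('demo_copy (1)') the destination receives the value
that was current at enqueue time; without it ('demo_copy (0)') it receives
whatever the source location holds when the copy finally runs.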

> + ++i;
> +   }
>   continue;
> case GOMP_MAP_FIRSTPRIVATE_INT:
> case GOMP_MAP_ZERO_LEN_ARRAY_SECTION:
> @@ -2517,6 +2534,11 @@ copy_firstprivate_data (char *tgt, size_t mapnum, void 
> **hostaddrs,
>  

Re: [Patch] Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings

2023-02-25 Thread Thomas Schwinge
Hi Tobias!

On 2023-02-23T17:42:08+0100, Tobias Burnus  wrote:
> (Side note: this patch has been committed to OG12 as 
> http://gcc.gnu.org/g:55a18d47442 )

I see og12 commit 55a18d4744258e3909568e425f9f473c49f9d13f
"Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings"
regress existing test cases as follows:

[-PASS:-]{+FAIL:+} gfortran.dg/goacc/finalize-1.f   -O   
scan-tree-dump-times gimple "(?n)#pragma omp target oacc_exit_data 
map\\(delete:MEM <[^>]+> \\[\\(integer\\(kind=.\\)\\[0:\\] \\*\\)_[0-9]+\\] 
\\[len: [^\\]]+\\]\\) map\\(to:del_f_p \\[pointer set, len: [0-9]+\\]\\) 
map\\(alloc:del_f_p\\.data \\[pointer assign, bias: [^\\]]+\\]\\) finalize$" 1
PASS: gfortran.dg/goacc/finalize-1.f   -O   scan-tree-dump-times gimple 
"(?n)#pragma omp target oacc_exit_data map\\(delete:del_f \\[len: [0-9]+\\]\\) 
finalize$" 1
PASS: gfortran.dg/goacc/finalize-1.f   -O   scan-tree-dump-times gimple 
"(?n)#pragma omp target oacc_exit_data map\\(force_from:MEM <[^>]+> 
\\[\\(integer\\(kind=.\\)\\[0:\\] \\*\\)_[0-9]+\\] \\[len: [^\\]]+\\]\\) 
map\\(to:cpo_f_p \\[pointer set, len: [0-9]+\\]\\) map\\(alloc:cpo_f_p\\.data 
\\[pointer assign, bias: [^\\]]+\\]\\) finalize$" 1
PASS: gfortran.dg/goacc/finalize-1.f   -O   scan-tree-dump-times gimple 
"(?n)#pragma omp target oacc_exit_data map\\(force_from:cpo_f \\[len: 
[0-9]+\\]\\) finalize$" 1
@@ -54679,7 +54679,7 @@ PASS: gfortran.dg/goacc/finalize-1.f   -O   
scan-tree-dump-times gimple "(?n)#pr
PASS: gfortran.dg/goacc/finalize-1.f   -O   scan-tree-dump-times original 
"(?n)#pragma acc exit data map\\(from:\\*\\(integer\\(kind=.\\)\\[0:\\] \\*\\) 
parm\\.1\\.data \\[len: [^\\]]+\\]\\) map\\(to:cpo_f_p \\[pointer set, len: 
[0-9]+\\]\\) map\\(alloc:\\(integer\\(kind=1\\)\\[0:\\] \\* restrict\\) 
cpo_f_p\\.data \\[pointer assign, bias: \\(.*int.*\\) parm\\.1\\.data - 
\\(.*int.*\\) cpo_f_p\\.data\\]\\) finalize;$" 1
PASS: gfortran.dg/goacc/finalize-1.f   -O   scan-tree-dump-times original 
"(?n)#pragma acc exit data map\\(from:cpo_f\\) finalize;$" 1
PASS: gfortran.dg/goacc/finalize-1.f   -O   scan-tree-dump-times original 
"(?n)#pragma acc exit data map\\(from:cpo_r\\);$" 1
[-PASS:-]{+FAIL:+} gfortran.dg/goacc/finalize-1.f   -O   
scan-tree-dump-times original "(?n)#pragma acc exit data 
map\\(release:\\*\\(integer\\(kind=.\\)\\[0:\\] \\*\\) parm\\.0\\.data \\[len: 
[^\\]]+\\]\\) map\\(to:del_f_p \\[pointer set, len: [0-9]+\\]\\) 
map\\(alloc:\\(integer\\(kind=1\\)\\[0:\\] \\* restrict\\) del_f_p\\.data 
\\[pointer assign, bias: \\(.*int.*\\) parm\\.0\\.data - \\(.*int.*\\) 
del_f_p\\.data\\]\\) finalize;$" 1
PASS: gfortran.dg/goacc/finalize-1.f   -O   scan-tree-dump-times original 
"(?n)#pragma acc exit data map\\(release:del_f\\) finalize;$" 1
PASS: gfortran.dg/goacc/finalize-1.f   -O   scan-tree-dump-times original 
"(?n)#pragma acc exit data map\\(release:del_r\\);$" 1
PASS: gfortran.dg/goacc/finalize-1.f   -O  (test for excess errors)

[-PASS:-]{+FAIL:+} gfortran.dg/gomp/pr78260-2.f90   -O   
scan-tree-dump-times original "#pragma omp target data 
map\\(tofrom:\\*\\(integer\\(kind=4\\)\\[0:\\] \\* restrict\\) __result->data 
\\[len: D.[0-9]+ \\* 4\\]\\) map\\(to:\\*__result \\[pointer set, len: ..\\]\\) 
map\\(alloc:\\(integer\\(kind=4\\)\\[0:\\] \\* restrict\\) __result->data 
\\[pointer assign, bias: 0\\]\\) map\\(alloc:__result \\[pointer assign, bias: 
0\\]\\)" 1
[-PASS:-]{+FAIL:+} gfortran.dg/gomp/pr78260-2.f90   -O   
scan-tree-dump-times original "#pragma omp target data 
map\\(tofrom:\\*\\(integer\\(kind=4\\)\\[0:\\] \\* restrict\\) arr.data \\[len: 
D.[0-9]+ \\* 4\\]\\) map\\(to:arr \\[pointer set, len: ..\\]\\) 
map\\(alloc:\\(integer\\(kind=4\\)\\[0:\\] \\* restrict\\) arr.data \\[pointer 
assign, bias: 0\\]\\)" 1
PASS: gfortran.dg/gomp/pr78260-2.f90   -O   scan-tree-dump-times original 
"#pragma omp target data map\\(tofrom:\\*__result.0\\) map\\(alloc:__result.0 
\\[pointer assign, bias: 0\\]\\)" 2
PASS: gfortran.dg/gomp/pr78260-2.f90   -O   scan-tree-dump-times original 
"#pragma omp target data map\\(tofrom:__result_f1\\)" 1
PASS: gfortran.dg/gomp/pr78260-2.f90   -O   scan-tree-dump-times original 
"#pragma omp target update to\\(\\*\\(integer\\(kind=4\\)\\[0:\\] \\* 
restrict\\) __result->data \\[len: D.[0-9]+ \\* 4\\]\\)" 1

Do the scan patterns need adjusting, or is something wrong?


Grüße
 Thomas


nvptx: Adjust 'scan-assembler' in 'gfortran.dg/weak-1.f90' (was: Support for NOINLINE attribute)

2023-02-14 Thread Thomas Schwinge
Hi!

On 2023-02-13T18:50:23+0100, Harald Anlauf via Gcc-patches 
 wrote:
> Pushed as:
>
> commit 086a1df4374962787db37c1f0d1bd9beb828f9e3

> On 2/12/23 22:28, Harald Anlauf via Gcc-patches wrote:
>> There is one thing I cannot test, which is the handling of weak symbols
>> on other platforms.  A quick glance at the C testcases suggests that
>> someone with access to either an NVPTX or MingGW target might tell
>> whether that particular target should be excluded.

Indeed nvptx does use a different assembler syntax; I've pushed to
master branch commit 8d8175869ca94c600e64e27b7676787b2a398f6e
"nvptx: Adjust 'scan-assembler' in 'gfortran.dg/weak-1.f90'", see
attached.


And I'm curious, is '!GCC$ ATTRIBUTES weak' meant to be used only for
weak definitions (like in 'gfortran.dg/weak-1.f90'), or also for weak
declarations (which, for example, in the C world then evaluate to
zero-address unless actually defined)?  When I did a quick experiment,
that didn't seem to work?  (But may be my fault, of course.)

And, orthogonally: is '!GCC$ ATTRIBUTES weak' meant to be used only for
subroutines (like in 'gfortran.dg/weak-1.f90') and also functions (I
suppose; test case?), or also for weak "data" in some way (which, for
example, in the C world then evaluates to a zero-address unless actually
defined)?
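
For reference, the C-world behavior alluded to above looks roughly like
the following sketch.  It assumes GCC (or a compatible compiler) on an ELF
target that supports weak symbols; 'optional_hook', 'default_impl' and
'call_hook_or_default' are hypothetical names.

```c
/* Weak declaration: nothing in this program defines 'optional_hook',
   so its address evaluates to zero after linking.  */
extern int optional_hook (void) __attribute__ ((weak));

/* Weak definition: used unless a strong definition of the same symbol
   is linked in.  */
__attribute__ ((weak)) int
default_impl (void)
{
  return 42;
}

/* Call the hook only if some object file actually provides it.  */
static int
call_hook_or_default (void)
{
  if (optional_hook)
    return optional_hook ();
  return default_impl ();
}
```

Here the weak declaration is what allows the address test; the weak
definition corresponds to what 'gfortran.dg/weak-1.f90' exercises.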

Could help to at least add a few more test cases, and clarify the
documentation?


Grüße
 Thomas


From 8d8175869ca94c600e64e27b7676787b2a398f6e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 14 Feb 2023 10:11:19 +0100
Subject: [PATCH] nvptx: Adjust 'scan-assembler' in 'gfortran.dg/weak-1.f90'

Fix-up for recent commit 086a1df4374962787db37c1f0d1bd9beb828f9e3
"Fortran: Add !GCC$ attributes NOINLINE,NORETURN,WEAK".

	gcc/testsuite/
	* gfortran.dg/weak-1.f90: Adjust 'scan-assembler' for nvptx.
---
 gcc/testsuite/gfortran.dg/weak-1.f90 | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/weak-1.f90 b/gcc/testsuite/gfortran.dg/weak-1.f90
index d9aca686775a..9ec1fe74053e 100644
--- a/gcc/testsuite/gfortran.dg/weak-1.f90
+++ b/gcc/testsuite/gfortran.dg/weak-1.f90
@@ -1,6 +1,7 @@
 ! { dg-do compile }
 ! { dg-require-weak "" }
-! { dg-final { scan-assembler "weak\[^ \t\]*\[ \t\]_?impl" } }
+! { dg-final { scan-assembler "weak\[^ \t\]*\[ \t\]_?impl" { target { ! nvptx-*-* } } } }
+! { dg-final { scan-assembler-times "\\.weak \\.func impl" 2 { target nvptx-*-* } } }
 subroutine impl
 !GCC$ ATTRIBUTES weak :: impl
 end subroutine
-- 
2.39.1



[og12] 'gfortran.dg/gomp/allocate-4.f90' -> 'libgomp.fortran/allocate-5.f90' (was: [PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0))

2023-02-09 Thread Thomas Schwinge
Hi!

On 2022-01-13T14:53:16+, Hafiz Abid Qadeer  wrote:
> [...]

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90

> +  use omp_lib

Pushed to devel/omp/gcc-12 branch
commit 7e1963a4e6ac97b6629c1e9e858ae28487f518cf
"'gfortran.dg/gomp/allocate-4.f90' -> 'libgomp.fortran/allocate-5.f90'",
see attached.

Note that this likewise applies to the current upstream submission:
<https://inbox.sourceware.org/gcc-patches/c00649080f9127a0eeabb45536a2846ffc4c3fa7.1657188329.git@codesourcery.com>
"Add parsing support for allocate directive (OpenMP 5.0)".


Grüße
 Thomas


From 7e1963a4e6ac97b6629c1e9e858ae28487f518cf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 30 Jan 2023 18:04:16 +0100
Subject: [PATCH] 'gfortran.dg/gomp/allocate-4.f90' ->
 'libgomp.fortran/allocate-5.f90'

Otherwise, for build-tree testing:

[...]/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90:10:7: Fatal Error: Cannot open module file 'omp_lib.mod' for reading at (1): No such file or directory

..., and thus corresponding FAILs.

(Not renamed to 'libgomp.fortran/allocate-4.f90', as that one already exists.)

Fix-up for og12 commit 491478d12b83e102f72858e8a871a25c951df293
"Add parsing support for allocate directive (OpenMP 5.0)".

	gcc/testsuite/
	* gfortran.dg/gomp/allocate-4.f90: Cut.
	libgomp/
	* testsuite/libgomp.fortran/allocate-5.f90: Paste.
---
 gcc/testsuite/ChangeLog.omp | 2 ++
 libgomp/ChangeLog.omp   | 2 ++
 .../testsuite/libgomp.fortran/allocate-5.f90| 0
 3 files changed, 4 insertions(+)
 rename gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 => libgomp/testsuite/libgomp.fortran/allocate-5.f90 (100%)

diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index 936e7af0945..f0c58e4d26a 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,5 +1,7 @@
 2023-02-09  Thomas Schwinge  
 
+	* gfortran.dg/gomp/allocate-4.f90: Cut.
+
 	* c-c++-common/gomp/uses_allocators-1.c: Cut.
 	* c-c++-common/gomp/uses_allocators-2.c: Likewise.
 	* c-c++-common/gomp/uses_allocators-3.c: Likewise.
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 0a3d53602da..603a17e4c8d 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,7 @@
 2023-02-09  Thomas Schwinge  
 
+	* testsuite/libgomp.fortran/allocate-5.f90: Paste.
+
 	* testsuite/libgomp.c++/c++.exp (check_effective_target_c)
 	(check_effective_target_c++): New.
 	* testsuite/libgomp.c/c.exp (check_effective_target_c)
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 b/libgomp/testsuite/libgomp.fortran/allocate-5.f90
similarity index 100%
rename from gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
rename to libgomp/testsuite/libgomp.fortran/allocate-5.f90
-- 
2.25.1



[og12] 'c-c++-common/gomp/alloc-pinned-1.c' -> 'libgomp.c-c++-common/alloc-pinned-1.c' (was: [PATCH 5/5] openmp: -foffload-memory=pinned)

2023-02-09 Thread Thomas Schwinge
Hi!

On 2022-03-08T11:30:59+, Hafiz Abid Qadeer  wrote:
> From: Andrew Stubbs 
>
> [...]

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */

Pushed to devel/omp/gcc-12 branch
commit 9c0ffa3776a135a69697253a0bd75ebf9b9d0150
"'c-c++-common/gomp/alloc-pinned-1.c' -> 
'libgomp.c-c++-common/alloc-pinned-1.c'",
see attached.

Note that this likewise applies to the current upstream submission:

"openmp: -foffload-memory=pinned".


Grüße
 Thomas


From 9c0ffa3776a135a69697253a0bd75ebf9b9d0150 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 30 Jan 2023 17:46:29 +0100
Subject: [PATCH] 'c-c++-common/gomp/alloc-pinned-1.c' ->
 'libgomp.c-c++-common/alloc-pinned-1.c'

Otherwise, for build-tree testing:

xgcc: fatal error: cannot read spec file 'libgomp.spec': No such file or directory

..., and thus corresponding FAILs, UNRESOLVEDs.

Fix-up for og12 commit 842df187487f5b16ae29bbe7e9acd79661a9df48
"openmp: -foffload-memory=pinned".

	gcc/testsuite/
	* c-c++-common/gomp/alloc-pinned-1.c: Cut.
	libgomp/
	* testsuite/libgomp.c-c++-common/alloc-pinned-1.c: Paste.
---
 gcc/testsuite/ChangeLog.omp   | 2 ++
 libgomp/ChangeLog.omp | 4 
 .../testsuite/libgomp.c-c++-common}/alloc-pinned-1.c  | 0
 3 files changed, 6 insertions(+)
 rename {gcc/testsuite/c-c++-common/gomp => libgomp/testsuite/libgomp.c-c++-common}/alloc-pinned-1.c (100%)

diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index 42769c7dae5..9f9d5a10ac3 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,5 +1,7 @@
 2023-02-09  Thomas Schwinge  
 
+	* c-c++-common/gomp/alloc-pinned-1.c: Cut.
+
 	* gfortran.dg/gomp/allocate-4.f90: Fix 'omp_allocator_handle_kind'
 	example.
 
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index d319d43ceb0..39165173884 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,7 @@
+2023-02-09  Thomas Schwinge  
+
+	* testsuite/libgomp.c-c++-common/alloc-pinned-1.c: Paste.
+
 2023-02-08  Tobias Burnus  
 
 	Backported from master:
diff --git a/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
similarity index 100%
rename from gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
rename to libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
-- 
2.25.1



[og12] Fix 'omp_allocator_handle_kind' example in 'gfortran.dg/gomp/allocate-4.f90' (was: [PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0).)

2023-02-01 Thread Thomas Schwinge
> +
> +end function outer
> +
> +subroutine bar(s)
> +  use omp_lib
> +  use test
> +  integer  :: s
> +  integer, save, allocatable :: svar1
> +  integer, save, allocatable :: svar2
> +  integer, save, allocatable :: svar3
> +
> +  type (omp_alloctrait) :: traits(3)
> +  integer (omp_allocator_handle_kind) :: a
> +
> +  traits = [omp_alloctrait (omp_atk_alignment, 64), &
> +omp_alloctrait (omp_atk_fallback, omp_atv_null_fb), &
> +omp_alloctrait (omp_atk_pool_size, 8192)]
> +  a = omp_init_allocator (omp_default_mem_space, 3, traits)
> +  if (a == omp_null_allocator) stop 1
> +
> +  !$omp allocate (mvar1) allocator(a) ! { dg-error "'mvar1' should use 
> predefined allocator at .1." }
> +  allocate (mvar1)
> +
> +  !$omp allocate (mvar2) ! { dg-error "'mvar2' should use predefined 
> allocator at .1." }
> +  allocate (mvar2)
> +
> +  !$omp allocate (mvar3) allocator(omp_low_lat_mem_alloc)
> +  allocate (mvar3)
> +
> +  !$omp allocate (svar1)  allocator(a) ! { dg-error "'svar1' should use 
> predefined allocator at .1." }
> +  allocate (svar1)
> +
> +  !$omp allocate (svar2) ! { dg-error "'svar2' should use predefined 
> allocator at .1." }
> +  allocate (svar2)
> +
> +  !$omp allocate (svar3) allocator(omp_low_lat_mem_alloc)
> +  allocate (svar3)
> +end subroutine
> +
> diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 
> b/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90
> new file mode 100644
> index 000..761b6dede28
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90
> @@ -0,0 +1,73 @@
> +! { dg-do compile }
> +
> +module omp_lib_kinds
> +  use iso_c_binding, only: c_int, c_intptr_t
> +  implicit none
> +  private :: c_int, c_intptr_t
> +  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
> +
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_null_allocator = 0
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_default_mem_alloc = 1
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_large_cap_mem_alloc = 2
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_const_mem_alloc = 3
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_high_bw_mem_alloc = 4
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_low_lat_mem_alloc = 5
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_cgroup_mem_alloc = 6
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_pteam_mem_alloc = 7
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_thread_mem_alloc = 8
> +end module
> +
> +subroutine foo(x, y)
> +  use omp_lib_kinds
> +  implicit none
> +  integer  :: x
> +  integer  :: y
> +
> +  integer, allocatable :: var1(:)
> +  integer, allocatable :: var2(:)
> +  integer, allocatable :: var3(:)
> +  integer, allocatable :: var4(:,:)
> +  integer, allocatable :: var5(:)
> +  integer, allocatable :: var6(:)
> +  integer, allocatable :: var7(:)
> +  integer, allocatable :: var8(:)
> +  integer, allocatable :: var9(:)
> +  integer, allocatable :: var10(:)
> +  integer, allocatable :: var11(:)
> +  integer, allocatable :: var12(:)
> +
> +  !$omp allocate (var1) allocator(omp_default_mem_alloc)
> +  allocate (var1(x))
> +
> +  !$omp allocate (var2)
> +  allocate (var2(x))
> +
> +  !$omp allocate (var3, var4) allocator(omp_large_cap_mem_alloc)
> +  allocate (var3(x),var4(x,y))
> +
> +  !$omp allocate()
> +  allocate (var5(x))
> +
> +  !$omp allocate
> +  allocate (var6(x))
> +
> +  !$omp allocate () allocator(omp_default_mem_alloc)
> +  allocate (var7(x))
> +
> +  !$omp allocate allocator(omp_default_mem_alloc)
> +  allocate (var8(x))
> +
> +  !$omp allocate (var9) allocator(omp_default_mem_alloc)
> +  !$omp allocate (var10) allocator(omp_large_cap_mem_alloc)
> +  allocate (var9(x), var10(x))
> +
> +end subroutine


From e07fb2a36377a6504dda088f0a1c5185ff51d652 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 1 Feb 2023 12:30:28 +0100
Subject: [PATCH] Fix 'omp_allocator_handle_kind' example in
 'gfortran.dg/gomp/allocate-4.f90'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I've noticed that while 'gfortran.dg/gomp/allocate-

Re: Buildbot (Sourceware): gcc - failed configure (failure) (master)

2023-01-31 Thread Thomas Schwinge
Hi!

On 2023-01-30T14:50:08-0800, Steve Kargl via Fortran  
wrote:
> Does the skull and crossbones convey anymore info than the rest of
> the subject line
>
> Buildbot (Sourceware): gcc - failed configure (failure) (master)

They convey as much additional information as does (automated) colorful
syntax highlighting, or (manual) source code line indentation: "none" to
some, "a lot" to others.


Grüße
 Thomas


Update 'libgomp/libgomp.texi' for 'nvptx, libgfortran: Switch out of "minimal" mode' (was: nvptx, libgfortran: Switch out of "minimal" mode)

2023-01-24 Thread Thomas Schwinge
Hi!

On 2023-01-20T22:16:00+0100, I wrote:
> On 2023-01-20T22:04:02+0100, I wrote:
>> We've been (t)asked to enable (portions of) GCC/Fortran I/O for nvptx
>> offloading, which means building a normal (non-'LIBGFOR_MINIMAL')
>> configuration of libgfortran.
>
> This is achieved by 'nvptx, libgfortran: Switch out of "minimal" mode',
> see attached, again based on WIP work by Andrew Stubbs.  This I've just
> pushed to devel/omp/gcc-12 branch in
> commit c7734c6fbb5513b4da6306de7bc85de9b8547988, and would like to push
> to master branch once other pending GCC patches have been accepted.
>
>
> The OpenACC XFAILs: "[...] overflows the stack for nvptx offloading"
> are unresolved at this point; see the discussion around
> "Handling of large stack objects in GPU code generation -- maybe transform 
> into heap allocation?",
> and my "nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold'"
> experimenting.  (The latter works to some extent, but also has other
> issues that I shall detail at some later point in time.)

I had a note from Tobias to "update the last but one bullet point at
https://gcc.gnu.org/onlinedocs/libgomp/nvptx.html".  Thus pushed to
devel/omp/gcc-12 branch commit 8c29332e98ca4669a059ebc0d90903b409ae049f
"Update 'libgomp/libgomp.texi' for 'nvptx, libgfortran: Switch out of "minimal" 
mode'",
see attached.  Please consider that one 'fixup'ed into the GCC master
branch submission.


Grüße
 Thomas


> From c7734c6fbb5513b4da6306de7bc85de9b8547988 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Wed, 21 Sep 2022 18:58:34 +0200
> Subject: [PATCH] nvptx, libgfortran: Switch out of "minimal" mode
>
> ..., in order to enable (portions of) Fortran I/O, for example.
>
> libgfortran/ChangeLog:
>
>   * configure: Regenerate.
>   * configure.ac: No longer set LIBGFOR_MINIMAL for nvptx.
>
> libgomp/ChangeLog:
>
>   * testsuite/libgomp.fortran/target-print-1.f90: Adjust.
>   * testsuite/libgomp.fortran/target-print-1-nvptx.f90: Remove.
>   * testsuite/libgomp.oacc-fortran/print-1.f90: Adjust.
>   * testsuite/libgomp.oacc-fortran/print-1-nvptx.f90: Remove.
>   * testsuite/libgomp.oacc-fortran/error_stop-2.f: Adjust.
>   * testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
>
> Co-authored-by: Andrew Stubbs 
> ---
>  libgfortran/ChangeLog.omp   |  6 ++
>  libgfortran/configure   | 17 ++---
>  libgfortran/configure.ac| 17 ++---
>  libgomp/ChangeLog.omp   |  7 +++
>  .../libgomp.fortran/target-print-1-nvptx.f90| 11 ---
>  .../libgomp.fortran/target-print-1.f90  |  3 ---
>  .../libgomp.oacc-fortran/error_stop-2.f |  4 +++-
>  .../libgomp.oacc-fortran/print-1-nvptx.f90  | 11 ---
>  .../testsuite/libgomp.oacc-fortran/print-1.f90  |  5 ++---
>  libgomp/testsuite/libgomp.oacc-fortran/stop-2.f |  4 +++-
>  10 files changed, 33 insertions(+), 52 deletions(-)
>  delete mode 100644 libgomp/testsuite/libgomp.fortran/target-print-1-nvptx.f90
>  delete mode 100644 libgomp/testsuite/libgomp.oacc-fortran/print-1-nvptx.f90
>
> diff --git a/libgfortran/ChangeLog.omp b/libgfortran/ChangeLog.omp
> index b08c264daf9..925575e65fa 100644
> --- a/libgfortran/ChangeLog.omp
> +++ b/libgfortran/ChangeLog.omp
> @@ -1,3 +1,9 @@
> +2023-01-20  Thomas Schwinge  
> + Andrew Stubbs  
> +
> + * configure: Regenerate.
> + * configure.ac: No longer set LIBGFOR_MINIMAL for nvptx.
> +
>  2023-01-20  Thomas Schwinge  
>
>   PR target/85463
> diff --git a/libgfortran/configure b/libgfortran/configure
> index ae64dca3114..3e5c931d4ad 100755
> --- a/libgfortran/configure
> +++ b/libgfortran/configure
> @@ -6230,17 +6230,12 @@ else
>  fi
>
>
> -# For GPU offloading, not everything in libfortran can be supported.
> -# Currently, the only target that has this problem is nvptx.  The
> -# following is a (partial) list of features that are unsupportable on
> -# this particular target:
> -# * Constructors
> -# * alloca
> -# * C library support for I/O, with printf as the one notable exception
> -# * C library support for other features such as signal, environment
> -#   variables, time functions
> -
> - if test "x${target_cpu}" = xnvptx; then
> +# "Minimal" mode is for targets that cannot (yet) support all features of
> +# libgfortran.  It avoids the need for working constructors, alloca, and C
> +# library support for I/O, signals, environment variables, time functions, 
> etc.
> +# At present there are no targets that require this 

nvptx, libgfortran: Switch out of "minimal" mode

2023-01-20 Thread Thomas Schwinge
Hi!

On 2023-01-20T22:04:02+0100, I wrote:
> We've been (t)asked to enable (portions of) GCC/Fortran I/O for nvptx
> offloading, which means building a normal (non-'LIBGFOR_MINIMAL')
> configuration of libgfortran.

This is achieved by 'nvptx, libgfortran: Switch out of "minimal" mode',
see attached, again based on WIP work by Andrew Stubbs.  This I've just
pushed to devel/omp/gcc-12 branch in
commit c7734c6fbb5513b4da6306de7bc85de9b8547988, and would like to push
to master branch once other pending GCC patches have been accepted.


The OpenACC XFAILs: "[...] overflows the stack for nvptx offloading"
are unresolved at this point; see the discussion around
"Handling of large stack objects in GPU code generation -- maybe transform into 
heap allocation?",
and my "nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold'"
experimenting.  (The latter works to some extent, but also has other
issues that I shall detail at some later point in time.)


Grüße
 Thomas


From c7734c6fbb5513b4da6306de7bc85de9b8547988 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 21 Sep 2022 18:58:34 +0200
Subject: [PATCH] nvptx, libgfortran: Switch out of "minimal" mode

..., in order to enable (portions of) Fortran I/O, for example.

libgfortran/ChangeLog:

	* configure: Regenerate.
	* configure.ac: No longer set LIBGFOR_MINIMAL for nvptx.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-print-1.f90: Adjust.
	* testsuite/libgomp.fortran/target-print-1-nvptx.f90: Remove.
	* testsuite/libgomp.oacc-fortran/print-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/print-1-nvptx.f90: Remove.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Adjust.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.

Co-authored-by: Andrew Stubbs 
---
 libgfortran/ChangeLog.omp   |  6 ++
 libgfortran/configure   | 17 ++---
 libgfortran/configure.ac| 17 ++---
 libgomp/ChangeLog.omp   |  7 +++
 .../libgomp.fortran/target-print-1-nvptx.f90| 11 ---
 .../libgomp.fortran/target-print-1.f90  |  3 ---
 .../libgomp.oacc-fortran/error_stop-2.f |  4 +++-
 .../libgomp.oacc-fortran/print-1-nvptx.f90  | 11 ---
 .../testsuite/libgomp.oacc-fortran/print-1.f90  |  5 ++---
 libgomp/testsuite/libgomp.oacc-fortran/stop-2.f |  4 +++-
 10 files changed, 33 insertions(+), 52 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.fortran/target-print-1-nvptx.f90
 delete mode 100644 libgomp/testsuite/libgomp.oacc-fortran/print-1-nvptx.f90

diff --git a/libgfortran/ChangeLog.omp b/libgfortran/ChangeLog.omp
index b08c264daf9..925575e65fa 100644
--- a/libgfortran/ChangeLog.omp
+++ b/libgfortran/ChangeLog.omp
@@ -1,3 +1,9 @@
+2023-01-20  Thomas Schwinge  
+	Andrew Stubbs  
+
+	* configure: Regenerate.
+	* configure.ac: No longer set LIBGFOR_MINIMAL for nvptx.
+
 2023-01-20  Thomas Schwinge  
 
 	PR target/85463
diff --git a/libgfortran/configure b/libgfortran/configure
index ae64dca3114..3e5c931d4ad 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -6230,17 +6230,12 @@ else
 fi
 
 
-# For GPU offloading, not everything in libfortran can be supported.
-# Currently, the only target that has this problem is nvptx.  The
-# following is a (partial) list of features that are unsupportable on
-# this particular target:
-# * Constructors
-# * alloca
-# * C library support for I/O, with printf as the one notable exception
-# * C library support for other features such as signal, environment
-#   variables, time functions
-
- if test "x${target_cpu}" = xnvptx; then
+# "Minimal" mode is for targets that cannot (yet) support all features of
+# libgfortran.  It avoids the need for working constructors, alloca, and C
+# library support for I/O, signals, environment variables, time functions, etc.
+# At present there are no targets that require this mode.
+
+ if false; then
   LIBGFOR_MINIMAL_TRUE=
   LIBGFOR_MINIMAL_FALSE='#'
 else
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 97cc490cb5e..e5552949cc6 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -222,17 +222,12 @@ AM_CONDITIONAL(LIBGFOR_USE_SYMVER, [test "x$gfortran_use_symver" != xno])
 AM_CONDITIONAL(LIBGFOR_USE_SYMVER_GNU, [test "x$gfortran_use_symver" = xgnu])
 AM_CONDITIONAL(LIBGFOR_USE_SYMVER_SUN, [test "x$gfortran_use_symver" = xsun])
 
-# For GPU offloading, not everything in libfortran can be supported.
-# Currently, the only target that has this problem is nvptx.  The
-# following is a (partial) list of fe

nvptx, libgcc: Stub unwinding implementation

2023-01-20 Thread Thomas Schwinge
Hi!

We've been (t)asked to enable (portions of) GCC/Fortran I/O for nvptx
offloading, which means building a normal (non-'LIBGFOR_MINIMAL')
configuration of libgfortran.  One prerequisite patch, based on WIP work
by Andrew Stubbs, is: "nvptx, libgcc: Stub unwinding implementation", see
attached.  This I've just pushed to devel/omp/gcc-12 branch in
commit 26d3146736218ccfdaba4da1edf969bc190d, and would like to push
to master branch once other pending GCC patches have been accepted.
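
For illustration, the effect of such stubs on a caller can be sketched in
standalone C.  The types and the 'stub_' names below are simplified stand-ins
invented for this sketch, not the real 'unwind.h' declarations; the only point
is that a backtrace request returns right away without ever invoking the
callback:

```c
/* Simplified stand-ins for the 'unwind.h' types (assumptions for this
   sketch only).  */
typedef int stub_reason_code;
struct stub_context;
typedef stub_reason_code (*stub_trace_fn) (struct stub_context *, void *);

/* Stub: reports "no error" without walking any frames.  */
stub_reason_code
stub_backtrace (stub_trace_fn trace, void *trace_argument)
{
  (void) trace;
  (void) trace_argument;
  return 0;
}

/* Callback such as a backtrace library might pass in: counts the frames
   it is shown.  */
stub_reason_code
count_frame (struct stub_context *context, void *data)
{
  (void) context;
  ++*(int *) data;
  return 0;
}

/* Number of frames the callback gets to see; with the stub, none.  */
int
frames_seen (void)
{
  int frames = 0;
  stub_backtrace (count_frame, &frames);
  return frames;
}
```

A caller built on top of this links fine and simply observes an empty
backtrace.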


Grüße
 Thomas


From 26d3146736218ccfdaba4da1edf969bc190d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 21 Sep 2022 18:58:34 +0200
Subject: [PATCH] nvptx, libgcc: Stub unwinding implementation

Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.

The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.

libgcc/ChangeLog:

	* config/nvptx/t-nvptx: Add unwind-nvptx.c.
	* config/nvptx/unwind-nvptx.c: New file.

Co-authored-by: Andrew Stubbs 
---
 libgcc/ChangeLog.omp   |  6 +
 libgcc/config/nvptx/t-nvptx|  3 ++-
 libgcc/config/nvptx/unwind-nvptx.c | 36 ++
 3 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/nvptx/unwind-nvptx.c

diff --git a/libgcc/ChangeLog.omp b/libgcc/ChangeLog.omp
index 2e7bf5cc029..c46f49bf5b7 100644
--- a/libgcc/ChangeLog.omp
+++ b/libgcc/ChangeLog.omp
@@ -1,3 +1,9 @@
+2023-01-20  Thomas Schwinge  
+	Andrew Stubbs  
+
+	* config/nvptx/t-nvptx: Add unwind-nvptx.c.
+	* config/nvptx/unwind-nvptx.c: New file.
+
 2023-01-20  Thomas Schwinge  
 
 	* config/nvptx/crtstuff.c ["mgomp"]
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index 9a0454c3a4d..1845a38a35e 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -1,6 +1,7 @@
 LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
 	$(srcdir)/config/nvptx/mgomp.c \
-	$(srcdir)/config/nvptx/atomic.c
+	$(srcdir)/config/nvptx/atomic.c \
+	$(srcdir)/config/nvptx/unwind-nvptx.c
 
 LIB2ADDEH=
 LIB2FUNCS_EXCLUDE=
diff --git a/libgcc/config/nvptx/unwind-nvptx.c b/libgcc/config/nvptx/unwind-nvptx.c
new file mode 100644
index 000..c657b2af6f3
--- /dev/null
+++ b/libgcc/config/nvptx/unwind-nvptx.c
@@ -0,0 +1,36 @@
+/* Stub unwinding implementation.
+
+   Copyright (C) 2019-2023 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "unwind.h"
+
+_Unwind_Reason_Code
+_Unwind_Backtrace(_Unwind_Trace_Fn trace, void * trace_argument)
+{
+  return 0;
+}
+
+_Unwind_Ptr
+_Unwind_GetIPInfo (struct _Unwind_Context *c, int *ip_before_insn)
+{
+  return 0;
+}
-- 
2.25.1



Clean up after newlib "nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]"

2023-01-20 Thread Thomas Schwinge
Hi!

Re the newlib commit 05a2d7a8b3277b469e7cb121115bba398adc8559
"nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]"
that I've just pushed to newlib main branch:

On 2023-01-19T23:00:05+0100, I wrote:
> This is still not properly resolving <https://gcc.gnu.org/PR85463>
> '[nvptx] "exit" in offloaded region doesn't terminate process', but is
> one step into that direction, and allows for simplifying some GCC code.

> --- a/newlib/libc/machine/nvptx/_exit.c
> +++ b/newlib/libc/machine/nvptx/_exit.c

> @@ -26,7 +27,15 @@ void __attribute__((noreturn))
>  _exit (int status)
>  {
>if (__exitval_ptr)
> -*__exitval_ptr = status;
> -  for (;;)
> -asm ("exit;" ::: "memory");
> +{
> +  *__exitval_ptr = status;
> +  for (;;)
> +   asm ("exit;" ::: "memory");
> +}
> +  else /* offloading */
> +{
> +  /* Map to 'abort'; see <https://gcc.gnu.org/PR85463>
> +'[nvptx] "exit" in offloaded region doesn't terminate process'.  */
> +  abort ();
> +}
>  }

That has put "the PR85463 stuff" into the one central place, and allows
for simplifying GCC as per the attached
'Clean up after newlib "nvptx: In offloading execution, map '_exit' to 'abort' 
[GCC PR85463]"',
which I've just pushed to GCC devel/omp/gcc-12 branch in
commit 094b379f461bb4b635327cde26eabc0966159fec, and intend to push to
GCC master branch once the latter depends on updated newlib for other
(functional) reasons.
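
The decision that the quoted '_exit' now makes can be modelled in standalone
C, as a rough sketch.  'choose_exit_action' and the 'exit_action' enum are
hypothetical names for this illustration; the real function does not return,
it either executes the PTX 'exit' instruction or calls 'abort':

```c
#include <stddef.h>

/* Hypothetical labels for the two paths taken by the quoted '_exit'.  */
enum exit_action { EXIT_VIA_STATUS, EXIT_VIA_ABORT };

/* Model of the branch: if an exit-status slot is available, record the
   status there and exit normally; otherwise (offloading) fall back to
   'abort'.  */
enum exit_action
choose_exit_action (int *exitval_ptr, int status)
{
  if (exitval_ptr)
    {
      *exitval_ptr = status;
      return EXIT_VIA_STATUS;
    }
  return EXIT_VIA_ABORT;
}
```

With that branch living in newlib, callers such as libgfortran and libgomp no
longer need their own 'exit' overrides.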


Grüße
 Thomas


From 094b379f461bb4b635327cde26eabc0966159fec Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 19 Jan 2023 20:25:45 +0100
Subject: [PATCH] Clean up after newlib "nvptx: In offloading execution, map
 '_exit' to 'abort' [GCC PR85463]"

	PR target/85463
	libgfortran/
	* runtime/minimal.c [__nvptx__] (exit): Don't override.
	libgomp/
	* config/nvptx/error.c (exit): Don't override.
	* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.
---
 libgfortran/ChangeLog.omp   |  4 
 libgfortran/runtime/minimal.c   |  8 
 libgomp/ChangeLog.omp   |  9 +
 libgomp/config/nvptx/error.c|  7 ---
 .../testsuite/libgomp.oacc-fortran/error_stop-1.f   |  8 +---
 .../testsuite/libgomp.oacc-fortran/error_stop-2.f   |  8 +---
 .../testsuite/libgomp.oacc-fortran/error_stop-3.f   |  8 +---
 libgomp/testsuite/libgomp.oacc-fortran/stop-1.f | 13 +
 libgomp/testsuite/libgomp.oacc-fortran/stop-2.f |  6 +-
 libgomp/testsuite/libgomp.oacc-fortran/stop-3.f | 12 
 10 files changed, 50 insertions(+), 33 deletions(-)
 create mode 100644 libgfortran/ChangeLog.omp

diff --git a/libgfortran/ChangeLog.omp b/libgfortran/ChangeLog.omp
new file mode 100644
index 000..b08c264daf9
--- /dev/null
+++ b/libgfortran/ChangeLog.omp
@@ -0,0 +1,4 @@
+2023-01-20  Thomas Schwinge  
+
+	PR target/85463
+	* runtime/minimal.c [__nvptx__] (exit): Don't override.
diff --git a/libgfortran/runtime/minimal.c b/libgfortran/runtime/minimal.c
index 326ff822ca7..5af2bada2f6 100644
--- a/libgfortran/runtime/minimal.c
+++ b/libgfortran/runtime/minimal.c
@@ -31,14 +31,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #endif
 
 
-#if __nvptx__
-/* Map "exit" to "abort"; see PR85463 '[nvptx] "exit" in offloaded region
-   doesn't terminate process'.  */
-# undef exit
-# define exit(status) do { (void) (status); abort (); } while (0)
-#endif
-
-
 #if __nvptx__
 /* 'printf' is all we have.  */
 # undef estr_vprintf
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 134d450f44a..33aa4b01350 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,14 @@
 2023-01-20  Thomas Schwinge  
 
+	PR target/85463
+	* config/nvptx/error.c (exit): Don't override.
+	* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
+	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.
+
 	* testsuite/libgomp.c/simd-math-1.c: Fix configura

[PING] nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold' (was: Handling of large stack objects in GPU code generation -- maybe transform into heap allocation?)

2023-01-11 Thread Thomas Schwinge
Hi!

Ping -- the '-mframe-malloc-threshold' idea, at least.

Note that while this issue originally did pop up for Fortran I/O, it's
likewise relevant for other functions that maintain big frames, for
example in newlib:

libc/string/libc_a-memmem.o:.local .align 16 .b8 %frame_ar[2064];
libc/string/libc_a-strcasestr.o:.local .align 16 .b8 %frame_ar[2064];
libc/string/libc_a-strstr.o:.local .align 16 .b8 %frame_ar[2064];
libm/math/libm_a-k_rem_pio2.o:.local .align 16 .b8 %frame_ar[560];

Therefore a generic solution (or, workaround if you'd like) does seem
appropriate.
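
To make the idea concrete, here is a hand-written C sketch of what moving a
large frame object to the heap amounts to.  This is not the patch and says
nothing about how the option is implemented; 'BIG_FRAME_SIZE',
'count_nonzero_stack', and 'count_nonzero_heap' are hypothetical names, and
2064 merely mirrors the newlib frame sizes listed above:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical size, mirroring the 2064-byte newlib frames listed above.  */
#define BIG_FRAME_SIZE 2064

/* Before: the scratch buffer lives in the function's own frame.  */
int
count_nonzero_stack (const unsigned char *data, size_t n)
{
  unsigned char scratch[BIG_FRAME_SIZE];
  int count = 0;

  if (n > sizeof scratch)
    n = sizeof scratch;
  memcpy (scratch, data, n);
  for (size_t i = 0; i < n; i++)
    count += scratch[i] != 0;
  return count;
}

/* After: same logic, but the buffer is heap-allocated, so the frame stays
   small.  Returns -1 if the allocation fails.  */
int
count_nonzero_heap (const unsigned char *data, size_t n)
{
  unsigned char *scratch = malloc (BIG_FRAME_SIZE);
  int count = 0;

  if (scratch == NULL)
    return -1;
  if (n > BIG_FRAME_SIZE)
    n = BIG_FRAME_SIZE;
  memcpy (scratch, data, n);
  for (size_t i = 0; i < n; i++)
    count += scratch[i] != 0;
  free (scratch);
  return count;
}
```

The heap variant trades a 'malloc'/'free' pair and an extra failure path for a
frame that no longer counts against the small per-thread stack.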


Grüße
 Thomas


On 2022-12-23T15:08:06+0100, I wrote:
> Hi!
>
> On 2022-11-11T15:35:44+0100, Richard Biener via Fortran  
> wrote:
>> On Fri, Nov 11, 2022 at 3:13 PM Thomas Schwinge  
>> wrote:
>>> For example, for Fortran code like:
>>>
>>> write (*,*) "Hello world"
>>>
>>> ..., 'gfortran' creates:
>>>
>>> struct __st_parameter_dt dt_parm.0;
>>>
>>> try
>>>   {
>>> dt_parm.0.common.filename = 
>>> &"source-gcc/libgomp/testsuite/libgomp.oacc-fortran/print-1_.f90"[1]{lb: 1 
>>> sz: 1};
>>> dt_parm.0.common.line = 29;
>>> dt_parm.0.common.flags = 128;
>>> dt_parm.0.common.unit = 6;
>>> _gfortran_st_write (&dt_parm.0);
>>> _gfortran_transfer_character_write (&dt_parm.0, &"Hello 
>>> world"[1]{lb: 1 sz: 1}, 11);
>>> _gfortran_st_write_done (&dt_parm.0);
>>>   }
>>> finally
>>>   {
>>> dt_parm.0 = {CLOBBER(eol)};
>>>   }
>>>
>>> The issue: the stack object 'dt_parm.0' is a half-KiB in size (yes,
>>> really! -- there's a lot of state in Fortran I/O apparently).  That's a
>>> problem for GPU execution -- here: OpenACC/nvptx -- where typically you
>>> have small stacks.  (For example, GCC/OpenACC/nvptx: 1 KiB per thread;
>>> GCC/OpenMP/nvptx is an exception, because of its use of '-msoft-stack'
>>> "Use custom stacks instead of local memory for automatic storage".)
>>>
>>> Now, the Nvidia Driver tries to accommodate such largish stack usage,
>>> and dynamically increases the per-thread stack as necessary (thereby
>>> potentially reducing parallelism) -- if it manages to understand the call
>>> graph.  In case of libgfortran I/O, it evidently doesn't.  Not being able
>>> to disprove existence of recursion is the common problem, as I've read.
>>> At run time, via 'CU_JIT_INFO_LOG_BUFFER' you then get, for example:
>>>
>>> warning : Stack size for entry function 'MAIN__$_omp_fn$0' cannot be 
>>> statically determined
>>>
>>> That's still not an actual problem: if the GPU kernel's stack usage still
>>> fits into 1 KiB.  Very often it does, but if, as happens in libgfortran
>>> I/O handling, there is another such 'dt_parm' put onto the stack, the
>>> stack then overflows; device-side SIGSEGV.
>>>
>>> (There is, by the way, some similar analysis by Tom de Vries in
>>> <https://gcc.gnu.org/PR85519> "[nvptx, openacc, openmp, testsuite]
>>> Recursive tests may fail due to thread stack limit".)
>>>
>>> Of course, you shouldn't really be doing I/O in GPU kernels, but people
>>> do like their occasional "'printf' debugging", so we ought to make that
>>> work (... without pessimizing any "normal" code).
>>>
>>> I assume that generally reducing the size of 'dt_parm' etc. is out of
>>> scope.
>>>
>>> There is a way to manually set a per-thread stack size, but it's not
>>> obvious which size to set: that size needs to work for the whole GPU
>>> kernel, and should be as low as possible (to maximize parallelism).
>>> I assume that even if GCC did an accurate call graph analysis of the GPU
>>> kernel's maximum stack usage, that still wouldn't help: that's before the
>>> PTX JIT does its own code transformations, including stack spilling.
>>>
>>> There exists a 'CU_JIT_LTO' flag to "Enable link-time optimization
>>> (-dlto) for device code".  This might help, assuming that it manages to
>>> simplify the libgfortran I/O code such that the PTX JIT then understands
>>> the call graph.  But: that's available only starting with recent
>>> CUDA 11.4, so not a general solution -- if it works at all, which I've
>>> not tested.
>>>
>>> Similarly, we could enable GCC's LTO for device code generation

nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold' (was: Handling of large stack objects in GPU code generation -- maybe transform into heap allocation?)

2022-12-23 Thread Thomas Schwinge
Hi!

On 2022-11-11T15:35:44+0100, Richard Biener via Fortran wrote:
> On Fri, Nov 11, 2022 at 3:13 PM Thomas Schwinge wrote:
>> For example, for Fortran code like:
>>
>> write (*,*) "Hello world"
>>
>> ..., 'gfortran' creates:
>>
>> struct __st_parameter_dt dt_parm.0;
>>
>> try
>>   {
>> dt_parm.0.common.filename = 
>> &"source-gcc/libgomp/testsuite/libgomp.oacc-fortran/print-1_.f90"[1]{lb: 1 
>> sz: 1};
>> dt_parm.0.common.line = 29;
>> dt_parm.0.common.flags = 128;
>> dt_parm.0.common.unit = 6;
>> _gfortran_st_write (&dt_parm.0);
>> _gfortran_transfer_character_write (&dt_parm.0, &"Hello 
>> world"[1]{lb: 1 sz: 1}, 11);
>> _gfortran_st_write_done (&dt_parm.0);
>>   }
>> finally
>>   {
>> dt_parm.0 = {CLOBBER(eol)};
>>   }
>>
>> The issue: the stack object 'dt_parm.0' is a half-KiB in size (yes,
>> really! -- there's a lot of state in Fortran I/O apparently).  That's a
>> problem for GPU execution -- here: OpenACC/nvptx -- where typically you
>> have small stacks.  (For example, GCC/OpenACC/nvptx: 1 KiB per thread;
>> GCC/OpenMP/nvptx is an exception, because of its use of '-msoft-stack'
>> "Use custom stacks instead of local memory for automatic storage".)
>>
>> Now, the Nvidia Driver tries to accommodate such largish stack usage,
>> and dynamically increases the per-thread stack as necessary (thereby
>> potentially reducing parallelism) -- if it manages to understand the call
>> graph.  In case of libgfortran I/O, it evidently doesn't.  Not being able
>> to disprove existence of recursion is the common problem, as I've read.
>> At run time, via 'CU_JIT_INFO_LOG_BUFFER' you then get, for example:
>>
>> warning : Stack size for entry function 'MAIN__$_omp_fn$0' cannot be 
>> statically determined
>>
>> That's still not an actual problem: if the GPU kernel's stack usage still
>> fits into 1 KiB.  Very often it does, but if, as happens in libgfortran
>> I/O handling, there is another such 'dt_parm' put onto the stack, the
>> stack then overflows; device-side SIGSEGV.
>>
>> (There is, by the way, some similar analysis by Tom de Vries in
>> <https://gcc.gnu.org/PR85519> "[nvptx, openacc, openmp, testsuite]
>> Recursive tests may fail due to thread stack limit".)
>>
>> Of course, you shouldn't really be doing I/O in GPU kernels, but people
>> do like their occasional "'printf' debugging", so we ought to make that
>> work (... without pessimizing any "normal" code).
>>
>> I assume that generally reducing the size of 'dt_parm' etc. is out of
>> scope.
>>
>> There is a way to manually set a per-thread stack size, but it's not
>> obvious which size to set: that size needs to work for the whole GPU
>> kernel, and should be as low as possible (to maximize parallelism).
>> I assume that even if GCC did an accurate call graph analysis of the GPU
>> kernel's maximum stack usage, that still wouldn't help: that's before the
>> PTX JIT does its own code transformations, including stack spilling.
>>
>> There exists a 'CU_JIT_LTO' flag to "Enable link-time optimization
>> (-dlto) for device code".  This might help, assuming that it manages to
>> simplify the libgfortran I/O code such that the PTX JIT then understands
>> the call graph.  But: that's available only starting with recent
>> CUDA 11.4, so not a general solution -- if it works at all, which I've
>> not tested.
>>
>> Similarly, we could enable GCC's LTO for device code generation -- but
>> that's a big project, out of scope at this time.  And again, we don't
>> know if that at all helps this case.
>>
>> I see a few options:
>>
>> (a) Figure out what it is in the libgfortran I/O implementation that
>> causes "Stack size [...] cannot be statically determined", and re-work
>> that code to avoid that, or even disable certain things for nvptx, if
>> feasible.

> Shrink st_parameter_dt (it's part of the ABI though, kind of).  Lots of the
> bloat is from things that are unused for simpler I/O cases (so some
> "inheritance" could help), and lots of the bloat is from using
> string/length pairs using char * + size_t for what looks like could be
> encoded a lot more efficiently.
>
> There's probably not much low-hanging fruit.

(Similar comments in Janne's email.)


Well, as had to be expected, libgfortran I/O is really just one example,
but

Re: [Patch] OpenMP/Fortran: Use firstprivat not alloc for ptr attach for arrays

2022-11-12 Thread Thomas Schwinge
Hi Tobias!

On 2022-05-13T19:44:51+0200, Jakub Jelinek via Fortran wrote:
> On Fri, May 13, 2022 at 07:21:02PM +0200, Tobias Burnus wrote:
>> gcc/fortran/ChangeLog:
>>
>>  * trans-openmp.cc (gfc_trans_omp_clauses): When mapping nondescriptor
>>  array sections, use GOMP_MAP_FIRSTPRIVATE_POINTER instead of
>>  GOMP_MAP_POINTER for the pointer attachment.
>>
>> libgomp/ChangeLog:
>>
>>  * testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.
>
> Not 100% sure if we want to add such a testcase into the testsuite given
> that it is not valid OpenMP, but perhaps it is ok as we are testing a QoI.

For non-offloading x86_64-pc-linux-gnu '-m32', I'm occasionally (but very
rarely!) seeing this test case FAIL its execution test.  Similar can also
be seen on occasional reports via ,
.


Regards
 Thomas


'libgomp.fortran/target-nowait-array-section.f90':

| ! Runs the target region asynchronously and checks for it
| !
| ! Note that  map(alloc: work(:, i)) + nowait  should be safe
| ! given that a nondescriptor array is used. However, it still
| ! violates a map clause restriction, added in OpenMP 5.1 [354:10-13].
|
| PROGRAM test_target_teams_distribute_nowait
|   USE ISO_Fortran_env, only: INT64
|   implicit none
| INTEGER, parameter :: N = 1024, N_TASKS = 16
| INTEGER :: i, j, k, my_ticket
| INTEGER :: order(n_tasks)
| INTEGER(INT64) :: work(n, n_tasks)
| INTEGER :: ticket
| logical :: async
|
| ticket = 0
|
| !$omp target enter data map(to: ticket, order)
|
| !$omp parallel do num_threads(n_tasks)
| DO i = 1, n_tasks
|!$omp target map(alloc: work(:, i), ticket) private(my_ticket) nowait
|!!$omp target teams distribute map(alloc: work(:, i), ticket) private(my_ticket) nowait
|DO j = 1, n
|   ! Waste cycles
| !  work(j, i) = 0
| !  DO k = 1, n*(n_tasks - i)
| ! work(j, i) = work(j, i) + i*j*k
| !  END DO
|   my_ticket = 0
|   !$omp atomic capture
|   ticket = ticket + 1
|   my_ticket = ticket
|   !$omp end atomic
|   !$omp atomic write
|   order(i) = my_ticket
|END DO
|!$omp end target !teams distribute
| END DO
| !$omp end parallel do
|
| !$omp target exit data map(from:ticket, order)
|
| IF (ticket .ne. n_tasks*n) stop 1
| if (maxval(order) /= n_tasks*n) stop 2
| ! order(i) == n*i if synchronous and between n and n*n_tasks if run concurrently
| do i = 1, n_tasks
|   if (order(i) < n .or. order(i) > n*n_tasks) stop 3
| end do
| async = .false.
| do i = 1, n_tasks
|   if (order(i) /= n*i) async = .true.
| end do
| if (.not. async) stop 4 ! Did not run asynchronously
| end
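
The final checks of this test case are easier to follow in isolation.  A
minimal C sketch, not part of the GCC testsuite: 'classify_order' and its
return codes are made up here and merely mirror the 'stop 2' to 'stop 4'
codes above (the 'stop 1' check on 'ticket' is not modelled).  It assumes
that 'order[i]' holds the last ticket drawn by task 'i', with C's
zero-based indexing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the Fortran checks: returns 0 if the
   recorded order shows concurrent execution, otherwise 2, 3 or 4 like
   the corresponding 'stop' codes.  */
int
classify_order (const int *order, size_t n_tasks, int n)
{
  assert (order != NULL && n_tasks > 0 && n > 0);

  int total = n * (int) n_tasks;

  /* Some task must have drawn the very last ticket ('stop 2').  */
  int max = order[0];
  for (size_t i = 1; i < n_tasks; i++)
    if (order[i] > max)
      max = order[i];
  if (max != total)
    return 2;

  /* Every task draws 'n' tickets, so its last one is in [n, total]
     ('stop 3').  */
  for (size_t i = 0; i < n_tasks; i++)
    if (order[i] < n || order[i] > total)
      return 3;

  /* Strictly serial execution in task order gives n*(i+1) for task 'i';
     any deviation shows that the target regions overlapped.  */
  for (size_t i = 0; i < n_tasks; i++)
    if (order[i] != n * (int) (i + 1))
      return 0;
  return 4;
}
```

With four tasks of two iterations each, the order '{2, 4, 6, 8}' is what
serial execution would produce and is classified as 4, whereas
'{8, 4, 6, 7}' is classified as 0.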
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Handling of large stack objects in GPU code generation -- maybe transform into heap allocation?

2022-11-11 Thread Thomas Schwinge
Hi!

For example, for Fortran code like:

write (*,*) "Hello world"

..., 'gfortran' creates:

struct __st_parameter_dt dt_parm.0;

try
  {
    dt_parm.0.common.filename = &"source-gcc/libgomp/testsuite/libgomp.oacc-fortran/print-1_.f90"[1]{lb: 1 sz: 1};
    dt_parm.0.common.line = 29;
    dt_parm.0.common.flags = 128;
    dt_parm.0.common.unit = 6;
    _gfortran_st_write (&dt_parm.0);
    _gfortran_transfer_character_write (&dt_parm.0, &"Hello world"[1]{lb: 1 sz: 1}, 11);
    _gfortran_st_write_done (&dt_parm.0);
  }
finally
  {
    dt_parm.0 = {CLOBBER(eol)};
  }

The issue: the stack object 'dt_parm.0' is a half-KiB in size (yes,
really! -- there's a lot of state in Fortran I/O apparently).  That's a
problem for GPU execution -- here: OpenACC/nvptx -- where typically you
have small stacks.  (For example, GCC/OpenACC/nvptx: 1 KiB per thread;
GCC/OpenMP/nvptx is an exception, because of its use of '-msoft-stack'
"Use custom stacks instead of local memory for automatic storage".)

Now, the Nvidia Driver tries to accommodate such largish stack usage,
and dynamically increases the per-thread stack as necessary (thereby
potentially reducing parallelism) -- if it manages to understand the call
graph.  In case of libgfortran I/O, it evidently doesn't.  Not being able
to disprove existence of recursion is the common problem, as I've read.
At run time, via 'CU_JIT_INFO_LOG_BUFFER' you then get, for example:

warning : Stack size for entry function 'MAIN__$_omp_fn$0' cannot be 
statically determined

That's still not an actual problem: if the GPU kernel's stack usage still
fits into 1 KiB.  Very often it does, but if, as happens in libgfortran
I/O handling, there is another such 'dt_parm' put onto the stack, the
stack then overflows; device-side SIGSEGV.
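
To make the arithmetic concrete, a minimal sketch in plain C (host code,
nothing nvptx-specific).  The half-KiB object size and the 1 KiB limit
are the figures from above; the helper name and the idea of a fixed
per-frame overhead are assumptions for illustration only:

```c
#include <assert.h>
#include <stddef.h>

#define STACK_LIMIT ((size_t) 1024)  /* Per-thread stack, from the text.  */
#define DT_PARM_SIZE ((size_t) 512)  /* "Half-KiB" object, from the text.  */

/* Hypothetical model: does a chain of 'depth' nested frames, each holding
   one 'dt_parm'-sized object plus 'overhead' further bytes, fit into the
   per-thread stack?  */
int
frames_fit (size_t depth, size_t overhead)
{
  assert (overhead < STACK_LIMIT);
  return depth * (DT_PARM_SIZE + overhead) <= STACK_LIMIT;
}
```

In this model, one such frame leaves plenty of room, but a second one
exceeds the budget as soon as there is any other stack usage at all.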

(There is, by the way, some similar analysis by Tom de Vries in
<https://gcc.gnu.org/PR85519> "[nvptx, openacc, openmp, testsuite]
Recursive tests may fail due to thread stack limit".)

Of course, you shouldn't really be doing I/O in GPU kernels, but people
do like their occasional "'printf' debugging", so we ought to make that
work (... without pessimizing any "normal" code).

I assume that generally reducing the size of 'dt_parm' etc. is out of
scope.

There is a way to manually set a per-thread stack size, but it's not
obvious which size to set: that size needs to work for the whole GPU
kernel, and should be as low as possible (to maximize parallelism).
I assume that even if GCC did an accurate call graph analysis of the GPU
kernel's maximum stack usage, that still wouldn't help: that's before the
PTX JIT does its own code transformations, including stack spilling.

There exists a 'CU_JIT_LTO' flag to "Enable link-time optimization
(-dlto) for device code".  This might help, assuming that it manages to
simplify the libgfortran I/O code such that the PTX JIT then understands
the call graph.  But: that's available only starting with recent
CUDA 11.4, so not a general solution -- if it works at all, which I've
not tested.

Similarly, we could enable GCC's LTO for device code generation -- but
that's a big project, out of scope at this time.  And again, we don't
know if that at all helps this case.

I see a few options:

(a) Figure out what it is in the libgfortran I/O implementation that
causes "Stack size [...] cannot be statically determined", and re-work
that code to avoid that, or even disable certain things for nvptx, if
feasible.

(b) Also for GCC/OpenACC/nvptx use the GCC/OpenMP/nvptx '-msoft-stack'.
I don't really want to do that however: it does introduce a bit of
complexity in all the generated device code and run-time overhead that we
generally would like to avoid.

(c) I'm contemplating a tweak/compiler pass for transforming such large
stack objects into heap allocation (during nvptx offloading compilation).
'malloc'/'free' do exist; they're slow, but that's not a problem for the
code paths this is to affect.  (Might also add some compile-time
diagnostic, of course.)  Could maybe even limit this to only be used
during libgfortran compilation?  This is then conceptually a bit similar
to (b), but localized to relevant parts only.  Has such a thing been done
before in GCC, that I could build upon?
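
To illustrate what (c) would amount to if done by hand at the source
level, here is a minimal C sketch.  'struct big_state',
'FRAME_MALLOC_THRESHOLD' and all function names are hypothetical; this
is not an existing compiler pass, and a real pass would work on GCC's
internal representation rather than on source code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a large object like 'dt_parm': half a KiB.  */
struct big_state
{
  char buf[512];
};

/* Hypothetical size above which an object is moved off the stack.  */
#define FRAME_MALLOC_THRESHOLD 256

/* Some callee that works on the object through a pointer.  */
static int
use_state (struct big_state *s)
{
  memset (s->buf, 1, sizeof s->buf);
  return s->buf[0] + s->buf[sizeof s->buf - 1];
}

/* Before: the object occupies 512 bytes of this function's stack frame.  */
int
work_on_stack (void)
{
  struct big_state s;
  return use_state (&s);
}

/* After: same behavior, but the object lives on the heap for the duration
   of the call; only a pointer remains in the stack frame.  */
int
work_on_heap (void)
{
  assert (sizeof (struct big_state) > FRAME_MALLOC_THRESHOLD);
  struct big_state *s = malloc (sizeof *s);
  if (s == NULL)
    abort ();
  int result = use_state (s);
  free (s);
  return result;
}
```

Both variants compute the same result; the second one trades a
'malloc'/'free' pair per call for a much smaller stack frame.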

Any other clever ideas?


Regards
 Thomas


[newlib] Generally make all 'long double complex' methods available in

2022-11-08 Thread Thomas Schwinge
..., not just '#if defined(__CYGWIN__)'.  (Exception: 'clog10l' which currently
indeed is for Cygwin only.)

This completes 2017-07-05 commit be3ca3947402827aa52709e677369bc7ad30aa1d
"Fixed warnings for some long double complex methods" after Aditya Upadhyay's
work on importing "Long double complex methods" from NetBSD.

For example, this changes GCC/nvptx libgfortran 'configure' output as follows:

[...]
checking for ccosf... yes
checking for ccos... yes
checking for ccosl... [-no-]{+yes+}
[...]

..., and correspondingly GCC/nvptx 'nvptx-none/libgfortran/config.h' as
follows:

[...]
 /* Define to 1 if you have the `ccosl' function. */
-/* #undef HAVE_CCOSL */
+#define HAVE_CCOSL 1
[...]

Similarly for 'ccoshl', 'cexpl', 'cpowl', 'csinl', 'csinhl', 'ctanl', 'ctanhl',
'cacoshl', 'cacosl', 'casinhl', 'catanhl'.  ('conjl', 'cprojl' are not
currently being used in libgfortran.)

This in turn simplifies GCC/nvptx 'libgfortran/intrinsics/c99_functions.c'
compilation such that this file doesn't have to provide its own
"Implementation of various C99 functions" for those, when in fact they're
available in newlib libm.
---

A few more words on why this is relevant for GCC.

For example, 'cexpl' usually is provided by libm, but if it isn't, the
open-coded replacement function in
'libgfortran/intrinsics/c99_functions.c' is effective if it holds that
'defined(HAVE_COSL) && defined(HAVE_SINL) && defined(HAVE_EXPL)':

long double complex
cexpl (long double complex z)
{
  long double a, b;
  long double complex v;

  a = REALPART (z);
  b = IMAGPART (z);
  COMPLEX_ASSIGN (v, cosl (b), sinl (b));
  return expl (a) * v;
}

This replacement code is active for current GCC/nvptx (... if no longer
compiling GCC/nvptx libgfortran in "minimal" mode, 'LIBGFOR_MINIMAL',
which I'm currently working on).

Comparing the preceding to the 'c99_functions.c.188t.sincos' dump, we see for
that function:

 __attribute__((nothrow, leaf, const))
 complex long double cexpl (complex long double z)
 {
   long double b;
   long double a;
   long double _1;
   long double _2;
   long double _4;
   long double _5;
   long double _11;
+  complex long double sincostmp_13;

[local count: 1073741824]:
   a_7 = REALPART_EXPR ;
   b_8 = IMAGPART_EXPR ;
-  _1 = cosl (b_8);
-  _2 = sinl (b_8);
+  sincostmp_13 = __builtin_cexpil (b_8);
+  _1 = REALPART_EXPR <sincostmp_13>;
+  _2 = IMAGPART_EXPR <sincostmp_13>;
   _11 = expl (a_7);
   _4 = _1 * _11;
   _5 = _2 * _11;
   REALPART_EXPR <<retval>> = _4;
   IMAGPART_EXPR <<retval>> = _5;
   return <retval>;

 }

That is, the 'cosl (b)', 'sinl (b)' sequence is replaced by
'__builtin_cexpil'.  That '__builtin_cexpil' is then later mapped back
into: 'cexpl'.  We've now got an infinitely-recursive 'cexpl' replacement
function, "implemented via itself"; GCC/nvptx libgfortran assumes there
is no 'cexpl' in libm, whereas this 'sincos' transformation does assume
that there is.  (..., which looks like an additional bug on its own.)

At the PTX-level, this leads to the following:

[...]
// BEGIN GLOBAL FUNCTION DECL: cexpl
.visible .func cexpl (.param.u64 %in_ar0, .param.f64 %in_ar1, .param.f64 %in_ar2);

// BEGIN GLOBAL FUNCTION DEF: cexpl
.visible .func cexpl (.param.u64 %in_ar0, .param.f64 %in_ar1, .param.f64 %in_ar2)
{
[...]
call cexpl, (%out_arg1, %out_arg2, %out_arg3);
[...]
ret;
}

[...]
// BEGIN GLOBAL FUNCTION DECL: cexpl
.extern .func cexpl (.param.u64 %in_ar0, .param.f64 %in_ar1, .param.f64 %in_ar2);
[...]

We see the '.visible .func cexpl' declaration and definition for the
libgfortran replacement function and in the same compilation unit also
the '.extern .func cexpl' declaration that implicitly gets introduced via
the 'sincos' transformation (via the GCC/nvptx back end emitting an
explicit declaration of any function referenced), and 'ptxas' then
(rightfully so) complains about that mismatch:

ptxas c99_functions.o, line 35; error   : Inconsistent redefinition of variable 'cexpl'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
make[2]: *** [c99_functions.lo] Error 1

---
 newlib/libc/include/complex.h | 35 ---
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/newlib/libc/include/complex.h b/newlib/libc/include/complex.h
index 0a3ea97ed..ad3028e4c 100644
--- a/newlib/libc/include/complex.h
+++ b/newlib/libc/include/complex.h
@@ -20,6 +20,7 @@ __BEGIN_DECLS
 /* 7.3.5.1 The cacos functions */
 double complex cacos(double complex);
 float complex cacosf(float complex);
+long double complex cacosl(long double complex);

 /* 7.3.5.2 The casin functions */
 double complex casin(double complex);
@@ -34,44 +35,54 @@ long double complex catanl(long double complex);
 /* 

Re: Add 'libgomp.oacc-fortran/declare-allocatable-1.f90' (was: [gomp4] add support for fortran allocate support with declare create)

2022-11-03 Thread Thomas Schwinge
Hi!

Let me add back CC: , so that others may comment,
too.

On 2022-11-03T01:37:10+0100, Bernhard Reutner-Fischer wrote:
> On 2 November 2022 21:04:56 CET, Thomas Schwinge wrote:
>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
>>> @@ -0,0 +1,211 @@
>>> +! Test declare create with allocatable arrays.
>>
>>... is useful in a different (though related) context that I'm currently
>>working on.  Having applied the following changes:
>>
>>  - Replace 'call abort' by 'error stop' (in spirit of earlier PR84381
>>changes).
>
> Please remind me why you don't stop N
> but error stop?

  - Don't have to re-number if changing test case later on.
  - Prints a backtrace (where supported).

> Re: https://gcc.gnu.org/legacy-ml/fortran/2018-09/msg00173.html
>
> You'd obviously tweak
> sub(/call\s\s*abort/, "stop " i)
> with error\s\s*stop
>
> Or is your output so br^W lacking that you cannot write but just return? But 
> then i think that error stop writes, too, so that cannot be the case, can it?

Right.


Regards
 Thomas


Support OpenACC 'declare create' with Fortran allocatable arrays, part II [PR106643, PR96668] (was: Support OpenACC 'declare create' with Fortran allocatable arrays, part I [PR106643])

2022-11-02 Thread Thomas Schwinge
here.  Side note: in the first version
of my changes, I had actually here in
'libgomp/oacc-mem.c:goacc_enter_data_internal' re-implemented the
corresponding -- "somewhat ugly" -- logic, when at some point I realized
that I instead could simply call into the existing code, greatly reducing
the complexity here...  Pushed to master branch
commit f6ce1e77bbf5d3a096f52e674bfd7354c6537d10
"Support OpenACC 'declare create' with Fortran allocatable arrays, part II 
[PR106643, PR96668]",
see attached.
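
An aside on the locking around that call in the attached patch: the
caller holds the device lock at that point, and it releases the lock
around the call and re-acquires it afterwards, presumably because the
callee takes that lock itself.  A minimal sketch of that pattern in plain
C, with a toy non-recursive lock; all names here are made up and this is
not libgomp code:

```c
#include <assert.h>

/* Toy non-recursive lock: acquiring it twice is a bug.  */
struct toy_lock
{
  int held;
};

static void
toy_lock_acquire (struct toy_lock *l)
{
  assert (!l->held);
  l->held = 1;
}

static void
toy_lock_release (struct toy_lock *l)
{
  assert (l->held);
  l->held = 0;
}

/* Existing helper that takes the lock on its own.  */
static int
map_one (struct toy_lock *l, int value)
{
  toy_lock_acquire (l);
  int mapped = value * 2;
  toy_lock_release (l);
  return mapped;
}

/* Caller that already holds the lock: release it around the call into
   the helper, then re-acquire it before continuing.  */
int
enter_data (struct toy_lock *l, int value)
{
  toy_lock_acquire (l);
  /* ... work under the lock ... */
  toy_lock_release (l);
  int mapped = map_one (l, value);
  toy_lock_acquire (l);
  /* ... more work under the lock ... */
  toy_lock_release (l);
  return mapped;
}
```

The price of this pattern is that shared state may change while the lock
is dropped, so anything read before the call has to be treated as
potentially stale afterwards.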


Regards
 Thomas


From f6ce1e77bbf5d3a096f52e674bfd7354c6537d10 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 28 Oct 2022 15:06:45 +0200
Subject: [PATCH] Support OpenACC 'declare create' with Fortran allocatable
 arrays, part II [PR106643, PR96668]

	PR libgomp/106643
	PR fortran/96668
	libgomp/
	* oacc-mem.c (goacc_enter_data_internal): Support
	OpenACC 'declare create' with Fortran allocatable arrays, part II.
	* testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-directive.f90:
	Adjust.
	* testsuite/libgomp.oacc-fortran/pr106643-1.f90: New.
---
 libgomp/oacc-mem.c| 15 +++-
 ...locatable-array_descriptor-1-directive.f90 | 90 +--
 .../libgomp.oacc-fortran/pr106643-1.f90   | 83 +
 3 files changed, 160 insertions(+), 28 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr106643-1.f90

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index ba010fddbb3..233fe0e4c1d 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1166,7 +1166,10 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 
 	  struct target_mem_desc *tgt = n->tgt;
 
-	  /* Arrange so that OpenACC 'declare' code à la PR106643
+	  /* Minimal OpenACC variant corresponding to PR96668
+	 "[OpenMP] Re-mapping allocated but previously unallocated
+	 allocatable does not work" 'libgomp/target.c' changes, so that
+	 OpenACC 'declare' code à la PR106643
 	 "[gfortran + OpenACC] Allocate in module causes refcount error"
 	 has a chance to work.  */
 	  if ((kinds[i] & 0xff) == GOMP_MAP_TO_PSET
@@ -1181,6 +1184,16 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 		  assert ((kinds[i + k] & 0xff) == GOMP_MAP_POINTER);
 		}
 
+	  /* Let 'goacc_map_vars' -> 'gomp_map_vars_internal' handle
+		 this.  */
+	  gomp_mutex_unlock (&acc_dev->lock);
+	  struct target_mem_desc *tgt_
+		= goacc_map_vars (acc_dev, aq, groupnum, &hostaddrs[i], NULL,
+  &sizes[i], &kinds[i], true,
+  GOMP_MAP_VARS_ENTER_DATA);
+	  assert (tgt_ == NULL);
+	  gomp_mutex_lock (&acc_dev->lock);
+
 	  /* Given that 'goacc_exit_data_internal'/'goacc_exit_datum_1'
 		 will always see 'n->refcount == REFCOUNT_INFINITY',
 		 there's no need to adjust 'n->dynamic_refcount' here.  */
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-directive.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-directive.f90
index 10e1d5bc378..6604f72c5c1 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-directive.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-directive.f90
@@ -105,27 +105,50 @@ program test
   !$acc enter data create (b)
   ! This is now OpenACC "present":
   if (.not.acc_is_present (b)) error stop
-  ! This still has the initial array descriptor:
+  ! ..., and got the actual array descriptor installed:
   !$acc serial
-  call verify_initial
+  call verify_n1_allocated
   !$acc end serial
 
   do i = n1_lb, n1_ub
  b(i) = i - 1
   end do
 
-  ! Verify that host-to-device copy doesn't touch the device-side (still
-  ! initial) array descriptor (but it does copy the array data).
+  ! In 'declare-allocatable-array_descriptor-1-runtime.f90', this does "verify
+  ! that host-to-device copy doesn't touch the device-side (still initial)
+  ! array descriptor (but it does copy the array data").  This is here not
+  ! applicable anymore, as we've already gotten the actual array descriptor
+  ! installed.  Thus now verify that it does copy the array data.
   call acc_update_device (b)
   !$acc serial
-  call verify_initial
+  call verify_n1_allocated
   !$acc end serial
 
   b = 40
 
-  ! Verify that device-to-host copy doesn't touch the host-side array
-  ! descriptor, doesn't copy out the device-side (still initial) array
-  ! descriptor (but it does copy the array data).
+  !$acc parallel copyout (id1_1) ! No data clause for 'b' (explicit or implicit): no 'GOMP_MAP_TO_PSET'.
+  call verify_n1_values (-1)
+  id1_1 = 0

Support OpenACC 'declare create' with Fortran allocatable arrays, part I [PR106643]

2022-11-02 Thread Thomas Schwinge
Hi!

On 2022-11-02T21:15:31+0100, I wrote:
> On 2022-11-02T21:10:54+0100, I wrote:
>> On 2022-11-02T21:04:56+0100, I wrote:
>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
>>> @@ -0,0 +1,268 @@
>>> +! Test OpenACC 'declare create' with allocatable arrays.
>>> +
>>> +! { dg-do run }
>>> +
>>> +!TODO-OpenACC-declare-allocate
>>> +! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
>>> +! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
>>> +! "The 'declare create' directive with a Fortran 'allocatable' has new 
>>> behavior".
>>> +! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
>>> +
>>> +[...]
>>
>> Getting rid of the "'dg-xfail-run-if' for '-DACC_MEM_SHARED=0'" via a
>> work around (as seen in real-world code), I've pushed to master branch
>> commit 59c6c5dbf267cd9d0a8df72b2a5eb5657b64268e
>> "Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90'"
>
>> ... which is 'libgomp.oacc-fortran/declare-allocatable-1.f90' adjusted
>> for missing support for OpenACC "Changes from Version 2.0 to 2.5":
>> "The 'declare create' directive with a Fortran 'allocatable' has new 
>> behavior".
>> Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
>> manually.
>
> A similar test case, but with different focus, I've pushed to master
> branch in commit abeaf3735fe2568b9d5b8096318da866b1fe1e5c
> "Add 
> 'libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90'",
> see attached.

> --- /dev/null
> +++ 
> b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90
> @@ -0,0 +1,402 @@
> +! Test OpenACC 'declare create' with allocatable arrays.
> +
> +! { dg-do run }
> +
> +! Note that we're not testing OpenACC semantics here, but rather documenting
> +! current GCC behavior, specifically, behavior concerning updating of
> +! host/device array descriptors.
> +! { dg-skip-if n/a { *-*-* } { -DACC_MEM_SHARED=1 } }
> +
> +!TODO-OpenACC-declare-allocate
> +! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
> +! "The 'declare create' directive with a Fortran 'allocatable' has new 
> behavior".
> +! Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
> +! manually.

If instead of calling 'acc_create'/'acc_delete' we'd like to use
'!$acc enter data create'/'!$acc exit data delete', we run into
<https://gcc.gnu.org/PR106643>
"[gfortran + OpenACC] Allocate in module causes refcount error".
Pushed to master branch
commit da8e0e1191c5512244a752b30dea0eba83e3d10c
"Support OpenACC 'declare create' with Fortran allocatable arrays, part I 
[PR106643]",
see attached.


Regards
 Thomas


From da8e0e1191c5512244a752b30dea0eba83e3d10c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 27 Oct 2022 21:52:07 +0200
Subject: [PATCH] Support OpenACC 'declare create' with Fortran allocatable
 arrays, part I [PR106643]

	PR libgomp/106643
	libgomp/
	* oacc-mem.c (goacc_enter_data_internal): Support
	OpenACC 'declare create' with Fortran allocatable arrays, part I.
	* testsuite/libgomp.oacc-fortran/declare-allocatable-1-directive.f90:
	New.
	* testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-directive.f90:
	New.
---
 libgomp/oacc-mem.c| 28 +--
 ...90 => declare-allocatable-1-directive.f90} | 14 --
 ...ocatable-array_descriptor-1-directive.f90} | 12 
 3 files changed, 44 insertions(+), 10 deletions(-)
 copy libgomp/testsuite/libgomp.oacc-fortran/{declare-allocatable-1.f90 => declare-allocatable-1-directive.f90} (95%)
 copy libgomp/testsuite/libgomp.oacc-fortran/{declare-allocatable-array_descriptor-1-runtime.f90 => declare-allocatable-array_descriptor-1-directive.f90} (98%)

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 73b2710c2b8..ba010fddbb3 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1150,8 +1150,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	}
   else if (n && groupnum > 1)
 	{
-	  assert (n->refcount != REFCOUNT_INFINITY
-		  && n->refcount != REFCOUNT_LINK);
+	  assert (n->refcount != REFCOUNT_LINK);
 
 	  for (size_t j = i + 1; j <= group_last; j++)
 	if ((kinds[j] & 0xff) == GOMP_MAP_ATTACH)

Add 'libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90'

2022-11-02 Thread Thomas Schwinge
Hi!

On 2022-11-02T21:10:54+0100, I wrote:
> On 2022-11-02T21:04:56+0100, I wrote:
>> On 2017-04-05T08:23:58-0700, Cesar Philippidis wrote:
>>> This patch implements the OpenACC 2.5 behavior of fortran allocate on
>>> variables marked with declare create as defined in Section 2.13.2 in the
>>> OpenACC spec.
>>
>> That functionality is still missing in GCC master branch, however a test
>> case included in that submission here:
>>
>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
>>> @@ -0,0 +1,211 @@
>>> +! Test declare create with allocatable arrays.
>>
>> ... is useful in a different (though related) context that I'm currently
>> working on.  Having applied the following changes:
>>
>>   - Replace 'call abort' by 'error stop' (in spirit of earlier PR84381
>> changes).
>>   - Replace '[logical] .neqv. .true.' by '.not.[logical]'.
>>   - Add scanning for OpenACC compiler diagnostics.
>>   - 'dg-xfail-run-if' for '-DACC_MEM_SHARED=0' (see above).
>>
>> ..., I've then pushed to master branch
>> commit 8c357d884b16cb3c14cba8a61be5b53fd04a6bfe
>> "Add 'libgomp.oacc-fortran/declare-allocatable-1.f90'", see attached.
>
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
>> @@ -0,0 +1,268 @@
>> +! Test OpenACC 'declare create' with allocatable arrays.
>> +
>> +! { dg-do run }
>> +
>> +!TODO-OpenACC-declare-allocate
>> +! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
>> +! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
>> +! "The 'declare create' directive with a Fortran 'allocatable' has new 
>> behavior".
>> +! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
>> +
>> +[...]
>
> Getting rid of the "'dg-xfail-run-if' for '-DACC_MEM_SHARED=0'" via a
> work around (as seen in real-world code), I've pushed to master branch
> commit 59c6c5dbf267cd9d0a8df72b2a5eb5657b64268e
> "Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90'"

> ... which is 'libgomp.oacc-fortran/declare-allocatable-1.f90' adjusted
> for missing support for OpenACC "Changes from Version 2.0 to 2.5":
> "The 'declare create' directive with a Fortran 'allocatable' has new 
> behavior".
> Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
> manually.

A similar test case, but with different focus, I've pushed to master
branch in commit abeaf3735fe2568b9d5b8096318da866b1fe1e5c
"Add 'libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90'",
see attached.


Regards
 Thomas


From abeaf3735fe2568b9d5b8096318da866b1fe1e5c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 26 Oct 2022 23:47:29 +0200
Subject: [PATCH] Add
 'libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90'

	libgomp/
	* testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90:
	New.
---
 ...allocatable-array_descriptor-1-runtime.f90 | 402 ++
 1 file changed, 402 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90
new file mode 100644
index 000..b27f312631d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90
@@ -0,0 +1,402 @@
+! Test OpenACC 'declare create' with allocatable arrays.
+
+! { dg-do run }
+
+! Note that we're not testing OpenACC semantics here, but rather documenting
+! current GCC behavior, specifically, behavior concerning updating of
+! host/device array descriptors.
+! { dg-skip-if n/a { *-*-* } { -DACC_MEM_SHARED=1 } }
+
+!TODO-OpenACC-declare-allocate
+! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
+! "The 'declare create' directive with a Fortran 'allocatable' has new behavior".
+! Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
+! manually.
+
+
+!TODO { dg-additional-options -fno-inline } for stable results regarding OpenACC 'routine'.
+
+
+!TODO OpenACC 'serial' vs. GCC/nvptx:
+!TODO { dg-prune-output {using 'vector_length \(32\)', ignoring 1} }
+
+
+! { dg-additional-options -

Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90' (was: Add 'libgomp.oacc-fortran/declare-allocatable-1.f90')

2022-11-02 Thread Thomas Schwinge
Hi!

On 2022-11-02T21:04:56+0100, I wrote:
> On 2017-04-05T08:23:58-0700, Cesar Philippidis  wrote:
>> This patch implements the OpenACC 2.5 behavior of fortran allocate on
>> variables marked with declare create as defined in Section 2.13.2 in the
>> OpenACC spec.
>
> That functionality is still missing in GCC master branch, however a test
> case included in that submission here:
>
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
>> @@ -0,0 +1,211 @@
>> +! Test declare create with allocatable arrays.
>
> ... is useful in a different (though related) context that I'm currently
> working on.  Having applied the following changes:
>
>   - Replace 'call abort' by 'error stop' (in spirit of earlier PR84381
> changes).
>   - Replace '[logical] .neqv. .true.' by '.not.[logical]'.
>   - Add scanning for OpenACC compiler diagnostics.
>   - 'dg-xfail-run-if' for '-DACC_MEM_SHARED=0' (see above).
>
> ..., I've then pushed to master branch
> commit 8c357d884b16cb3c14cba8a61be5b53fd04a6bfe
> "Add 'libgomp.oacc-fortran/declare-allocatable-1.f90'", see attached.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
> @@ -0,0 +1,268 @@
> +! Test OpenACC 'declare create' with allocatable arrays.
> +
> +! { dg-do run }
> +
> +!TODO-OpenACC-declare-allocate
> +! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
> +! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
> +! "The 'declare create' directive with a Fortran 'allocatable' has new behavior".
> +! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
> +
> +[...]

Getting rid of the "'dg-xfail-run-if' for '-DACC_MEM_SHARED=0'" via a
work around (as seen in real-world code), I've pushed to master branch
commit 59c6c5dbf267cd9d0a8df72b2a5eb5657b64268e
"Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90'", see
attached.
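
To illustrate the ordering that the workaround relies on, here is a toy C
model.  It is not libgomp code: it assumes a hypothetical host-side presence
table, and 'toy_create', 'toy_delete', 'toy_is_present' are made-up names
standing in for 'acc_create', 'acc_delete', 'acc_is_present'.  The point is
only that allocating by itself registers nothing, so the range is registered
right after allocation and unregistered right before deallocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a fixed-size table of host address ranges that are
   considered "present".  All names here are hypothetical.  */
#define TOY_MAX 8

struct toy_entry
{
  const void *host;
  size_t len;
  bool used;
};

static struct toy_entry toy_table[TOY_MAX];

/* Stand-in for 'acc_create': record the host range as present.
   Returns false if the table is full.  */
static bool
toy_create (const void *host, size_t len)
{
  assert (host != NULL);
  for (int i = 0; i < TOY_MAX; i++)
    if (!toy_table[i].used)
      {
        toy_table[i].host = host;
        toy_table[i].len = len;
        toy_table[i].used = true;
        return true;
      }
  return false;
}

/* Stand-in for 'acc_delete': forget the host range.  */
static void
toy_delete (const void *host)
{
  for (int i = 0; i < TOY_MAX; i++)
    if (toy_table[i].used && toy_table[i].host == host)
      toy_table[i].used = false;
}

/* Stand-in for 'acc_is_present': is the whole range recorded?  */
static bool
toy_is_present (const void *host, size_t len)
{
  for (int i = 0; i < TOY_MAX; i++)
    if (toy_table[i].used && toy_table[i].host == host
        && toy_table[i].len >= len)
      return true;
  return false;
}
```

In terms of the test case: 'allocate (b(n))' corresponds to obtaining the
host buffer, 'call acc_create (b)' to 'toy_create', 'call acc_delete (b)' to
'toy_delete', and the 'acc_is_present (b)' checks to 'toy_is_present'.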


Regards
 Thomas


>From 59c6c5dbf267cd9d0a8df72b2a5eb5657b64268e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 14 Oct 2022 17:36:51 +0200
Subject: [PATCH] Add 'libgomp.oacc-fortran/declare-allocatable-1-runtime.f90'

... which is 'libgomp.oacc-fortran/declare-allocatable-1.f90' adjusted
for missing support for OpenACC "Changes from Version 2.0 to 2.5":
"The 'declare create' directive with a Fortran 'allocatable' has new behavior".
Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
manually.

	libgomp/
	* testsuite/libgomp.oacc-fortran/declare-allocatable-1-runtime.f90:
	New.
---
 ...ble-1.f90 => declare-allocatable-1-runtime.f90} | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)
 copy libgomp/testsuite/libgomp.oacc-fortran/{declare-allocatable-1.f90 => declare-allocatable-1-runtime.f90} (96%)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1-runtime.f90
similarity index 96%
copy from libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
copy to libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1-runtime.f90
index 1c8ccd9f61f..e4cb9c378a3 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1-runtime.f90
@@ -3,10 +3,10 @@
 ! { dg-do run }
 
 !TODO-OpenACC-declare-allocate
-! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
 ! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
 ! "The 'declare create' directive with a Fortran 'allocatable' has new behavior".
-! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
+! Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
+! manually.
 
 !TODO { dg-additional-options -fno-inline } for stable results regarding OpenACC 'routine'.
 
@@ -67,6 +67,7 @@ program test
   ! Test local usage of an allocated declared array.
 
   allocate (b(n))
+  call acc_create (b)
 
   if (.not.allocated (b)) error stop
   if (.not.acc_is_present (b)) error stop
@@ -91,12 +92,14 @@ program test
  if (b(i) /= i*a) error stop
   end do
 
+  call acc_delete (b)
   deallocate (b)
 
   ! Test the usage of an allocated declared array inside an acc
   ! routine subroutine.
 
   allocate (b(n))
+  call acc_create (b)
 
   if (.not.allocated (b)) error stop
   if (.not.acc_is_present (b)) error stop
@@ -114,6 +117,7 @@ program test
  if (b(i) /= i*2) error stop
   end do
 
+  call acc_delete (b)
   deallo

Add 'libgomp.oacc-fortran/declare-allocatable-1.f90' (was: [gomp4] add support for fortran allocate support with declare create)

2022-11-02 Thread Thomas Schwinge
Hi!

On 2017-04-05T08:23:58-0700, Cesar Philippidis  wrote:
> This patch implements the OpenACC 2.5 behavior of fortran allocate on
> variables marked with declare create as defined in Section 2.13.2 in the
> OpenACC spec.

That functionality is still missing in GCC master branch, however a test
case included in that submission here:

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
> @@ -0,0 +1,211 @@
> +! Test declare create with allocatable arrays.

... is useful in a different (though related) context that I'm currently
working on.  Having applied the following changes:

  - Replace 'call abort' by 'error stop' (in spirit of earlier PR84381
changes).
  - Replace '[logical] .neqv. .true.' by '.not.[logical]'.
  - Add scanning for OpenACC compiler diagnostics.
  - 'dg-xfail-run-if' for '-DACC_MEM_SHARED=0' (see above).

..., I've then pushed to master branch
commit 8c357d884b16cb3c14cba8a61be5b53fd04a6bfe
"Add 'libgomp.oacc-fortran/declare-allocatable-1.f90'", see attached.


Regards
 Thomas


>From 8c357d884b16cb3c14cba8a61be5b53fd04a6bfe Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Wed, 5 Apr 2017 08:23:58 -0700
Subject: [PATCH] Add 'libgomp.oacc-fortran/declare-allocatable-1.f90'

	libgomp/
	* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: New.

Co-authored-by: Thomas Schwinge 
---
 .../declare-allocatable-1.f90 | 268 ++
 1 file changed, 268 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
new file mode 100644
index 000..1c8ccd9f61f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90
@@ -0,0 +1,268 @@
+! Test OpenACC 'declare create' with allocatable arrays.
+
+! { dg-do run }
+
+!TODO-OpenACC-declare-allocate
+! Not currently implementing correct '-DACC_MEM_SHARED=0' behavior:
+! Missing support for OpenACC "Changes from Version 2.0 to 2.5":
+! "The 'declare create' directive with a Fortran 'allocatable' has new behavior".
+! { dg-xfail-run-if TODO { *-*-* } { -DACC_MEM_SHARED=0 } }
+
+!TODO { dg-additional-options -fno-inline } for stable results regarding OpenACC 'routine'.
+
+! { dg-additional-options -fopt-info-all-omp }
+! { dg-additional-options -foffload=-fopt-info-all-omp }
+
+! { dg-additional-options --param=openacc-privatization=noisy }
+! { dg-additional-options -foffload=--param=openacc-privatization=noisy }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable '[Di]\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} }
+
+! { dg-additional-options -Wopenacc-parallelism }
+
+! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+! passed to 'incr' may be unset, and in that case, it will be set to [...]",
+! so to maintain compatibility with earlier Tcl releases, we manually
+! initialize counter variables:
+! { dg-line l_dummy[variable c 0] }
+! { dg-message dummy {} { target iN-VAl-Id } l_dummy } to avoid
+! "WARNING: dg-line var l_dummy defined, but not used".
+
+
+module vars
+  implicit none
+  integer, parameter :: n = 100
+  real*8, allocatable :: b(:)
+ !$acc declare create (b)
+end module vars
+
+program test
+  use vars
+  use openacc
+  implicit none
+  real*8 :: a
+  integer :: i
+
+  interface
+ subroutine sub1
+   !$acc routine gang
+ end subroutine sub1
+
+ subroutine sub2
+ end subroutine sub2
+
+ real*8 function fun1 (ix)
+   integer ix
+   !$acc routine seq
+ end function fun1
+
+ real*8 function fun2 (ix)
+   integer ix
+   !$acc routine seq
+ end function fun2
+  end interface
+
+  if (allocated (b)) error stop
+
+  ! Test local usage of an allocated declared array.
+
+  allocate (b(n))
+
+  if (.not.allocated (b)) error stop
+  if (.not.acc_is_present (b)) error stop
+
+  a = 2.0
+
+  !$acc parallel loop ! { dg-line l[incr c] }
+  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l$c }
+  !   { dg-note {variable 'i' ought to be adjusted for OpenACC privatization level: 'vector'} {} { target *-*-* } l$c }
+  !   { dg-note {variable 'i' adjusted for OpenACC privatization level: 'vector'} {} { target { ! openacc_host_selected } } l$c }
+  ! { dg-note {variable 'i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatiza

OpenACC: Don't gang-privatize artificial variables [PR90115] (was: [PATCH] [og12] OpenACC: Don't gang-privatize artificial variables)

2022-10-28 Thread Thomas Schwinge
Hi!

On 2022-10-18T16:46:07+0200, Thomas Schwinge  wrote:
> On 2022-10-14T13:38:56+, Julian Brown  wrote:
>> This patch prevents compiler-generated artificial variables from being
>> treated as privatization candidates for OpenACC.
>>
>> The rationale is that e.g. "gang-private" variables actually must be
>> shared by each worker and vector spawned within a particular gang, but
>> that sharing is not necessary for any compiler-generated variable (at
>> least at present, but no such need is anticipated either).  Variables on
>> the stack (and machine registers) are already private per-"thread"
>> (gang, worker and/or vector), and that's fine for artificial variables.
>
> OK, that seems fine rationale for this change in behavior.
> No contradicting test case jumped out at me, either.

>> Several tests need their scan output patterns adjusted to compensate.
>
> ACK -- surprisingly few.  (Some minor fine-tuning necessary for GCC
> master branch, as had to be expected; I'm working on that.)

With those changes...

>> --- a/gcc/omp-low.cc
>> +++ b/gcc/omp-low.cc
>> @@ -11400,6 +11400,28 @@ oacc_privatization_candidate_p (const location_t loc, const tree c,
>>  }
>>  }
>>
>> +  /* If an artificial variable has been added to a bind, e.g.
>> + a compiler-generated temporary structure used by the Fortran front-end, do
>> + not consider it as a privatization candidate.  Note that variables on
>> + the stack are private per-thread by default: making them "gang-private"
>> + for OpenACC actually means to share a single instance of a variable
>> + amongst all workers and threads spawned within each gang.
>> + At present, no compiler-generated artificial variables require such
>> + sharing semantics, so this is safe.  */
>> +
>> +  if (res && DECL_ARTIFICIAL (decl))
>> +{
>> +  res = false;
>> +
>> +  if (dump_enabled_p ())
>> +{
>> +  oacc_privatization_begin_diagnose_var (l_dump_flags, loc, c, decl);
>> +  dump_printf (l_dump_flags,
>> +   "isn%'t candidate for adjusting OpenACC privatization "
>> +   "level: %s\n", "artificial");
>> +}
>> +}
>
> In the source code comment, you say "added to a bind", and that's indeed
> what I was expecting, too, and thus put in:
>
>if (res && DECL_ARTIFICIAL (decl))
>  {
> +  gcc_checking_assert (block);
> +
>res = false;
>
> ..., but to my surprise, that did fire on one occasion:
>
>> --- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> @@ -94,9 +94,7 @@ contains
>>  !$acc parallel copy(array)
>>  !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
>>  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
>> -! { dg-note {variable 'array\.[0-9]+' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
>> -! { dg-note {variable 'array\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
>> -! { dg-note {variable 'array\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || { openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
>> +! { dg-note {variable 'array\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: artificial} "" { target *-*-* } l_loop$c_loop }
>>  ! { dg-message {sorry, unimplemented: target cannot support alloca} PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
>>  do i = 1, 10
>>array(i) = 9*i
>
> ... here.  Note "variable 'array\.[0-9]+' in 'private' clause";
> everywhere else we have "declared in block".
>
> As part of your verification, have you already looked into whether the
> new behavior is correct here, or does this one need to continue to be
> "adjusted for OpenACC privatization level: 'gang'"?  If the latter,
> should we check 'if (res && block && DECL_ARTIFICIAL (decl))' instead of
> 'if (res && DECL_ARTIFICIAL (decl))'

..., and that change merged in, I've th

Re: [og12] OpenACC: Don't gang-privatize artificial variables: restrict to blocks (was: [PATCH] [og12] OpenACC: Don't gang-privatize artificial variables)

2022-10-28 Thread Thomas Schwinge
Hi!

On 2022-10-28T10:11:04+0200, I wrote:
> On 2022-10-18T15:59:24+0100, Julian Brown  wrote:
>> On Tue, 18 Oct 2022 16:46:07 +0200 Thomas Schwinge  
>> wrote:
>>> On 2022-10-14T13:38:56+, Julian Brown  wrote:
>>> ..., but to my surprise, that did fire on one occasion:
>>>
>>> > --- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>>> > +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>>> > @@ -94,9 +94,7 @@ contains
>>> >  !$acc parallel copy(array)
>>> >  !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
>>> >  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
>>> > -! { dg-note {variable 'array\.[0-9]+' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
>>> > -! { dg-note {variable 'array\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
>>> > -! { dg-note {variable 'array\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || { openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
>>> > +! { dg-note {variable 'array\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: artificial} "" { target *-*-* } l_loop$c_loop }
>>> >  ! { dg-message {sorry, unimplemented: target cannot support alloca} PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
>>> >  do i = 1, 10
>>> >array(i) = 9*i
>>>
>>> ... here.  Note "variable 'array\.[0-9]+' in 'private' clause";
>>> everywhere else we have "declared in block".
>>>
>>> As part of your verification, have you already looked into whether the
>>> new behavior is correct here, or does this one need to continue to be
>>> "adjusted for OpenACC privatization level: 'gang'"?  If the latter,
>>> should we check 'if (res && block && DECL_ARTIFICIAL (decl))' instead
>>> of 'if (res && DECL_ARTIFICIAL (decl))', or is there some wrong
>>> setting of 'DECL_ARTIFICIAL' -- or are we maybe looking at an
>>> inappropriate 'decl'? (Thinking of commit
>>> r12-7580-g7a5e036b61aa088e6b8564bc9383d37dfbb4801e "[OpenACC
>>> privatization] Analyze 'lookup_decl'-translated DECL [PR90115,
>>> PR102330, PR104774]", for example.)
>>
>> I haven't looked in detail, but it seems to me that the "artificial"
>> flag isn't appropriate for that decl, which is (derived from?) a
>> user-visible symbol. So, I'm not sure what's going on there (and yes
>> the commit you mention looks like it could be relevant, I think?).
>> There are probably subtleties I'm not aware of...
>
> Until we've got that worked out, let's simply restrict the
> 'DECL_ARTIFICIAL' handling to 'block's only; pushed to devel/omp/gcc-12
> commit 9a50d282f03f7f1e1ad00de917143a2a8e0c0ee0
> "[og12] OpenACC: Don't gang-privatize artificial variables: restrict to 
> blocks"

..., see attached now really.

Regards
 Thomas


>From 9a50d282f03f7f1e1ad00de917143a2a8e0c0ee0 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 18 Oct 2022 16:59:54 +0200
Subject: [PATCH] [og12] OpenACC: Don't gang-privatize artificial variables:
 restrict to blocks

Follow-up to og12 commit d4504346d2a1d6ffecb8b2d8e3e04ab8ea259785
"[og12] OpenACC: Don't gang-privatize artificial variables", to restore
the previous behavior, until we understand what it means for a
'DECL_ARTIFICIAL' to appear in a 'private' clause.

	gcc/
	* omp-low.cc (oacc_privatization_candidate_p) :
	Restrict to 'block's.
	libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Adjust.
---
 gcc/omp-low.cc  | 2 +-
 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 002f91d930a..66aa11cd32d 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-l

[og12] OpenACC: Don't gang-privatize artificial variables: restrict to blocks (was: [PATCH] [og12] OpenACC: Don't gang-privatize artificial variables)

2022-10-28 Thread Thomas Schwinge
Hi!

On 2022-10-18T15:59:24+0100, Julian Brown  wrote:
> On Tue, 18 Oct 2022 16:46:07 +0200 Thomas Schwinge  
> wrote:
>> On 2022-10-14T13:38:56+, Julian Brown  wrote:
>> ..., but to my surprise, that did fire on one occasion:
>>
>> > --- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> > +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> > @@ -94,9 +94,7 @@ contains
>> >  !$acc parallel copy(array)
>> >  !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
>> >  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
>> > -! { dg-note {variable 'array\.[0-9]+' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
>> > -! { dg-note {variable 'array\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
>> > -! { dg-note {variable 'array\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || { openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
>> > +! { dg-note {variable 'array\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: artificial} "" { target *-*-* } l_loop$c_loop }
>> >  ! { dg-message {sorry, unimplemented: target cannot support alloca} PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
>> >  do i = 1, 10
>> >array(i) = 9*i
>>
>> ... here.  Note "variable 'array\.[0-9]+' in 'private' clause";
>> everywhere else we have "declared in block".
>>
>> As part of your verification, have you already looked into whether the
>> new behavior is correct here, or does this one need to continue to be
>> "adjusted for OpenACC privatization level: 'gang'"?  If the latter,
>> should we check 'if (res && block && DECL_ARTIFICIAL (decl))' instead
>> of 'if (res && DECL_ARTIFICIAL (decl))', or is there some wrong
>> setting of 'DECL_ARTIFICIAL' -- or are we maybe looking at an
>> inappropriate 'decl'? (Thinking of commit
>> r12-7580-g7a5e036b61aa088e6b8564bc9383d37dfbb4801e "[OpenACC
>> privatization] Analyze 'lookup_decl'-translated DECL [PR90115,
>> PR102330, PR104774]", for example.)
>
> I haven't looked in detail, but it seems to me that the "artificial"
> flag isn't appropriate for that decl, which is (derived from?) a
> user-visible symbol. So, I'm not sure what's going on there (and yes
> the commit you mention looks like it could be relevant, I think?).
> There are probably subtleties I'm not aware of...

Until we've got that worked out, let's simply restrict the
'DECL_ARTIFICIAL' handling to 'block's only; pushed to devel/omp/gcc-12
commit 9a50d282f03f7f1e1ad00de917143a2a8e0c0ee0
"[og12] OpenACC: Don't gang-privatize artificial variables: restrict to blocks",
see attached.
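
To make the difference between the two conditions concrete, here is a toy C
model.  It is not the 'omp-low.cc' code: 'struct toy_decl' and both functions
are hypothetical, 'artificial' merely mirrors what 'DECL_ARTIFICIAL (decl)'
would report, and 'in_block' distinguishes "declared in block" (true) from
"in 'private' clause" (false).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a declaration.  Hypothetical type.  */
struct toy_decl
{
  const char *name;
  bool artificial;
};

/* Models the original og12 check:
   'if (res && DECL_ARTIFICIAL (decl))'.
   'res' is the verdict of the preceding checks.  */
static bool
toy_candidate_unrestricted (bool res, bool in_block,
                            const struct toy_decl *decl)
{
  assert (decl != NULL);
  (void) in_block; /* Not consulted in this variant.  */
  if (res && decl->artificial)
    res = false;
  return res;
}

/* Models the follow-up check:
   'if (res && block && DECL_ARTIFICIAL (decl))'.  */
static bool
toy_candidate_block_only (bool res, bool in_block,
                          const struct toy_decl *decl)
{
  assert (decl != NULL);
  if (res && in_block && decl->artificial)
    res = false;
  return res;
}
```

With an artificial decl that appears in a 'private' clause (the
'array\.[0-9]+' case above), the first variant rejects it as a candidate,
whereas the second leaves the earlier verdict untouched, which corresponds to
restoring the previous behavior for that test case.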


Regards
 Thomas


OpenACC 'acc_is_present' on un-allocated array: '-Wuninitialized' diagnostics

2022-10-24 Thread Thomas Schwinge
Hi!

Given the following reduced code, from a bigger test case that I'm
currently writing:

program main
  use openacc
  implicit none

  integer, allocatable :: ar(:,:,:)
  logical :: l

  if (allocated (ar)) stop 10 ! just for illustration
  l = acc_is_present (ar)
  print *, l

end program main

..., this results in a list of '-Wuninitialized' diagnostics (I have not
checked whether bad/"unexpected" code is also generated), and from the
'*.original' dump it's clear where that's coming from:

__attribute__((fn spec (". ")))
void MAIN__ ()
{
  struct array03_integer(kind=4) ar;
  logical(kind=4) l;

  ar.data = 0B;
  ar.dtype = {.elem_len=4, .rank=3, .type=1};
  if ((integer(kind=4)[0:] * restrict) ar.data != 0B)
{
  _gfortran_stop_numeric (10, 0);
}
  L.1:;
  {
integer(kind=8) D.4260;
integer(kind=8) D.4261;
integer(kind=8) D.4262;
integer(kind=8) D.4263;
integer(kind=8) D.4264;
integer(kind=8) D.4265;
struct array03_integer(kind=4) parm.0;
integer(kind=8) D.4272;
integer(kind=8) D.4273;

D.4260 = ar.dim[0].lbound;
D.4261 = ar.dim[0].ubound;
D.4262 = ar.dim[1].lbound;
D.4263 = ar.dim[1].ubound;
D.4264 = ar.dim[2].lbound;
D.4265 = ar.dim[2].ubound;
parm.0.span = 4;
parm.0.dtype = {.elem_len=4, .rank=3, .type=1};
parm.0.dim[0].lbound = 1;
parm.0.dim[0].ubound = (1 - D.4260) + D.4261;
parm.0.dim[0].stride = 1;
D.4272 = ar.dim[1].stride;
parm.0.dim[1].lbound = 1;
parm.0.dim[1].ubound = (1 - D.4262) + D.4263;
parm.0.dim[1].stride = NON_LVALUE_EXPR ;
D.4273 = ar.dim[2].stride;
parm.0.dim[2].lbound = 1;
parm.0.dim[2].ubound = (1 - D.4264) + D.4265;
parm.0.dim[2].stride = NON_LVALUE_EXPR ;
parm.0.data = (void *) &(*(integer(kind=4)[0:] * restrict) ar.data)[((D.4260 - ar.dim[0].lbound) + (D.4262 - ar.dim[1].lbound) * D.4272) + (D.4264 - ar.dim[2].lbound) * D.4273];
parm.0.offset = ~NON_LVALUE_EXPR  - NON_LVALUE_EXPR ;
l = acc_is_present_array_h ();
  }
[...]

Note 'D.4260 = ar.dim[0].lbound;', etc., with these 'ar' fields not
having been initialized.
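
For illustration, the situation can be modeled in C as follows.  This is a
sketch under assumptions, not what gfortran generates and not a proposed fix:
'struct toy_descriptor' is a hypothetical, simplified stand-in for the array
descriptor, and 'toy_num_elements' shows one way a consumer could avoid
reading the bounds while 'data' is still null.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical array descriptor of rank up to 3.  */
struct toy_dim
{
  long lbound;
  long ubound;
  long stride;
};

struct toy_descriptor
{
  void *data;           /* NULL while the array is unallocated.  */
  int rank;
  struct toy_dim dim[3]; /* Indeterminate until allocation.  */
};

/* Number of elements.  The bounds are read only if the array is
   allocated; for an unallocated array, 'dim' is never touched.  */
static long
toy_num_elements (const struct toy_descriptor *d)
{
  assert (d != NULL);
  if (d->data == NULL)
    return 0;

  long n = 1;
  for (int r = 0; r < d->rank; r++)
    {
      long extent = d->dim[r].ubound - d->dim[r].lbound + 1;
      n *= extent > 0 ? extent : 0;
    }
  return n;
}
```

In the dump above, by contrast, only 'ar.data' and 'ar.dtype' are set before
the 'ar.dim[...]' fields are read unconditionally, which is what the
diagnostics point at.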

For reference, OpenACC 'acc_is_present' is implemented in
'libgomp/openacc.f90':

   [...]
 72 module openacc_internal
 73   use openacc_kinds
 74   implicit none
 75
 76   interface
   [...]
360 function acc_is_present_array_h (a)
361   logical acc_is_present_array_h
362   type (*), dimension (..), contiguous :: a
363 end function
   [...]
508   end interface
509
510   interface
   [...]
698 function acc_is_present_l (a, len) &
699 bind (C, name = "acc_is_present")
700   use iso_c_binding, only: c_int32_t, c_size_t
701   !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
702   integer (c_int32_t) :: acc_is_present_l
703   type (*), dimension (*) :: a
704   integer (c_size_t), value :: len
705 end function
   [...]
760   end interface
761 end module openacc_internal
762
763 module openacc
764   use openacc_kinds
765   use openacc_internal
766   implicit none
767
768   private
   [...]
793   public :: [...], acc_is_present
   [...]
961   interface acc_is_present
962 procedure :: acc_is_present_32_h
963 procedure :: acc_is_present_64_h
964 procedure :: acc_is_present_array_h
965   end interface
   [...]
   1006 end module openacc
   [...]
   1413 function acc_is_present_array_h (a)
   1414   use openacc_internal, only: acc_is_present_l
   1415   logical acc_is_present_array_h
   1416   type (*), dimension (..), contiguous :: a
   1417   acc_is_present_array_h = acc_is_present_l (a, sizeof (a)) /= 0
   1418 end function
   [...]

GCC currently implements OpenACC 2.6,
which in 3.2.30. "acc_is_present" states:

*Summary* The 'acc_is_present' routine tests whether a host variable or
array region is present on the device.

*Format*
C or C++:
int acc_is_present( h_void*, size_t );
Fortran:
logical function acc_is_present( a )
logical function acc_is_present( a, len )
 type(*), dimension(..) :: a
 integer :: len

*Description* The 'acc_is_present' routine tests whether the specified host
data is present on the device. In C, the arguments are a pointer to the data
and length in bytes; the function returns nonzero if the specified data is
fully present, and zero otherwise. In Fortran, two forms are supported. In
the first, the argument is a contiguous array section of intrinsic type. In
the second, the first argument is a variable or array element and the second
is the length in bytes.

Add 'libgomp.oacc-c-c++-common/private-big-1.c' [PR105421] (was: amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421])

2022-10-20 Thread Thomas Schwinge
Hi!

On 2022-10-20T12:05:28+0200, I wrote:
> On 2022-10-14T13:38:55+, Julian Brown  wrote:
>> The GCN backend uses a heuristic to determine whether to use FLAT or
>> GLOBAL addressing in a particular (offload) function: namely, if a
>> function takes a pointer-to-scalar parameter, it is assumed that the
>> pointer may refer to "flat scratch" space, and thus FLAT addressing must
>> be used instead of GLOBAL.
>>
>> I came up with this heuristic initially whilst working on support for
>> moving OpenACC gang-private variables into local-data share (scratch)
>> memory. The assumption that only scalar variables would be transformed in
>> that way turned out to be wrong.  For example, [...]
>> Fortran compiler-generated temporary structures were treated
>> as gang private and moved to LDS space, typically overflowing the region
>> allocated for such variables.  [...]
>> there may be other cases of structs moving to LDS
>> space now or in the future that this patch may be needed for.

When I (back then) had looked into PR105421
"GCN offloading, raised '-mgang-private-size': 
'HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION'",
I had been experimenting with different test codes, that all didn't
exhibit this problem.  Now I understand that 'struct' (as implied by
PR105421's Fortran 'write', for example) was the crucial thing there
(that is, 'AGGREGATE_TYPE_P (TREE_TYPE (TREE_VALUE (arg)))' in context of
the previous code).  With...

> pushed to master branch commit 7c55755d4c760de326809636531478fd7419e1e5
> "amdgcn: Use FLAT addressing for all functions with pointer arguments 
> [PR105421]"

... that addressed, I've now pushed to master branch
commit c7ebee2378426eeca425ca5406af213a926f154c
"Add 'libgomp.oacc-c-c++-common/private-big-1.c' [PR105421]", see
attached.


Regards
 Thomas


>From c7ebee2378426eeca425ca5406af213a926f154c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 18 Oct 2022 00:13:47 +0200
Subject: [PATCH] Add 'libgomp.oacc-c-c++-common/private-big-1.c' [PR105421]

After commit r13-3404-g7c55755d4c760de326809636531478fd7419e1e5
"amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]",
"big" private data now works for GCN offloading, too.

	PR target/105421
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/private-big-1.c: New.
---
 .../libgomp.oacc-c-c++-common/private-big-1.c | 100 ++
 1 file changed, 100 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/private-big-1.c

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/private-big-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/private-big-1.c
new file mode 100644
index 000..c0e8db0c894
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/private-big-1.c
@@ -0,0 +1,100 @@
+/* Test "big" private data.  */
+
+/* { dg-additional-options -fno-inline } for stable results regarding OpenACC 'routine'.  */
+
+/* { dg-additional-options -fopt-info-all-omp }
+   { dg-additional-options --param=openacc-privatization=noisy }
+   { dg-additional-options -foffload=-fopt-info-all-omp }
+   { dg-additional-options -foffload=--param=openacc-privatization=noisy }
+   for testing/documenting aspects of that functionality.  */
+
+/* { dg-additional-options -Wopenacc-parallelism } for testing/documenting
+   aspects of that functionality.  */
+
+/* For GCN offloading compilation, we (expectedly) run into a
+   'gang-private data-share memory exhausted' error: the default
+   '-mgang-private-size' is too small.  Raise it so that 'uint32_t x[344]' plus
+   some internal-use data fits in:
+   { dg-additional-options -foffload-options=amdgcn-amdhsa=-mgang-private-size=1555 { target openacc_radeon_accel_selected } } */
+
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_dummy[variable c_compute 0 c_loop 0] }
+   { dg-message dummy {} { target iN-VAl-Id } l_dummy } to avoid
+   "WARNING: dg-line var l_dummy defined, but not used".  */
+
+#include 
+#include 
+
+
+/* Based on 'private-variables.c:loop_g_5'.  */
+
+/* To demonstrate PR105421 "GCN offloading, raised '-mgang-private-size':
+   'HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION'", a 'struct' indirection, for
+   example, has been necessary in combination with a separate routine.  */
+
+struct data
+{
+  uint32_t *x;
+  uint32_t *arr;
+  u

amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421] (was: [PATCH] [og12] amdgcn: Use FLAT addressing for all functions with pointer arguments)

2022-10-20 Thread Thomas Schwinge
Hi!

On 2022-10-14T13:38:55+, Julian Brown  wrote:
> The GCN backend uses a heuristic to determine whether to use FLAT or
> GLOBAL addressing in a particular (offload) function: namely, if a
> function takes a pointer-to-scalar parameter, it is assumed that the
> pointer may refer to "flat scratch" space, and thus FLAT addressing must
> be used instead of GLOBAL.
>
> I came up with this heuristic initially whilst working on support for
> moving OpenACC gang-private variables into local-data share (scratch)
> memory. The assumption that only scalar variables would be transformed in
> that way turned out to be wrong.  For example, prior to the next patch in
> the series, Fortran compiler-generated temporary structures were treated
> as gang private and moved to LDS space, typically overflowing the region
> allocated for such variables.  That will no longer happen after that
> patch is applied, but there may be other cases of structs moving to LDS
> space now or in the future that this patch may be needed for.
>
> Tested with offloading to AMD GCN. I will apply shortly (to og12).

Thanks.  I've verified that this does resolve PR105421
"GCN offloading, raised '-mgang-private-size': 
'HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION'"
and have thus added PR105421 tags to your commit log, and with that
pushed to master branch commit 7c55755d4c760de326809636531478fd7419e1e5
"amdgcn: Use FLAT addressing for all functions with pointer arguments 
[PR105421]",
see attached.
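
For readers who don't want to dig through 'gcn.cc', the change in the heuristic can be modelled roughly as below.  This is only a standalone sketch: 'struct type', 'enum kind' and the two 'use_flat_*' functions are made-up stand-ins for illustration, not GCC's 'tree' API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a parameter type; not a GCC 'tree'.  */
enum kind { K_INT, K_STRUCT, K_POINTER };

struct type
{
  enum kind kind;
  const struct type *pointee; /* Only meaningful for K_POINTER.  */
};

/* Old heuristic: only a pointer to a non-aggregate forces FLAT addressing.  */
static bool
use_flat_old (const struct type *const *args, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (args[i]->kind == K_POINTER && args[i]->pointee->kind != K_STRUCT)
      return true;
  return false;
}

/* New heuristic: any pointer parameter forces FLAT addressing.  */
static bool
use_flat_new (const struct type *const *args, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (args[i]->kind == K_POINTER)
      return true;
  return false;
}

/* Example signature: 'void f (int, struct data *)'.  */
static const struct type t_int = { K_INT, NULL };
static const struct type t_struct = { K_STRUCT, NULL };
static const struct type t_ptr_struct = { K_POINTER, &t_struct };
static const struct type *const example_args[] = { &t_int, &t_ptr_struct };
```

With 'example_args', the old check leaves the function on GLOBAL addressing, whereas the new one selects FLAT; that is the 'struct' indirection case that PR105421 needed.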


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 7c55755d4c760de326809636531478fd7419e1e5 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Fri, 14 Oct 2022 11:06:07 +
Subject: [PATCH] amdgcn: Use FLAT addressing for all functions with pointer
 arguments [PR105421]

The GCN backend uses a heuristic to determine whether to use FLAT or
GLOBAL addressing in a particular (offload) function: namely, if a
function takes a pointer-to-scalar parameter, it is assumed that the
pointer may refer to "flat scratch" space, and thus FLAT addressing must
be used instead of GLOBAL.

I came up with this heuristic initially whilst working on support for
moving OpenACC gang-private variables into local-data share (scratch)
memory. The assumption that only scalar variables would be transformed in
that way turned out to be wrong.  For example, prior to the next patch in
the series, Fortran compiler-generated temporary structures were treated
as gang private and moved to LDS space, typically overflowing the region
allocated for such variables.  That will no longer happen after that
patch is applied, but there may be other cases of structs moving to LDS
space now or in the future that this patch may be needed for.

2022-10-14  Julian Brown  

	PR target/105421
gcc/
	* config/gcn/gcn.cc (gcn_detect_incoming_pointer_arg): Any pointer
	argument forces FLAT addressing mode, not just
	pointer-to-non-aggregate.
---
 gcc/config/gcn/gcn.cc | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 8777255a5c6..a9ef5c3dc02 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -2809,10 +2809,14 @@ gcn_arg_partial_bytes (cumulative_args_t cum_v, const function_arg_info )
   return (NUM_PARM_REGS - cum_num) * regsize;
 }
 
-/* A normal function which takes a pointer argument (to a scalar) may be
-   passed a pointer to LDS space (via a high-bits-set aperture), and that only
-   works with FLAT addressing, not GLOBAL.  Force FLAT addressing if the
-   function has an incoming pointer-to-scalar parameter.  */
+/* A normal function which takes a pointer argument may be passed a pointer to
+   LDS space (via a high-bits-set aperture), and that only works with FLAT
+   addressing, not GLOBAL.  Force FLAT addressing if the function has an
+   incoming pointer parameter.  NOTE: This is a heuristic that works in the
+   offloading case, but in general, a function might read global pointer
+   variables, etc. that may refer to LDS space or other special memory areas
+   not supported by GLOBAL instructions, and then this argument check would not
+   suffice.  */
 
 static void
 gcn_detect_incoming_pointer_arg (tree fndecl)
@@ -2822,8 +2826,7 @@ gcn_detect_incoming_pointer_arg (tree fndecl)
   for (tree arg = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
arg;
arg = TREE_CHAIN (arg))
-if (POINTER_TYPE_P (TREE_VALUE (arg))
-	&& !AGGREGATE_TYPE_P (TREE_TYPE (TREE_VALUE (arg))))
+if (POINTER_TYPE_P (TREE_VALUE (arg)))
   cfun->machine->use_flat_addressing = true;
 }
 
-- 
2.35.1



Re: [PATCH] [og12] OpenACC: Don't gang-privatize artificial variables

2022-10-18 Thread Thomas Schwinge
Hi Julian!

On 2022-10-14T13:38:56+, Julian Brown  wrote:
> This patch prevents compiler-generated artificial variables from being
> treated as privatization candidates for OpenACC.
>
> The rationale is that e.g. "gang-private" variables actually must be
> shared by each worker and vector spawned within a particular gang, but
> that sharing is not necessary for any compiler-generated variable (at
> least at present, but no such need is anticipated either).  Variables on
> the stack (and machine registers) are already private per-"thread"
> (gang, worker and/or vector), and that's fine for artificial variables.

OK, that seems a fine rationale for this change in behavior.
No contradicting test case jumped out at me, either.

> Several tests need their scan output patterns adjusted to compensate.

ACK -- surprisingly few.  (Some minor fine-tuning necessary for GCC
master branch, as had to be expected; I'm working on that.)

> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -11400,6 +11400,28 @@ oacc_privatization_candidate_p (const location_t 
> loc, const tree c,
>   }
>  }
>
> +  /* If an artificial variable has been added to a bind, e.g.
> + a compiler-generated temporary structure used by the Fortran front-end, 
> do
> + not consider it as a privatization candidate.  Note that variables on
> + the stack are private per-thread by default: making them "gang-private"
> + for OpenACC actually means to share a single instance of a variable
> + amongst all workers and threads spawned within each gang.
> + At present, no compiler-generated artificial variables require such
> + sharing semantics, so this is safe.  */
> +
> +  if (res && DECL_ARTIFICIAL (decl))
> +{
> +  res = false;
> +
> +  if (dump_enabled_p ())
> + {
> +   oacc_privatization_begin_diagnose_var (l_dump_flags, loc, c, decl);
> +   dump_printf (l_dump_flags,
> +"isn%'t candidate for adjusting OpenACC privatization "
> +"level: %s\n", "artificial");
> + }
> +}

In the source code comment, you say "added to a bind", and that's indeed
what I was expecting, too, and thus put in:

   if (res && DECL_ARTIFICIAL (decl))
 {
+  gcc_checking_assert (block);
+
   res = false;

..., but to my surprise, that did fire on one occasion:

> --- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
> @@ -94,9 +94,7 @@ contains
>  !$acc parallel copy(array)
>  !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
>  ! { dg-note {variable 'i' in 'private' clause isn't candidate for 
> adjusting OpenACC privatization level: not addressable} "" { target *-*-* } 
> l_loop$c_loop }
> -! { dg-note {variable 'array\.[0-9]+' in 'private' clause is candidate 
> for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
> -! { dg-note {variable 'array\.[0-9]+' ought to be adjusted for OpenACC 
> privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
> -! { dg-note {variable 'array\.[0-9]+' adjusted for OpenACC privatization 
> level: 'gang'} "" { target { ! { openacc_host_selected || { 
> openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
> +! { dg-note {variable 'array\.[0-9]+' in 'private' clause isn't 
> candidate for adjusting OpenACC privatization level: artificial} "" { target 
> *-*-* } l_loop$c_loop }
>  ! { dg-message {sorry, unimplemented: target cannot support alloca} 
> PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
>  do i = 1, 10
>array(i) = 9*i

... here.  Note "variable 'array\.[0-9]+' in 'private' clause";
everywhere else we have "declared in block".

As part of your verification, have you already looked into whether the
new behavior is correct here, or does this one need to continue to be
"adjusted for OpenACC privatization level: 'gang'"?  If the latter,
should we check 'if (res && block && DECL_ARTIFICIAL (decl))' instead of
'if (res && DECL_ARTIFICIAL (decl))', or is there some wrong setting of
'DECL_ARTIFICIAL' -- or are we maybe looking at an inappropriate 'decl'?
(Thinking of commit r12-7580-g7a5e036b61aa088e6b8564bc9383d37dfbb4801e
"[OpenACC privatization] Analyze 'lookup_decl'-translated DECL [PR90115, 
PR102330, PR104774]",
for example.)
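
To make the two variants being compared concrete, here is a rough standalone model.  It is a sketch only: 'struct var' and both predicates are hypothetical simplifications, with 'addressable' standing in for the earlier candidate checks and 'artificial' for 'DECL_ARTIFICIAL'; it is not the code in 'omp-low.cc'.

```c
#include <stdbool.h>

/* Hypothetical model of the few properties consulted here; not GCC's 'tree'.  */
struct var
{
  bool addressable; /* Stand-in for the earlier candidate checks.  */
  bool artificial;  /* Stand-in for DECL_ARTIFICIAL.  */
};

/* Variant as posted: reject every artificial variable.  */
static bool
candidate_p (const struct var *v)
{
  bool res = v->addressable;
  if (res && v->artificial)
    res = false;
  return res;
}

/* Variant asked about: reject artificial variables only if they were
   declared in a block, not if they were named in a 'private' clause.  */
static bool
candidate_block_only_p (const struct var *v, bool in_block)
{
  bool res = v->addressable;
  if (res && in_block && v->artificial)
    res = false;
  return res;
}
```

For an addressable, artificial variable that appears in a 'private' clause (such as 'array.N' above), the first variant says "not a candidate" and the second still says "candidate"; which of the two is right is exactly the open question.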


Grüße
 Thomas


> @@ -122,9 +120,7 @@ contains
>  ! { dg-note {variable 'str' in 'private' clause is candidate for 
> adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
>  ! { dg-note {variable 'str' ought to be adjusted for OpenACC 
> privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
>  ! { dg-note {variable 'str' adjusted for OpenACC privatization level: 
> 'gang'} "" { target { ! { openacc_host_selected || { 
> openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
> -! { dg-note {variable 'char\.[0-9]+' 

Re: [Patch] OpenMP: Fix use_device_{addr, ptr} with in-data-sharing arg

2022-05-10 Thread Thomas Schwinge
 (:,:,:), (:,:,:)
> +integer, value :: dev
> +integer :: i
> +type(c_ptr) :: ptr
> +logical :: is_shared
> +
> +is_shared = .false.
> +!$omp target device(dev) map(to: is_shared)
> +  is_shared = .true.
> +!$omp end target
> +
> +allocate ((-4:10,-3:8,2))
> +(:,:,:) = reshape ([(-i, i = 1, size())], shape())
> +!$omp target enter data map(to: ) device(dev)
> +if (any (lbound () /= [-4, -3, 1])) error stop 1
> +if (any (shape () /= [15, 12, 2])) error stop 2
> +if (any (lbound () /= [-4, -3, 1])) error stop 3
> +if (any (shape () /= [15, 12, 2])) error stop 4
> +if (any ( /= -)) error stop 5
> +if (any ( /= reshape ([(i, i = 1, size())], shape( &
> +  error stop 6
> +
> +!$omp parallel do shared(, )
> +do i = 1,1
> +  if (any (lbound () /= [-4, -3, 1])) error stop 5
> +  if (any (shape () /= [15, 12, 2])) error stop 6
> +  if (any (lbound () /= [-4, -3, 1])) error stop 7
> +  if (any (shape () /= [15, 12, 2])) error stop 8
> +  if (any ( /= -)) error stop 5
> +  if (any ( /= reshape ([(i, i = 1, size())], shape( &
> +error stop 6
> +  ptr = c_loc ()
> +  !$omp target data use_device_ptr(, ) device(dev)
> +if (any (lbound () /= [-4, -3, 1])) error stop 9
> +if (any (shape () /= [15, 12, 2])) error stop 10
> +if (any (lbound () /= [-4, -3, 1])) error stop 11
> +if (any (shape () /= [15, 12, 2])) error stop 12
> +if (is_shared) then
> +  if (any ( /= -)) error stop 5
> +  if (any ( /= reshape ([(i, i = 1, size())], shape( 
> &
> +error stop 6
> +end if
> +if (is_shared .neqv. c_associated (ptr, c_loc ())) error stop
> +
> +! Uses has_device_addr due to PR fortran/105318
> +!!$omp target is_device_ptr(, ) device(dev)
> +!$omp target has_device_addr(, ) device(dev)
> +   if (any (lbound () /= [-4, -3, 1])) error stop 9
> +   if (any (shape () /= [15, 12, 2])) error stop 10
> +   if (any (lbound () /= [-4, -3, 1])) error stop 11
> +   if (any (shape () /= [15, 12, 2])) error stop 12
> +   if (any ( /= -)) error stop 5
> +   if (any ( /= reshape ([(i, i = 1, size())], 
> shape( &
> + error stop 6
> +    !$omp end target
> +  !$omp end target data
> +end do
> +!$omp target exit data map(delete: ) device(dev)
> +deallocate ()
> +  end subroutine test_ptr
> +end program main


>From 798152475559a6be8049692932cc747c6499e7f5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 10 May 2022 14:43:56 +0200
Subject: [PATCH] Fix up 'libgomp.fortran/use_device_addr-5.f90' multi-device
 testing

Fix-up for recent commit r13-116-g3f8c389fe90bf565a6221a46bb7fb745dd4c1510
"OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg", where we
currently get:

libgomp: use_device_ptr pointer wasn't mapped
FAIL: libgomp.fortran/use_device_addr-5.f90   -O  execution test

	libgomp/
	* testsuite/libgomp.fortran/use_device_addr-5.f90: Fix up
	multi-device testing.
---
 libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90 b/libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90
index 1def70a1bc0..3124d60fe9b 100644
--- a/libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90
+++ b/libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90
@@ -8,7 +8,7 @@ program main
   aaa(:,:,:) = reshape ([(i, i = 1, size(aaa))], shape(aaa))
 
   do i = 0, omp_get_num_devices()
-!$omp target data map(to: aaa)
+!$omp target data map(to: aaa) device(i)
   call test_addr (aaa, i)
   call test_ptr (aaa, i)
 !$omp end target data
-- 
2.35.1



Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717])

2022-04-26 Thread Thomas Schwinge
, NULL, stmt, poplevel (1, 0));
>stmt = build2_loc (gfc_get_location (>loc), construct_code,
>void_type_node, stmt, oacc_clauses);
>gfc_add_expr_to_block (, stmt);


>From 3dfea06371aa9bcc84ad75a2bc821a45e131dca6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 26 Apr 2022 18:41:23 +0200
Subject: [PATCH] Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading
 compilation [PR104717]

That got broken by recent commit b2202431910e30d8505c94d1cb9341cac7080d10
"fortran: Fix up gfc_trans_oacc_construct [PR104717]".

	PR fortran/104717
	libgomp/
	* testsuite/libgomp.oacc-fortran/print-1.f90: Add OpenACC
	privatization scanning.  For GCN offloading compilation, raise
	'-mgang-private-size'.
---
 .../libgomp.oacc-fortran/print-1.f90  | 30 ++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90
index 7b7f73741fe..42a8538e1fb 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90
@@ -6,11 +6,39 @@
 ! Separate file 'print-1-nvptx.f90' for nvptx offloading.
 ! { dg-skip-if "separate file" { offload_target_nvptx } }
 
+! For GCN offloading compilation, when gang-privatizing 'dt_parm.N'
+! (see below), we run into an 'gang-private data-share memory exhausted'
+! error: the default '-mgang-private-size' is too small.  Per
+! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is
+! 'struct st_parameter_dt', which indeed is rather big.  Instead of
+! working out its exact size (which may vary per GCC configuration),
+! raise '-mgang-private-size' to an arbitrary high value.
+! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } }
+
+! { dg-additional-options "-fopt-info-note-omp" }
+! { dg-additional-options "-foffload=-fopt-info-note-omp" }
+
+! { dg-additional-options "--param=openacc-privatization=noisy" }
+! { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+! passed to 'incr' may be unset, and in that case, it will be set to [...]",
+! so to maintain compatibility with earlier Tcl releases, we manually
+! initialize counter variables:
+! { dg-line l_dummy[variable c_compute 0] }
+! { dg-message dummy {} { target iN-VAl-Id } l_dummy } to avoid
+! "WARNING: dg-line var l_dummy defined, but not used".
+
 program main
   implicit none
   integer :: var = 42
 
-!$acc parallel
+!$acc parallel ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {variable 'dt_parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'dt_parm\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'dt_parm\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} {} { target { ! openacc_host_selected } } l_compute$c_compute }
   write (0, '("The answer is ", I2)') var
 !$acc end parallel
 
-- 
2.25.1



Re: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717]

2022-04-25 Thread Thomas Schwinge
 constructs
for which that function is invoked need an extra artificial BIND_EXPR
around their body so that we move all variables of the bodies.

The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
or OMP_TARGET and for OpenACC constructs that behave similarly to
OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.

The following patch does that for OpenACC constructs too.

	PR fortran/104717
	gcc/fortran/
	* trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body
	in an extra BIND_EXPR.
	gcc/testsuite/
	* gfortran.dg/goacc/pr104717.f90: New test.
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
	libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Adjust.

Co-authored-by: Thomas Schwinge 
---
 gcc/fortran/trans-openmp.cc   |  2 ++
 gcc/testsuite/gfortran.dg/goacc/pr104717.f90  | 22 +++
 .../goacc/privatization-1-compute-loop.f90|  7 +++---
 .../libgomp.oacc-fortran/privatized-ref-2.f90 |  7 ++
 4 files changed, 35 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/pr104717.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 25dde826146..43d59abe9e0 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -,7 +,9 @@ gfc_trans_oacc_construct (gfc_code *code)
   gfc_start_block (&block);
   oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
 	code->loc, false, true);
+  pushlevel ();
   stmt = gfc_trans_omp_code (code->block->next, true);
+  stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
   stmt = build2_loc (gfc_get_location (&code->loc), construct_code,
 		 void_type_node, stmt, oacc_clauses);
   gfc_add_expr_to_block (&block, stmt);
diff --git a/gcc/testsuite/gfortran.dg/goacc/pr104717.f90 b/gcc/testsuite/gfortran.dg/goacc/pr104717.f90
new file mode 100644
index 000..4ef16187c84
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/pr104717.f90
@@ -0,0 +1,22 @@
+! Extracted from 'libgomp.oacc-fortran/privatized-ref-2.f90'.
+
+! { dg-additional-options "-O1 -fstack-arrays -fipa-pta" }
+
+program main
+  implicit none (type, external)
+  integer :: j
+  integer, allocatable :: A(:)
+
+  A = [(3*j, j=1, 10)]
+  call foo (A, size(A))
+  deallocate (A)
+contains
+  subroutine foo (array, nn)
+integer :: i, nn
+integer :: array(nn)
+
+!$acc parallel copyout(array)
+array = [(-i, i = 1, nn)]
+!$acc end parallel
+  end subroutine foo
+end
diff --git a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90 b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90
index 4dfeb7e07a2..13772c185ce 100644
--- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90
@@ -13,7 +13,7 @@
 ! passed to 'incr' may be unset, and in that case, it will be set to [...]",
 ! so to maintain compatibility with earlier Tcl releases, we manually
 ! initialize counter variables:
-! { dg-line l_dummy[variable c_loop 0] }
+! { dg-line l_dummy[variable c_compute 0 c_loop 0] }
 ! { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
 ! "WARNING: dg-line var l_dummy defined, but not used".
 
@@ -26,7 +26,7 @@ contains
 integer, parameter :: c = 3
 integer, external :: g
 
-!$acc parallel
+!$acc parallel ! { dg-line l_compute[incr c_compute] }
 !$acc loop collapse(2) private(a) private(x, y) ! { dg-line l_loop[incr c_loop] }
 do i = 1, 20
do j = 1, 25
@@ -46,6 +46,8 @@ contains
   y = a
end do
 end do
+!$acc end parallel
+! { dg-note {variable 'count\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
 ! { dg-note {variable 'count\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
 ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
 ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
@@ -54,6 +56,5 @@ contains
 ! { dg-note {variable 'y' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
 ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_loop$c_loop }
 ! { dg-note {variable 'y' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop }
-!$acc end parallel
   end subroutine f
 end module m
diff --git a/lib

Re: [Patch] Fortran: OpenMP/OpenACC avoid uninit access in size calc for mapping

2022-03-10 Thread Thomas Schwinge
Hi Tobias!

On 2022-03-08T15:25:07+0100, Tobias Burnus  wrote:
> found when working on the deep-mapping patch* with OpenMP code
> (and part of that patch) but it already shows up in an existing
> OpenACC testcase. I think it makes sense to fix it already for GCC 12.
>
> Problem: Also for unallocated allocatables, their size was
> calculated - the 'if(desc.data == NULL)' check was only added
> for pointers.
>
> Result after the patch: When compiling with -O (which is the default
> for goacc.exp), the warning now disappears. Thus, I now use '-O0'
> and the previous "is uninitialized" is now "may be uninitialized".
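
The guard described above can be pictured with the following standalone sketch.  'struct desc' and 'mapping_size' are hypothetical and much simpler than the real array descriptor and the generated code, and the sketch assumes that an unallocated array is mapped with size zero.

```c
#include <stddef.h>

/* Hypothetical, much simplified array descriptor; not libgfortran's layout.  */
struct desc
{
  void *data;       /* NULL while unallocated or unassociated.  */
  size_t nelems;    /* Undefined until 'data' is set.  */
  size_t elem_size;
};

/* Read the extent only once we know the array is allocated, so that no
   uninitialized descriptor field enters the size calculation.  */
static size_t
mapping_size (const struct desc *d)
{
  if (d->data == NULL)
    return 0;
  return d->nelems * d->elem_size;
}
```

Before the patch, only pointer arrays took the guarded path; allocatable arrays computed the product unconditionally, which is what the "is used uninitialized" warning was pointing at.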

I recently added that checking in
commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
"Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
for OpenACC test cases", to document the status quo.

I'll leave it to you to decide what is more appropriate: (1), as you have
proposed, add '-O0' (but with source code comment, please); something
like:

 ! { dg-additional-options -Wuninitialized }
+! Trigger "may be used uninitialized".
+! { dg-additional-options -O0 }

..., or (2): update the test cases to simply reflect diagnostics that are
now (no longer) seen with (default) '-O' (rationale: the test cases
haven't originally been written for the '-Wuninitialized' diagnostics;
that's just tested additionally, and using '-O0' instead of '-O' may be
disturbing what they originally meant to test?), or (3): duplicate the
test cases to account for both (1) and (2) (in other words: write
dedicated test cases for your GCC/Fortran front end changes (for example,
based on the ones you've modified here), and for the existing test cases
apply (2)).  The latter, (3), would be my approach.

> Unrelated to the patch and the testcase, I added some
> 'allocate'**/'if(allocated())' to the testcase - as otherwise
> uninit vars would be accessed. (Not relevant for the warning
> or the patch - but I prefer no invalid code in testcases,
> if it can be avoided.)

Agreed in principle, but again: I don't know what these test cases
originally have been testing?

> OK for mainline?

I can't comment on the GCC/Fortran front end changes -- so unless
somebody else speaks up, that's an implicit approval for those, I
suppose.  ;-)

> Tobias
> * https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591144.html

> ** I am actually not sure whether 'acc update(b)' will/should map a
> previous allocated variable - or whether it should.

(Are the typos here: in "will/should map": 's%map%update', and in
"or whether it should": 's%should%shouldn't'?)

> But that's
> unrelated to this bug fix. See also: https://gcc.gnu.org/PR96668
> for the re-mapping in OpenMP (works for arrays but not scalars).

I can't quickly make sense of that, sorry.  Do we need to first clarify that with
OpenACC Technical Committee, or is this just a GCC/OpenACC implementation
issue?


Grüße
 Thomas


> Fortran: OpenMP/OpenACC avoid uninit access in size calc for mapping
>
> gcc/fortran/ChangeLog:
>
>   * trans-openmp.cc (gfc_trans_omp_clauses, gfc_omp_finish_clause):
>   Obtain size for mapping only if allocatable array is allocated.
>
> gcc/testsuite/ChangeLog:
>
>   * gfortran.dg/goacc/array-with-dt-1.f90: Run with -O0 and
>   update dg-warning.
>   * gfortran.dg/goacc/pr93464.f90: Likewise.
>
>  gcc/fortran/trans-openmp.cc |  6 --
>  gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90 | 12 +---
>  gcc/testsuite/gfortran.dg/goacc/pr93464.f90 |  8 
>  3 files changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
> index 4d56a771349..fad76a4791f 100644
> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -1597,7 +1597,8 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
> openacc)
>tree size = create_tmp_var (gfc_array_index_type);
>tree elemsz = TYPE_SIZE_UNIT (gfc_get_element_type (type));
>elemsz = fold_convert (gfc_array_index_type, elemsz);
> -  if (GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER
> +  if (GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_ALLOCATABLE
> +   || GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER
> || GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER_CONT)
>   {
> stmtblock_t cond_block;
> @@ -3208,7 +3209,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
> gfc_omp_clauses *clauses,
>
> /* We have to check for n->sym->attr.dimension because
>of scalar coarrays.  */
> -   if (n->sym->attr.pointer && n->sym->attr.dimension)
> +   if ((n->sym->attr.pointer || n->sym->attr.allocatable)
> +   && n->sym->attr.dimension)
>   {
> stmtblock_t cond_block;
> tree size
> diff --git a/gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90 
> 

[no subject]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-03-04T14:46:25+0100, I wrote:
> Pushed to master branch commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
> "OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
> [PR100280, PR104132, PR104133]", see attached.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
> [...]
> @@ -27,8 +31,12 @@ int main()
>(volatile int *) 
>  #define N 123
>int b[N] = { 0 };
> +  unsigned long long f1;
> +  /*TODO See above.  */
> +  (volatile void *) &f1;

Ah, the famous last-minute change just before 'git push'...  To work
around execution failure with GCN offloading, we're explicitly making
'f1' addressable here -- but I didn't realize that this also affects
diagnostics, sorry.

Pushed to master branch commit 14dfbb53594e164fe222476523a68039a8bd5252
"Fix 'libgomp.oacc-c-c++-common/kernels-decompose-1.c' expected
diagnostics", see attached.


Grüße
 Thomas


>
>  #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
> +  /* { dg-note {variable 'g2\.0' declared in block isn't candidate for 
> adjusting OpenACC privatization level: not addressable} {} { target *-*-* } 
> l_compute$c_compute } */
>{
> [...]
> +/* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} 
> {} { target *-*-* } .+1 } */
> +  f1 = 1;
> +  /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 
> 'parloops' for analysis} {} { target *-*-* } .+1 } */
> +#pragma acc loop /* { dg-line l_loop_c[incr c_loop_c] } */
> +  /* { dg-note {variable 'c' in 'private' clause is candidate for 
> adjusting OpenACC privatization level} {} { target *-*-* } l_loop_c$c_loop_c 
> } */
> +  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target 
> *-*-* } l_loop_c$c_loop_c } */
> +  for (c = 20; c > 0; --c)
> + f1 *= c;
> +
> +  /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} 
> {} { target *-*-* } .+1 } */
> +  if (c != 234)
> + __builtin_abort ();
> +  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target 
> *-*-* } l_compute$c_compute } */
> +}
>}
> [...]
> +  assert (f1 == 2432902008176640000ULL);
>
>return 0;
>  }


>From 14dfbb53594e164fe222476523a68039a8bd5252 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 4 Mar 2022 20:34:40 +0100
Subject: [PATCH] Fix 'libgomp.oacc-c-c++-common/kernels-decompose-1.c'
 expected diagnostics

Fix-up for recent commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]": adjust for a GCN offloading workaround
added just before commit: '(volatile void *) &f1;'.

	PR testsuite/104791
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Fix
	expected diagnostics.
---
 .../testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c   | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index 049b3a44b03..985a547d381 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -37,6 +37,8 @@ int main()
 
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
   /* { dg-note {variable 'g2\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'f1\.1' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'f1\.2' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
   {
 /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
 int c = 234;
-- 
2.25.1



Test 'libgomp.oacc-*/kernels-private-vars-*' with '--param=openacc-kernels=decompose' [PR104784]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-03-04T14:46:25+0100, I wrote:
> Pushed to master branch commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
> "OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
> [PR100280, PR104132, PR104133]", see attached.

Pushed to master branch commit e28eb86c18ed765dceb3c56471a848e9f0e120ff
"Test 'libgomp.oacc-*/kernels-private-vars-*' with
'--param=openacc-kernels=decompose' [PR104784]", see attached.


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Commercial register:
München, HRB 106955
From e28eb86c18ed765dceb3c56471a848e9f0e120ff Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 16 Feb 2022 22:24:03 +0100
Subject: [PATCH] Test 'libgomp.oacc-*/kernels-private-vars-*' with
 '--param=openacc-kernels=decompose' [PR104784]

Before recent commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]", 'libgomp.oacc-c' testing already worked fine,
but 'libgomp.oacc-c++' testing ICEed.  Via the commit mentioned, the C++
testing ICEs are now resolved, but the underlying issue remains to be looked
into: PR104784 "OpenACC 'kernels' decomposition: C vs. C++ differences".

	PR middle-end/104784
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
	Test with '--param=openacc-kernels=decompose'.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
	Likewise.
---
 .../kernels-private-vars-local-worker-1.c | 23 +++
 .../kernels-private-vars-local-worker-2.c | 20 
 .../kernels-private-vars-local-worker-3.c | 20 
 .../kernels-private-vars-local-worker-4.c | 20 
 .../kernels-private-vars-local-worker-5.c | 20 
 .../kernels-private-vars-loop-gang-1.c| 11 ++---
 .../kernels-private-vars-loop-gang-2.c| 11 ++---
 .../kernels-private-vars-loop-gang-3.c| 11 ++---
 .../kernels-private-vars-loop-gang-4.c| 10 ++--
 .../kernels-private-vars-loop-gang

OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs [PR100280, PR104132, PR104133]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-03-01T17:46:20+0100, I wrote:
> On 2022-01-13T10:54:16+0100, I wrote:
>> On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>>>  - The "addressable" bit is set during the kernels conversion pass for
>>>    variables that have "create" (alloc) clauses created for them in the
>>>    synthesised outer data region (instead of in the front-end, etc.,
>>>    where it can't be done accurately). Such variables actually have
>>>    their address taken during transformations made in a later pass
>>>    (omp-low, I think), but there's a phase-ordering problem that means
>>>    the flag should be set earlier.
>>
>> The actual issue is a bit different, but yes, there is a problem.
>> The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
>> "ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
>> didn't run into that with the OpenACC 'kernels' decomposition
>> originally.)  I've pushed to master branch
>> commit 9b32c1669aad5459dd053424f9967011348add83
>> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
>> clauses as addressable [PR100280]"

>> --- a/gcc/omp-oacc-kernels-decompose.cc
>> +++ b/gcc/omp-oacc-kernels-decompose.cc
>> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
>>
>>  /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
>> location LOC containing BODY and having 'create (var)' clauses for each
>> -   variable.  If INNER_CLEANUP is present, add a try-finally statement with
>> +   variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
>> +   If INNER_CLEANUP is present, add a try-finally statement with
>> this cleanup code in the finally block.  Return the new data region, or
>> the original BODY if no data region was needed.  */
>>
>> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
>> inner_data_clauses = new_clause;
>>
>> prev_mapped_var = v;
>> +
>> +   /* See <https://gcc.gnu.org/PR100280>.  */
>> +   TREE_ADDRESSABLE (v) = 1;
>>   }
>>  }
>
> So, that's too simple.  ;-) [...]

> We're after gimplification, and must not just set 'TREE_ADDRESSABLE',
> because that may easily violate GIMPLE invariants, leading to ICEs later.
> There are a few open PRs, which my following changes are addressing.  To
> make "late" 'TREE_ADDRESSABLE' work, we have a precedent in OpenMP's
> 'gcc/omp-low.cc:task_shared_vars' handling, as Jakub had pointed to in
> discussion of <https://gcc.gnu.org/PR102330>.

> I'm thus proposing to generalize 'gcc/omp-low.cc:task_shared_vars' into
> 'make_addressable_vars', plus new 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'
> that we then may use instead of the 'TREE_ADDRESSABLE (v) = 1;' quoted
> above (plus one or two additional ones to be introduced in later
> patches), and wire that up in 'gcc/omp-low.cc:scan_sharing_clauses', for
> 'OMP_CLAUSE_MAP': set 'TREE_ADDRESSABLE' and put into
> 'make_addressable_vars' for later fix-up.

Pushed to master branch commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]", see attached.


Regards
 Thomas


From 8935589b496f755e08cadf26d8ceddf0dd6e0968 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 15 Feb 2022 23:31:34 +0100
Subject: [PATCH] OMP lowering: Regimplify
 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs [PR100280, PR104132, PR104133]

... by generalizing the existing 'gcc/omp-low.cc:task_shared_vars'.

Fix-up for commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	PR middle-end/104132
	PR middle-end/104133
	gcc/
	* omp-low.cc (task_shared_vars): Rename to
	'make_addressable_vars'.  Adjust all users.
	(scan_sharing_clauses) <OMP_CLAUSE_MAP>: Use it for
	'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs, too.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompos

OpenACC 'kernels' decomposition: Move 'TREE_ADDRESSABLE' setting into OMP lowering [PR100280]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-01-13T10:54:16+0100, I wrote:
> On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>>  - The "addressable" bit is set during the kernels conversion pass for
>>    variables that have "create" (alloc) clauses created for them in the
>>    synthesised outer data region (instead of in the front-end, etc.,
>>    where it can't be done accurately). Such variables actually have
>>    their address taken during transformations made in a later pass
>>    (omp-low, I think), but there's a phase-ordering problem that means
>>    the flag should be set earlier.
>
> The actual issue is a bit different, but yes, there is a problem.
> The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
> "ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
> didn't run into that with the OpenACC 'kernels' decomposition
> originally.)  I've pushed to master branch
> commit 9b32c1669aad5459dd053424f9967011348add83
> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
> clauses as addressable [PR100280]", see attached.

> --- a/gcc/omp-oacc-kernels-decompose.cc
> +++ b/gcc/omp-oacc-kernels-decompose.cc
> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
>
>  /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
> location LOC containing BODY and having 'create (var)' clauses for each
> -   variable.  If INNER_CLEANUP is present, add a try-finally statement with
> +   variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
> +   If INNER_CLEANUP is present, add a try-finally statement with
> this cleanup code in the finally block.  Return the new data region, or
> the original BODY if no data region was needed.  */
>
> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
> inner_data_clauses = new_clause;
>
> prev_mapped_var = v;
> +
> +   /* See <https://gcc.gnu.org/PR100280>.  */
> +   TREE_ADDRESSABLE (v) = 1;
>   }
>  }

Pushed to master branch commit de6e81ea961219d0726db67776d11ce75a4cae1b
"OpenACC 'kernels' decomposition: Move 'TREE_ADDRESSABLE' setting into
OMP lowering [PR100280]", see attached.


Regards
 Thomas


From de6e81ea961219d0726db67776d11ce75a4cae1b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 15 Feb 2022 23:03:49 +0100
Subject: [PATCH] OpenACC 'kernels' decomposition: Move 'TREE_ADDRESSABLE'
 setting into OMP lowering [PR100280]

... in preparation for later changes.  No functional change.

Follow-up to commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	gcc/
	* tree.h (OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE): New.
	* tree-core.h: Document it.
	* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Handle
	'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Set 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' instead of
	'TREE_ADDRESSABLE'.
	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
---
 gcc/omp-low.cc| 31 +++
 gcc/omp-oacc-kernels-decompose.cc |  5 +--
 .../goacc/classify-kernels-unparallelized.c   |  3 +-
 .../c-c++-common/goacc/classify-kernels.c |  3 +-
 .../c-c++-common/goacc/kernels-decompose-2.c  |  6 ++--
 .../goacc/kernels-decompose-pr100280-1.c  |  3 +-
 .../goacc/kernels-decompose-pr104061-1-2.c|  3 +-
 .../goacc/kernels-decompose-pr104061-1-3.c|  3 +-
 .../goacc/kernels-decompose-pr104061-1-4.c|  3 +-
 .../goacc/kernels-decompose-pr104132-1.c  |  3 +-
 .../goacc/kernels-decompose-pr104133-1.c  |  3 +-
 gcc/tree-core.h   |  3 ++
 gcc/tree.h  

Add diagnostic: "note: OpenACC 'kernels' decomposition: variable '[...]' declared in block made addressable" [PR100280]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-01-13T10:54:16+0100, I wrote:
> On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>>  - The "addressable" bit is set during the kernels conversion pass for
>>    variables that have "create" (alloc) clauses created for them in the
>>    synthesised outer data region (instead of in the front-end, etc.,
>>    where it can't be done accurately). Such variables actually have
>>    their address taken during transformations made in a later pass
>>    (omp-low, I think), but there's a phase-ordering problem that means
>>    the flag should be set earlier.
>
> The actual issue is a bit different, but yes, there is a problem.
> The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
> "ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
> didn't run into that with the OpenACC 'kernels' decomposition
> originally.)  I've pushed to master branch
> commit 9b32c1669aad5459dd053424f9967011348add83
> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
> clauses as addressable [PR100280]", see attached.

> --- a/gcc/omp-oacc-kernels-decompose.cc
> +++ b/gcc/omp-oacc-kernels-decompose.cc
> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
>
>  /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
> location LOC containing BODY and having 'create (var)' clauses for each
> -   variable.  If INNER_CLEANUP is present, add a try-finally statement with
> +   variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
> +   If INNER_CLEANUP is present, add a try-finally statement with
> this cleanup code in the finally block.  Return the new data region, or
> the original BODY if no data region was needed.  */
>
> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
> inner_data_clauses = new_clause;
>
> prev_mapped_var = v;
> +
> +   /* See <https://gcc.gnu.org/PR100280>.  */
> +   TREE_ADDRESSABLE (v) = 1;
>   }
>  }

Pushed to master branch commit e5ae22c56152b1a1f4b4e1d7ae04431a9e4710cc
"Add diagnostic: "note: OpenACC 'kernels' decomposition: variable '[...]'
declared in block made addressable" [PR100280]", see attached.


Regards
 Thomas


From e5ae22c56152b1a1f4b4e1d7ae04431a9e4710cc Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 15 Feb 2022 16:54:30 +0100
Subject: [PATCH] Add diagnostic: "note: OpenACC 'kernels' decomposition:
 variable '[...]' declared in block made addressable" [PR100280]

Follow-up to commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	gcc/
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Add diagnostic: "note: OpenACC 'kernels' decomposition: variable
	'[...]' declared in block made addressable".
	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
	'--param=openacc-privatization=noisy'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
---
 gcc/omp-oacc-kernels-decompose.cc | 24 ++-
 .../goacc/classify-kernels-unparallelized.c   |  7 ++
 .../c-c++-common/goacc/classify-kernels.c |  7 ++
 .../c-c++-common/goacc/kernels-decompose-2.c  |  2 ++
 .../goacc/kernels-decompose-pr100280-1.c  |  1 +
 .../goacc/kernels-decompose-pr104061-1-2.c|  1 +
 .../goacc/kernels-decompose-pr104061-1-3.c|  1 +
 .../goacc/kernels-decompose-pr104061-1-4.c|  1 +
 .../goacc/kernels-decompose-pr104132-1.c  |  1 +
 .../goacc/kernels-decompose-pr104133-1.c  |  1 +
 .../libgomp.oacc-c-c++-common/f-asyncwait-1.c |  3 +++
 .../kernels-decompose-1.c |  1 +
 12 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oac

Add 'c-c++-common/goacc/kernels-decompose-pr104133-1.c' [PR104133]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-03-01T17:46:20+0100, I wrote:
> On 2022-01-13T10:54:16+0100, I wrote:
>> --- a/gcc/omp-oacc-kernels-decompose.cc
>> +++ b/gcc/omp-oacc-kernels-decompose.cc

>> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
>> inner_data_clauses = new_clause;
>>
>> prev_mapped_var = v;
>> +
>> +   /* See <https://gcc.gnu.org/PR100280>.  */
>> +   TREE_ADDRESSABLE (v) = 1;
>>   }
>>  }
>
> So, that's too simple.  ;-) [...]

> We're after gimplification, and must not just set 'TREE_ADDRESSABLE',
> because that may easily violate GIMPLE invariants, leading to ICEs later.
> There are a few open PRs

Pushed to master branch commit e085900fa10e28b684d656b66557d181247a1a48
"Add 'c-c++-common/goacc/kernels-decompose-pr104133-1.c' [PR104133]", see
attached.


Regards
 Thomas


From e085900fa10e28b684d656b66557d181247a1a48 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 19 Jan 2022 22:28:55 +0100
Subject: [PATCH] Add 'c-c++-common/goacc/kernels-decompose-pr104133-1.c'
 [PR104133]

..., currently XFAILed with 'dg-ice'.

	PR middle-end/104133
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: New file.
---
 .../goacc/kernels-decompose-pr104133-1.c  | 40 +++
 1 file changed, 40 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c
new file mode 100644
index 000..72dde346dbf
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c
@@ -0,0 +1,40 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO }
+   { dg-prune-output {D\.[0-9]+ = arr_0\.0 \+ k;} }
+   { dg-prune-output {D\.[0-9]+ = arr_0\.1 \+ k;} }
+   { dg-prune-output {during GIMPLE pass: lower} } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+#pragma acc kernels /* { dg-line l_compute1 } */
+  /* { dg-note {variable 'k' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 } */
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute1 } */
+  /* { dg-note {variable 'arr_0\.1' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute1 } */
+  {
+    int k;
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop /* { dg-line l_loop_k1 } */
+    /* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k1 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop /* { dg-line l_loop_k2 } */
+    /* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k2 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+      /* { dg-bogus {error: invalid operands in binary operation} {} { xfail *-*-* } .-1 } */
+  }
+}
-- 
2.34.1



Re: OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]

2022-03-01 Thread Thomas Schwinge
Hi!

Jakub, need your review/approval here, please:

On 2022-01-13T10:54:16+0100, I wrote:
> On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>>  - The "addressable" bit is set during the kernels conversion pass for
>>    variables that have "create" (alloc) clauses created for them in the
>>    synthesised outer data region (instead of in the front-end, etc.,
>>    where it can't be done accurately). Such variables actually have
>>    their address taken during transformations made in a later pass
>>    (omp-low, I think), but there's a phase-ordering problem that means
>>    the flag should be set earlier.
>
> The actual issue is a bit different, but yes, there is a problem.
> The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
> "ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
> didn't run into that with the OpenACC 'kernels' decomposition
> originally.)  I've pushed to master branch
> commit 9b32c1669aad5459dd053424f9967011348add83
> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
> clauses as addressable [PR100280]"

> ... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:
>
>     else if (is_gimple_reg (var))
>       {
>         gcc_assert (offloaded);
>         tree avar = create_tmp_var (TREE_TYPE (var));
>         mark_addressable (avar);
>
> ..., which (a) is only implemented for actually *offloaded* regions (but not
> data regions), and (b) the subsequently synthesized code for writing to and
> later reading back from the temporary fundamentally conflicts with OpenACC
> 'async' (as used by OpenACC 'kernels' decomposition).  That's all not trivial
> to make work, so let's just avoid this case.

> --- a/gcc/omp-oacc-kernels-decompose.cc
> +++ b/gcc/omp-oacc-kernels-decompose.cc
> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
>
>  /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
> location LOC containing BODY and having 'create (var)' clauses for each
> -   variable.  If INNER_CLEANUP is present, add a try-finally statement with
> +   variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
> +   If INNER_CLEANUP is present, add a try-finally statement with
> this cleanup code in the finally block.  Return the new data region, or
> the original BODY if no data region was needed.  */
>
> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
> inner_data_clauses = new_clause;
>
> prev_mapped_var = v;
> +
> +   /* See <https://gcc.gnu.org/PR100280>.  */
> +   TREE_ADDRESSABLE (v) = 1;
>   }
>  }

So, that's too simple.  ;-) ... and gives rise to workaround patches like
we have on the og11 development branch:
  - "Avoid introducing 'create' mapping clauses for loop index variables in 
kernels regions",
  - "Run all kernels regions with GOMP_MAP_FORCE_TOFROM mappings synchronously",
  - "Fix for is_gimple_reg vars to 'data kernels'"

We're after gimplification, and must not just set 'TREE_ADDRESSABLE',
because that may easily violate GIMPLE invariants, leading to ICEs later.
There are a few open PRs, which my following changes are addressing.  To
make "late" 'TREE_ADDRESSABLE' work, we have a precedent in OpenMP's
'gcc/omp-low.cc:task_shared_vars' handling, as Jakub had pointed to in
discussion of <https://gcc.gnu.org/PR102330>.  (PR102330 turned out to be
unrelated to the "late" 'TREE_ADDRESSABLE' problem here; I have a
different patch for it.)

I'm thus proposing to generalize 'gcc/omp-low.cc:task_shared_vars' into
'make_addressable_vars', plus new 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'
that we then may use instead of the 'TREE_ADDRESSABLE (v) = 1;' quoted
above (plus one or two additional ones to be introduced in later
patches), and wire that up in 'gcc/omp-low.cc:scan_sharing_clauses', for
'OMP_CLAUSE_MAP': set 'TREE_ADDRESSABLE' and put into
'make_addressable_vars' for later fix-up.
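
To illustrate the "mark now, fix up later" idea only, here is a toy model in
plain C.  It is not GCC code: every 'toy_*' name, the struct layout, and the
fixed-size array standing in for the bitmap are assumptions made for this
sketch.  The scan step sets the addressable flag and records the variable;
the lowering step later revisits exactly the recorded variables, which is
where the real pass would regimplify their uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_DECLS = 64 };

/* Hypothetical stand-in for a variable declaration.  */
struct toy_decl
{
  unsigned uid;           /* Models DECL_UID.  */
  bool addressable;       /* Models TREE_ADDRESSABLE.  */
  bool make_addressable;  /* Models a map clause that carries
                             OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE.  */
  unsigned regimplified;  /* Counts fix-ups applied to this variable.  */
};

/* Toy counterpart of the 'make_addressable_vars' bitmap, indexed by uid.  */
static bool toy_make_addressable_vars[TOY_MAX_DECLS];

/* Scan phase: set the flag, and remember the variable for later fix-up.
   Variables that were already addressable need no fix-up.  */
static void
toy_scan_map_clause (struct toy_decl *d)
{
  assert (d->uid < TOY_MAX_DECLS);
  if (d->make_addressable && !d->addressable)
    {
      d->addressable = true;
      toy_make_addressable_vars[d->uid] = true;
    }
}

/* Lowering phase: fix up every recorded variable exactly once and return
   how many were handled.  */
static unsigned
toy_lower (struct toy_decl *decls, size_t n)
{
  unsigned fixed = 0;
  for (size_t i = 0; i < n; i++)
    if (toy_make_addressable_vars[decls[i].uid])
      {
        decls[i].regimplified++;
        toy_make_addressable_vars[decls[i].uid] = false;
        fixed++;
      }
  return fixed;
}
```

The point of the split is that nothing between the two phases sees a variable
whose flag changed without its uses being queued for rewriting.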

(In reply to Jakub Jelinek from comment #9)
> Whether you can use the same bitmap or need to add another bitmap next to
> task_shared_vars is something hard to guess without diving into it deeply.

Per my understanding of the code, the only place where I had doubts is
'gcc/omp-low.cc:finish_taskreg_scan', but I have convinced myself that
what this is doing is either a no-op in the
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' case, or in fact necessary as the
original 'task_shared_vars' handling has been.  Either way: I couldn't
come up with a way (test case) that we'd actually run into this case;
you'd have to have the relevant OpenMP constructs inside an OpenACC
'kernels' region, which isn't permitted per
'gcc/omp-low.cc:check_omp_nesting_restrictions'.

OK to proceed in this way?


Regards
 Thomas


--- gcc/omp-low.cc
+++ gcc/omp-low.cc
@@ -188,7 

Re: [Patch] Fortran/OpenMP: Fix depend-clause handling

2022-02-15 Thread Thomas Schwinge
Hi!

On 2022-02-15T11:26:12+0100, Tobias Burnus  wrote:
> As found by Marcel, the 'depend' clause was differently handled in
> 'omp depobj(...) depend(...)' and in 'omp task depend(...)'.

(Cross-referencing GCC PR104545 "[OpenMP & Fortran] Pointers issue in
combination of depobj construct and depend clause with depobj
dependence-type".)

> The attached patch [...]

>  gcc/fortran/trans-openmp.cc|  45 -
>  gcc/testsuite/gfortran.dg/gomp/depend-4.f90| 240 
> +
>  libgomp/testsuite/libgomp.fortran/depend-4.f90 | 107 +++

The actual commit r12-7242-g3939c1b11279dc950d2f160eb940dd791f7b40f1
"Fortran/OpenMP: Fix depend-clause handling" also has:

|  gcc/testsuite/gfortran.dg/gomp/depend-5.f90|  82 +

... (yay for more test cases!), and that one I see partially FAIL in
x86_64 '-m32'/'-mx32' testing:

FAIL: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\(\\*\\(integer\\(kind=16\\)\\[0:\\] \\* restrict\\) aaa.data\\)\\[aaa.offset \\+ 2\\]\\)" 1
FAIL: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\(\\*\\(integer\\(kind=16\\)\\[0:\\] \\* restrict\\) daaa->data\\)\\[daaa->offset \\+ 2\\]\\)" 1
FAIL: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\(\\*\\(integer\\(kind=16\\)\\[0:\\] \\* restrict\\) doaaa->data\\)\\[doaaa->offset \\+ 2\\]\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\(\\*daa\\)\\[1\\]\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\(\\*doaa\\)\\[1\\]\\)" 1
FAIL: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*\\(integer\\(kind=16\\) \\*\\) \\(aap.data \\+ \\(sizetype\\) \\(\\(aap.offset \\+ aap.dim\\[0\\].stride \\* 2\\) \\* aap.span\\)\\)\\)" 1
FAIL: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*\\(integer\\(kind=16\\) \\*\\) \\(daap->data \\+ \\(sizetype\\) \\(\\(daap->offset \\+ daap->dim\\[0\\].stride \\* 2\\) \\* daap->span\\)\\)\\)" 1
FAIL: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*\\(integer\\(kind=16\\) \\*\\) \\(doaap->data \\+ \\(sizetype\\) \\(\\(doaap->offset \\+ doaap->dim\\[0\\].stride \\* 2\\) \\* doaap->span\\)\\)\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*\\*dosa\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*\\*dosp\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*\\*dsa\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*\\*dsp\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*doss\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*dss\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*sa\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:\\*sp\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:aa\\[1\\]\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O   scan-tree-dump-times original "#pragma omp task depend\\(depobj:ss\\)" 1
PASS: gfortran.dg/gomp/depend-5.f90   -O  (test for excess errors)


Regards
 Thomas


Re: [committed] libgomp.fortran/allocate-1.f90: Minor cleanup (was: Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).)

2022-02-04 Thread Thomas Schwinge
Hi Tobias!

On 2022-02-04T14:57:07+0100, Tobias Burnus  wrote:
> On 04.02.22 10:37, Thomas Schwinge wrote:
>>> I have attached a patch (not committed), which silences the three kinds of
>>> warnings and fixes the interface issue.
>>> TODO: commit it.
>> Still "TODO: commit it" ;-) -- and while I haven't reviewed the changes
>> in detail, I did spot one item that should be addressed, I suppose:
>
> I had also spotted the 'stop' which was a leftover from -fsanitize=...
> checking and had removed it locally.

Maybe removed locally, I can't tell ;-) -- but it's still in the commit
that you pushed.  See below.

Also, a commented-out '!$omp barrier'; not sure what that one is about.

> But good that you also keep
> checking patches :-)

I try!  :-)


Grüße
 Thomas


> In any case, I have now _finally_ committed the patch.
>
> Attached is the simplified (-w) diff, where I did exclude the
> indentation changes to make the diff more readable.
>
> For the full diff, see e.g. https://gcc.gnu.org/r12-7053
>
> Tobias

> commit 6d4981350168f1eb3f72149bd7e05b9ba6bec1fd
> Author: Tobias Burnus 
> Date:   Fri Feb 4 14:51:01 2022 +0100
>
> libgomp.fortran/allocate-1.f90: Minor cleanup
>
> libgomp/ChangeLog:
> * testsuite/libgomp.fortran/allocate-1.c (is_64bit_aligned): Renamed
> from is_64bit_aligned_.
> * testsuite/libgomp.fortran/allocate-1.f90: Fix interface decl
> and use it, more implicit none, remove unused argument.
>
> diff --git a/libgomp/testsuite/libgomp.fortran/allocate-1.c b/libgomp/testsuite/libgomp.fortran/allocate-1.c
> index d33acc6feef..cb6d355afc6 100644
> --- a/libgomp/testsuite/libgomp.fortran/allocate-1.c
> +++ b/libgomp/testsuite/libgomp.fortran/allocate-1.c
> @@ -1,7 +1,7 @@
>  #include <stdint.h>
>
>  int
> -is_64bit_aligned_ (uintptr_t a)
> +is_64bit_aligned (uintptr_t a)
>  {
>return ( (a & 0x3f) == 0);
>  }
> diff --git a/libgomp/testsuite/libgomp.fortran/allocate-1.f90 b/libgomp/testsuite/libgomp.fortran/allocate-1.f90
> index 35d1750b878..062278f9908 100644
> --- a/libgomp/testsuite/libgomp.fortran/allocate-1.f90
> +++ b/libgomp/testsuite/libgomp.fortran/allocate-1.f90
> @@ -5,30 +5,30 @@
>  module m
>use omp_lib
>use iso_c_binding
> -  implicit none
> +  implicit none (type, external)
>
>interface
>  integer(c_int) function is_64bit_aligned (a) bind(C)
>import :: c_int
> -  integer  :: a
> +  type(*)  :: a
>  end
>end interface
> -end module m
>
> -subroutine foo (x, p, q, px, h, fl)
> +contains
> +
> +subroutine foo (x, p, q, h, fl)
>use omp_lib
>use iso_c_binding
>integer  :: x
>integer, dimension(4) :: p
>integer, dimension(4) :: q
> -  integer  :: px
>integer (kind=omp_allocator_handle_kind) :: h
>integer  :: fl
>
>integer  :: y
>integer  :: r, i, i1, i2, i3, i4, i5
>integer  :: l, l3, l4, l5, l6
> -  integer  :: n, n1, n2, n3, n4
> +  integer  :: n, n2, n3, n4
>integer  :: j2, j3, j4
>integer, dimension(4) :: l2
>integer, dimension(4) :: r2
> @@ -74,6 +74,8 @@ subroutine foo (x, p, q, px, h, fl)
>if (x /= 42) then
>  stop 1
>end if
> +
> +  !!$omp barrier
>v(1) = 7
>if ( (and(fl, 2) /= 0) .and.  &
> ((is_64bit_aligned(x) == 0) .or. &
> @@ -95,7 +97,7 @@ subroutine foo (x, p, q, px, h, fl)
>  stop 4
>end if
>!$omp end parallel
> -
> +stop
>!$omp teams
>!$omp parallel private (y) firstprivate (x, w) allocate (h: x, y, w)
>
> @@ -305,11 +307,13 @@ subroutine foo (x, p, q, px, h, fl)
>.or. r2(1) /= (5 * p(3)) .or. r2(4) /= (6 * p(3))) then
>  stop 25
>end if
> -
>  end subroutine
> +end module m
>
>  program main
>use omp_lib
> +  use m
> +  implicit none (type, external)
>integer, dimension(4) :: p
>integer, dimension(4) :: q
>
> @@ -323,11 +327,11 @@ program main
>if (a == omp_null_allocator) stop 1
>
>call omp_set_default_allocator (omp_default_mem_alloc);
> -  call foo (42, p, q, 2, a, 0);
> -  call foo (42, p, q, 2, omp_default_mem_alloc, 0);
> -  call foo (42, p, q, 2, a, 1);
> +  call foo (42, p, q, a, 0);
> +  call foo (42, p, q, omp_default_mem_alloc, 0);
> +  call foo (42, p, q, a, 1);
>call omp_set_default_allocator (a);
> -  call foo (42, p, q, 2, omp_null_allocator, 3);
> -  call foo (42, p, q, 2, omp_default_mem_alloc, 2);
> +  call foo (42, p, q, omp_null_allocator, 3);
> +  call foo (42, p, q, omp_default_mem_alloc, 2);
>call omp_destroy_allocator (a);
>  end


Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).

2022-02-04 Thread Thomas Schwinge
Hi!

On 2022-01-31T19:13:09+, Hafiz Abid Qadeer  wrote:
> On 25/01/2022 10:32, Tobias Burnus wrote:
>> On 25.01.22 10:19, Thomas Schwinge wrote:
>>>> I am trying to figure out if the problem you observed
>>>> is a general one or just specific to fortran testcase.
>>> So, unless the '-fsanitize=thread' issues are bogus -- unlikely ;-) -- it
>>> seems a latent issue generally, now fatal with
>>> 'libgomp.fortran/allocate-1.f90'.
>>
>> There is one known issue with libgomp and TSAN (-fsanitize=thread)
>> that I tend to forget about :-(
>>
>> That's according to Jakub, who wrote a while ago:
>>
>> "TSAN doesn't understand what libgomp is doing, unless built with 
>> --disable-linux-futex"

Uh.  Anything that can reasonably be done to address this?  At least, to
make this obvious to the user of '-fsanitize=thread'?

>> However, I now tried to disable futex and still get the following.
>> (First result for libgomp.c-c++-common/allocate-1.c).
>>
>> On the other hand, I have the feeling that the configure option is
>> a no-op for libgomp. This can also be seen in the configure.ac scripts:
>> only libstdc++ uses the result, whereas the others pass a no-op
>> call to 'true' (alias ':'):
>>
>> libgomp/configure.ac:GCC_LINUX_FUTEX(:)
>> libitm/configure.ac:GCC_LINUX_FUTEX(:)
>> libstdc++-v3/configure.ac:GCC_LINUX_FUTEX([AC_DEFINE(HAVE_LINUX_FUTEX, 1, [Define if futex syscall is available.])])
>>
>> (The check is not completely pointless as some checks are still done;
>> e.g. 'SYS_gettid and SYS_futex required'.)

Uh.  That (make '--disable-linux-futex' work) should be fixed, I suppose?

>> (TSAN did find issues in libgomp in the past, however. But those
>> have been fixed.)
>>
>>
>> Thus, there might or might not be an issue when TSAN reports one.
>>
>>  * * *
>>
>> Glancing at the Fortran testcase, I noted the following,
>> which probably does not cause the problems. But still,
>> I want to mention it:
>>
>>   !$omp parallel private (y, v) firstprivate (x) allocate (x, y, v)
>>   if (x /= 42) then
>> stop 1
>>   end if
>>
>>   v(1) = 7
>>   if ( (and(fl, 2) /= 0) .and.  &
>>((is_64bit_aligned(x) == 0) .or. &
>> (is_64bit_aligned(y) == 0) .or. &
>> (is_64bit_aligned(v(1)) == 0))) then
>>   stop 2
>>   end if
>>
>> If one compares this with the C/C++ testcase, I note that there
>> is a barrier before the alignment check in C/C++ but not in
>> Fortran. Additionally, 'v(1) = 7' is set twice and the
>> alignment check happens earlier than in C/C++. Not that that
>> should really matter, but I just saw it.
>>
>>
>> In C/C++:
>>   int v[x], w[x];
>> ...
>> v[0] = 7;
>> v[41] = 8;
>>
>> In Fortran:
>>   integer, dimension(x) :: v
>> ...
>>   v(1) = 7
>>   v(41) = 8
>>
>> where 'x == 42'. The Fortran version is not really wrong, but I think
>> the idea is to set the first and last array element - and that's here
>> v(42) and not v(41).
>>
>> BTW: Fortran permits specifying a different lower bound. When converting
>> C/C++ testcases, it can be useful to use the same lower bound in
>> Fortran as well: 'integer :: v(0:x-1)' (or: 'integer, dimension(0:x-1) :: v')
>> then uses 0 ... 41 for the indices instead of 1 ... 42.
>>
>> But one has to be careful as Fortran uses the upper bound and C uses the
>> number of elements. (Same with OpenMP array sections in Fortran vs. C.)

Abid, are you going to address these?  I think it does make sense if the
C/C++ and Fortran test cases match as much as feasible.
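
To make the bound mismatch discussed above concrete, here is a minimal C
sketch that maps a 0-based C index to the matching Fortran index for a given
lower bound.  The helper names 'fortran_extent' and 'fortran_index' are
hypothetical and introduced only for illustration; they are not part of any
test case.

```c
#include <stddef.h>

/* Number of elements of a Fortran array declared 'dimension(lb:ub)'.
   Fortran declares bounds, whereas C declares the element count.  */
static long
fortran_extent (long lb, long ub)
{
  return ub >= lb ? ub - lb + 1 : 0;
}

/* Fortran index that corresponds to the 0-based C index 'c_idx' when the
   Fortran array has lower bound 'lb'.  */
static long
fortran_index (long lb, long c_idx)
{
  return lb + c_idx;
}
```

With the default lower bound 1 and 'x == 42', the last C element 'v[41]'
corresponds to 'v(42)' in Fortran; with 'dimension(0:x-1)', the indices are
the same in both languages.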

>> PS: The promised data-race warning:
>> ==
>> WARNING: ThreadSanitizer: data race (pid=4135381)
>>   Read of size 8 at 0x7ffc0888bdc0 by thread T10:
>> #0 foo._omp_fn.2 libgomp.c-c++-common/allocate-1.c:47 (a.out+0x402c05)
>> #1 gomp_thread_start ../../../repos/gcc/libgomp/team.c:129 
>> (libgomp.so.1+0x1e5ed)
>>
>>   Previous write of size 8 at 0x7ffc0888bdc0 by main thread:
>> #0 foo._omp_fn.1 libgomp.c-c++-common/allocate-1.c:47 (a.out+0x402aee)
>> #1 GOMP_teams_reg ../../../repos/gcc/libgomp/teams.c:51 
>> (libgomp.so.1+0x3638c)
>> #2 main libgomp.c-c++-common/allocate-1.c:366 (a.out+0x40273e)
>>
>>   Location is stack of main thread.
>>
>>   Location is global '' at 0x ([stack]+0x1ddc0)
>>
>>   Thread T10 (tid=4135398, running) created by main thread at:

Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).

2022-02-04 Thread Thomas Schwinge
Hi Tobias!

On 2022-01-24T09:45:48+0100, Tobias Burnus  wrote:
> On 21.01.22 18:43, Tobias Burnus wrote:
>> On 21.01.22 18:15, Thomas Schwinge wrote:
>>> 11 | integer(c_int) function is_64bit_aligned (a) bind(C)
>>>  Warning: Variable ‘a’ at (1) is a dummy argument of the BIND(C)
>>> procedure ‘is_64bit_aligned’ but may not be C interoperable
>>> [-Wc-binding-type]
>>>
>>> Is that something to worry about?
> I have attached a patch (not committed), which silences the three kinds of
> warnings and fixes the interface issue.
> TODO: commit it.

Still "TODO: commit it" ;-) -- and while I haven't reviewed the changes
in detail, I did spot one item that should be addressed, I suppose:

> --- a/libgomp/testsuite/libgomp.fortran/allocate-1.c
> +++ b/libgomp/testsuite/libgomp.fortran/allocate-1.c
> @@ -1,7 +1,7 @@
>  #include <stdint.h>
>
>  int
> -is_64bit_aligned_ (uintptr_t a)
> +is_64bit_aligned (uintptr_t a)
>  {
>return ( (a & 0x3f) == 0);
>  }

> --- a/libgomp/testsuite/libgomp.fortran/allocate-1.f90
> +++ b/libgomp/testsuite/libgomp.fortran/allocate-1.f90
> @@ -5,30 +5,30 @@
>  module m
>use omp_lib
>use iso_c_binding
> -  implicit none
> +  implicit none (type, external)
>
>interface
>  integer(c_int) function is_64bit_aligned (a) bind(C)
>import :: c_int
> -  integer  :: a
> +  type(*)  :: a
>  end
>end interface
> -end module m
>
> -subroutine foo (x, p, q, px, h, fl)
> +contains
> +
> +subroutine foo (x, p, q, h, fl)
>use omp_lib
>use iso_c_binding
>integer  :: x
>integer, dimension(4) :: p
>integer, dimension(4) :: q
> -  integer  :: px
>integer (kind=omp_allocator_handle_kind) :: h
>integer  :: fl
>
>integer  :: y
>integer  :: r, i, i1, i2, i3, i4, i5
>integer  :: l, l3, l4, l5, l6
> -  integer  :: n, n1, n2, n3, n4
> +  integer  :: n, n2, n3, n4
>integer  :: j2, j3, j4
>integer, dimension(4) :: l2
>integer, dimension(4) :: r2
> @@ -118,6 +118,7 @@ subroutine foo (x, p, q, px, h, fl)
>end if
>!$omp end parallel
>!$omp end teams
> +stop
>
>!$omp parallel do private (y) firstprivate (x)  reduction(+: r) allocate 
> (h: x, y, r, l, n) lastprivate (l)  linear (n: 16)
>do i = 0, 63

That early 'stop' should probably be backed out?  ;-)


Grüße
 Thomas


> @@ -153,77 +154,77 @@ subroutine foo (x, p, q, px, h, fl)
> ((is_64bit_aligned(l2(1)) == 0) .or. &
>  (is_64bit_aligned(l3) == 0) .or. &
>  (is_64bit_aligned(i1) == 0))) then
> - stop 10
> +stop 10
>end if
>  end do
>
>  !$omp do collapse(2) lastprivate(l4, i2, j2) linear (n2:17) allocate (h: 
> n2, l4, i2, j2)
>  do i2 = 3, 4
>do j2 = 17, 22, 2
> - n2 = n2 + 17
> - l4 = i2 * 31 + j2
> - if ( (and(fl, 1) /= 0) .and.  &
> -   ((is_64bit_aligned(l4) == 0) .or. &
> -   (is_64bit_aligned(n2) == 0) .or. &
> -   (is_64bit_aligned(i2) == 0) .or. &
> -   (is_64bit_aligned(j2) == 0))) then
> -   stop 11
> - end if
> +n2 = n2 + 17
> +l4 = i2 * 31 + j2
> +if ( (and(fl, 1) /= 0) .and.  &
> + ((is_64bit_aligned(l4) == 0) .or. &
> +  (is_64bit_aligned(n2) == 0) .or. &
> +  (is_64bit_aligned(i2) == 0) .or. &
> +  (is_64bit_aligned(j2) == 0))) then
> +  stop 11
> +end if
>end do
>  end do
>
>  !$omp do collapse(2) lastprivate(l5, i3, j3) linear (n3:17) schedule 
> (static, 3) allocate (n3, l5, i3, j3)
>  do i3 = 3, 4
>do j3 = 17, 22, 2
> -   n3 = n3 + 17
> -   l5 = i3 * 31 + j3
> -   if ( (and(fl, 2) /= 0) .and.  &
> -   ((is_64bit_aligned(l5) == 0) .or. &
> -   (is_64bit_aligned(n3) == 0) .or. &
> -   (is_64bit_aligned(i3) == 0) .or. &
> -   (is_64bit_aligned(j3) == 0))) then
> -   stop 12
> - end if
> +  n3 = n3 + 17
> +  l5 = i3 * 31 + j3
> +  if ( (and(fl, 2) /= 0) .and.  &
> + ((is_64bit_aligned(l5) == 0) .or. &
> +  (is_64bit_aligned(n3) == 0) .or. &
> +  (is_64bit_aligned(i3) == 0) .or. &
> +  (is_64bit_aligned(j3) == 0))) then
> +  stop 12
> +end if
>end do
>  end do
>
>  !$omp do collapse(2) lastprivate(l6, i4, j4) linear (n4:17) schedule 
> (dynamic) allocate (h: n4, l6, i4, j4)
>  do i4 = 3, 4
>do j4 = 17,

Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).

2022-01-21 Thread Thomas Schwinge
Hi Abid!

On 2022-01-11T22:31:54+, Hafiz Abid Qadeer  wrote:
> From d1fb55bff497a20e6feefa50bd03890e7a903c0e Mon Sep 17 00:00:00 2001
> From: Hafiz Abid Qadeer 
> Date: Fri, 24 Sep 2021 10:04:12 +0100
> Subject: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).
>
> This patch adds support for the OpenMP 5.0 allocate clause for Fortran. It
> does not yet support the allocator-modifier as specified in OpenMP 5.1. The
> allocate clause is already supported in C/C++.

> libgomp/ChangeLog:
>
>   * testsuite/libgomp.fortran/allocate-1.c: New test.
>   * testsuite/libgomp.fortran/allocate-1.f90: New test.

I'm seeing this test case randomly/non-deterministically FAIL to execute,
differently on different systems and runs, for example:

libgomp:
libgomp:
libgomp: Out of memory allocating 4 bytesOut of memory allocating 4 bytes
libgomp:
libgomp:
libgomp: Out of memory allocating 168 bytes

libgomp: Out of memory allocating 4 bytes

libgomp: Out of memory allocating 4 bytes

libgomp: Out of memory allocating 4 bytes

I'd assume there's some concurrency issue: the problem disappears if I
manually specify a lowerish 'OMP_NUM_THREADS', and conversely, on a
system where I don't normally see the FAILs, I can trigger them with a
largish 'OMP_NUM_THREADS', such as 'OMP_NUM_THREADS=18' and higher.

For example:

Thread 10 "a.out" hit Breakpoint 1, omp_aligned_alloc (alignment=4, size=4, 
allocator=6326576) at [...]/source-gcc/libgomp/allocator.c:318
318   if (allocator_data)
(gdb) print *allocator_data
$1 = {memspace = omp_default_mem_space, alignment = 64, pool_size = 8192, 
used_pool_size = 8188, fb_data = omp_null_allocator, sync_hint = 3, access = 7, 
fallback = 12, pinned = 0, partition = 15}

Given the high 'used_pool_size', is that to be expected, and the test
case shouldn't be requesting "so much" memory?  Or might the problem
actually be in 'libgomp/allocator.c' (not touched by your commit)?
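
As a rough way to reason about the numbers in the dump above, here is a
simplified C model of a pool-limited allocator.  It is only a sketch under
stated assumptions: it assumes that a request is rejected once the accounted
usage would exceed 'pool_size', and the per-allocation 'overhead' is a
hypothetical parameter; this is not libgomp's actual bookkeeping.

```c
#include <stddef.h>

/* Hypothetical model: return 1 if a request of 'size' bytes plus 'overhead'
   bytes of bookkeeping/padding still fits into the pool, 0 otherwise.  */
static int
pool_request_fits (size_t pool_size, size_t used_pool_size,
                   size_t size, size_t overhead)
{
  /* Guard against wrap-around before adding the two amounts.  */
  if (size > pool_size || overhead > pool_size - size)
    return 0;
  /* A pool that is already over-committed cannot satisfy anything.  */
  if (used_pool_size > pool_size)
    return 0;
  return size + overhead <= pool_size - used_pool_size;
}
```

Under this model, with 'pool_size = 8192' and 'used_pool_size = 8188', a
4-byte request fits only if there is no overhead at all; any non-zero
overhead makes it fail, which would be consistent with the failures
appearing only at higher thread counts.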

All but Thread 10 are in 'gomp_team_barrier_wait_end' -- should memory
have been released at that point?

(gdb) thread apply 10 bt

Thread 10 (Thread 0x732e2700 (LWP 1601318)):
#0  omp_aligned_alloc (alignment=4, size=4, allocator=6326576) at 
[...]/source-gcc/libgomp/allocator.c:320
#1  0x7790b4db in GOMP_alloc (alignment=4, size=4, 
allocator=6326576) at [...]/source-gcc/libgomp/allocator.c:364
#2  0x00401f3f in foo_._omp_fn.3 () at 
source-gcc/libgomp/testsuite/libgomp.fortran/allocate-1.f90:136
#3  0x778f31e6 in gomp_thread_start (xdata=) at 
[...]/source-gcc/libgomp/team.c:129
#4  0x7789e609 in start_thread (arg=) at 
pthread_create.c:477
#5  0x777c5293 in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) thread apply 1 bt

Thread 1 (Thread 0x772ec1c0 (LWP 1601309)):
#0  futex_wait (val=96, addr=) at 
[...]/source-gcc/libgomp/config/linux/x86/futex.h:97
#1  do_wait (val=96, addr=) at 
[...]/source-gcc/libgomp/config/linux/wait.h:67
#2  gomp_team_barrier_wait_end (bar=, state=96) at 
[...]/source-gcc/libgomp/config/linux/bar.c:112
#3  0x00401f53 in foo_._omp_fn.3 () at 
source-gcc/libgomp/testsuite/libgomp.fortran/allocate-1.f90:136
#4  0x778ea4f2 in GOMP_parallel (fn=0x401e6b , 
data=0x7fffd450, num_threads=18, flags=0) at 
[...]/source-gcc/libgomp/parallel.c:178
#5  0x004012ab in foo (x=42, p=..., q=..., px=2, h=6326576, fl=0) 
at source-gcc/libgomp/testsuite/libgomp.fortran/allocate-1.f90:122
#6  0x004018e9 in MAIN__ () at 
source-gcc/libgomp/testsuite/libgomp.fortran/allocate-1.f90:326

Manually compiling the test case, I see a lot of '-Wtabs' diagnostics
(can be ignored, I suppose), but also:

source-gcc/libgomp/testsuite/libgomp.fortran/allocate-1.f90:11:47:

   11 | integer(c_int) function is_64bit_aligned (a) bind(C)
  |   1
Warning: Variable ‘a’ at (1) is a dummy argument of the BIND(C) procedure 
‘is_64bit_aligned’ but may not be C interoperable [-Wc-binding-type]

Is that something to worry about?
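
For context on what the C side actually tests, here is a small C sketch.
The mask '0x3f' covers the low six bits, so the check is for a 64-byte
boundary (not 64 bits, despite the name).  Because gfortran passes a dummy
argument without the VALUE attribute by reference, the C function receives
the variable's address in 'a'.  The names 'is_64byte_aligned' and
'check_by_reference' are hypothetical, chosen for illustration only.

```c
#include <stdint.h>

/* Same mask test as in the quoted allocate-1.c: the low six bits of the
   address must be zero, i.e. the address is a multiple of 64 bytes.  */
static int
is_64byte_aligned (uintptr_t a)
{
  return (a & 0x3f) == 0;
}

/* Hypothetical helper modelling a by-reference call: the callee sees the
   address of the argument, and that address is what gets tested.  */
static int
check_by_reference (const void *arg)
{
  return is_64byte_aligned ((uintptr_t) arg);
}
```

If this reading is right, the warning concerns the declared type of the
dummy argument rather than the value that is tested, since only the address
matters here; that interpretation is an assumption, not something the
warning text itself states.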

And:

source-gcc/libgomp/testsuite/libgomp.fortran/allocate-1.f90:31:19:

   31 |   integer  :: n, n1, n2, n3, n4
  |   1
Warning: Unused variable ‘n1’ declared at (1) [-Wunused-variable]
source-gcc/libgomp/testsuite/libgomp.fortran/allocate-1.f90:18:27:

   18 | subroutine foo (x, p, q, px, h, fl)
  |   1
Warning: Unused dummy argument ‘px’ at (1) [-Wunused-dummy-argument]

For reference, quoting below the new Fortran test case.


Grüße
 Thomas


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/allocate-1.c
> @@ -0,0 +1,7 @@
> +#include <stdint.h>
> +
> +int
> +is_64bit_aligned_ (uintptr_t a)
> +{
> +  return ( (a & 0x3f) == 0);
> +}

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/allocate-1.f90
> @@ -0,0 +1,333 @@
> 

Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).

2022-01-14 Thread Thomas Schwinge
Hi Abid!

(Remember to CC the Fortran mailing list for 'gcc/fortran/' etc. changes.)


On 2022-01-11T22:31:54+, Hafiz Abid Qadeer  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-2.f90
> @@ -0,0 +1,45 @@
> +! { dg-do compile }
> +
> +module omp_lib_kinds
> +  use iso_c_binding, only: c_int, c_intptr_t
> +  implicit none
> +  private :: c_int, c_intptr_t
> +  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
> +
> +end module
> +
> +subroutine foo(x)
> +  use omp_lib_kinds
> +  implicit none
> +  integer  :: x
> +
> +  !$omp task allocate (x) ! { dg-error "'x' specified in 'allocate' clause at .1. but not in an explicit privatization clause" }
> +  x=1
> +  !$omp end task
> +
> +  !$omp parallel allocate (x) ! { dg-error "'x' specified in 'allocate' clause at .1. but not in an explicit privatization clause" }
> +  x=2
> +  !$omp end parallel
> +
> +  !$omp parallel allocate (x) shared (x) ! { dg-error "'x' specified in 'allocate' clause at .1. but not in an explicit privatization clause" }
> +  x=3
> +  !$omp end parallel
> +
> +  !$omp parallel private (x) allocate (x) allocate (x) ! { dg-warning "'x' appears more than once in 'allocate' clauses at .1." }
> +  x=4
> +  !$omp end parallel
> +
> +  !$omp parallel private (x) allocate (x, x) ! { dg-warning "'x' appears more than once in 'allocate' clauses at .1." }
> +  x=5
> +  !$omp end parallel
> +
> +  !$omp parallel allocate (0: x) private(x) ! { dg-error "Expected integer expression of the 'omp_allocator_handle_kind' kind at .1." }

We do for x86_64 default '-m64', but for '-m32' and '-mx32' compilation,
we're not seeing this latter diagnostic:

PASS: gfortran.dg/gomp/allocate-1.f90   -O  (test for excess errors)
PASS: gfortran.dg/gomp/allocate-2.f90   -O   (test for errors, line 16)
PASS: gfortran.dg/gomp/allocate-2.f90   -O   (test for errors, line 20)
PASS: gfortran.dg/gomp/allocate-2.f90   -O   (test for errors, line 24)
FAIL: gfortran.dg/gomp/allocate-2.f90   -O   (test for errors, line 36)
PASS: gfortran.dg/gomp/allocate-2.f90   -O   (test for errors, line 40)
PASS: gfortran.dg/gomp/allocate-2.f90   -O   (test for warnings, line 28)
PASS: gfortran.dg/gomp/allocate-2.f90   -O   (test for warnings, line 32)
PASS: gfortran.dg/gomp/allocate-2.f90   -O  (test for excess errors)

I suppose the reason is unintended congruence of data types?  Would it
work to make 'x' a floating-point data type, for example -- or is this
meant to explicitly check certain integer data type characteristics?
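
If the supposition above holds, the effect can be sketched in C terms.  The
test defines 'omp_allocator_handle_kind' as 'c_intptr_t', and the literal
'0' has the default integer kind, which is 4 bytes in gfortran unless
changed by options.  The sketch below uses 'int32_t' as a stand-in for the
default Fortran integer; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: the two kinds are indistinguishable by size when the
   handle type is as wide as the default integer.  */
static int
handle_kind_matches_default_integer (size_t handle_bytes,
                                     size_t default_integer_bytes)
{
  return handle_bytes == default_integer_bytes;
}

/* On the current target: 1 if a kind mismatch, and thus the diagnostic,
   would be expected for a default-integer literal, 0 if the kinds
   coincide.  */
static int
diagnostic_expected_on_this_target (void)
{
  return !handle_kind_matches_default_integer (sizeof (intptr_t),
                                               sizeof (int32_t));
}
```

With a 64-bit 'intptr_t' the sizes differ (8 vs. 4), so a mismatch can be
diagnosed; with a 32-bit 'intptr_t', as for '-m32' and '-mx32', they
coincide and nothing would distinguish the literal from a valid handle.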


Grüße
 Thomas


> +  x=6
> +  !$omp end parallel
> +
> +  !$omp parallel private (x) allocate (0.1 : x) ! { dg-error "Expected integer expression of the 'omp_allocator_handle_kind' kind at .1." }
> +  x=7
> +  !$omp end parallel
> +
> +end subroutine


Re: Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for OpenACC test cases

2022-01-13 Thread Thomas Schwinge
Hi Martin!

On 2022-01-13T09:06:16-0700, Martin Sebor  wrote:
> On 1/13/22 03:55, Thomas Schwinge wrote:
>> This has fallen out of (unfinished...) work earlier in the year: pushed
>> to master branch commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
>> "Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
>> for OpenACC test cases".
>
> Thanks for the heads up.  If any of these are recent regressions
> (either the false negatives or the false positives) it would be
> helpful to isolate them to a few representative test cases.
> The warning itself hasn't changed much in GCC 12 but regressions
> in it could be due to the jump threading changes that it tends to
> be sensitive to.

Ah, sorry for the ambiguity -- I don't think any of these are recent
regressions.


Grüße
 Thomas


Document current '-Wuninitialized' diagnostics for 'libgomp.oacc-fortran/routine-10.f90' [PR102192]

2022-01-13 Thread Thomas Schwinge
Hi!

On 2022-01-13T11:55:03+0100, I wrote:
> This has fallen out of (unfinished...) work earlier in the year: pushed
> to master branch commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
> "Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
> for OpenACC test cases".

..., and commit 2edbcaed95b8d8cbb05a6af486179db0da6e3245
"Document current '-Wuninitialized' diagnostics for
'libgomp.oacc-fortran/routine-10.f90' [PR102192]".


Grüße
 Thomas


From 2edbcaed95b8d8cbb05a6af486179db0da6e3245 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 26 Aug 2021 16:55:21 +0200
Subject: [PATCH] Document current '-Wuninitialized' diagnostics for
 'libgomp.oacc-fortran/routine-10.f90' [PR102192]

	libgomp/
	PR tree-optimization/102192
	* testsuite/libgomp.oacc-fortran/routine-10.f90: Document current
	'-Wuninitialized' diagnostics.
---
 .../testsuite/libgomp.oacc-fortran/routine-10.f90  | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-10.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-10.f90
index 90cca7c1024..9290e90f970 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/routine-10.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-10.f90
@@ -1,5 +1,7 @@
 ! { dg-do run }
-!
+
+! { dg-additional-options -Wuninitialized }
+
 module m
   implicit none
 contains
@@ -26,6 +28,13 @@ contains
 
 call add_ps_routine(a, b, c)
   end function add_ef
+  ! This '-Wmaybe-uninitialized' diagnostic appears for '-O2' only; PR102192.
+  ! { dg-xfail-if PR102192 { *-*-* } { -O2 } }
+  ! There's another instance (again '-O2' only) further down, but as any number
+  ! of 'dg-xfail-if' only apply to the first 'dg-bogus' etc., we have no way to
+  ! XFAIL that other one, so we instead match all of them here (via line '0'):
+  ! { dg-bogus {'c' may be used uninitialized} {} { target *-*-* } 0 }
+  ! { TODO_dg-bogus {'c' may be used uninitialized} {} { target *-*-* } .-7 }
 end module m
 
 program main
@@ -44,6 +53,9 @@ program main
   do i = 1, n
  if (i .eq. 4) then
 c_a = add_ef(a_a, b_a)
+! See above.
+! { TODO_dg-xfail-if PR102192 { *-*-* } { -O2 } }
+! { TODO_dg-bogus {'c' may be used uninitialized} {} { target *-*-* } .-3 }
  end if
   end do
   !$acc end parallel
-- 
2.34.1



Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for OpenACC test cases

2022-01-13 Thread Thomas Schwinge
Hi!

This has fallen out of (unfinished...) work earlier in the year: pushed
to master branch commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
"Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
for OpenACC test cases".


Grüße
 Thomas


From 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 26 Aug 2021 16:55:21 +0200
Subject: [PATCH] Document current '-Wuninitialized'/'-Wmaybe-uninitialized'
 diagnostics for OpenACC test cases

... including "note: '[...]' was declared here" emitted since recent
commit 9695e1c23be5b5c55d572ced152897313ddb96ae
"Improve -Wuninitialized note location".

For those that seemed incorrect to me, I've placed XFAILed 'dg-bogus'es,
including one more instance of PR77504 etc., and several instances where
for "local variables" of reference-data-type reductions (etc.?) we emit
bogus (?) diagnostics.

For implicit data clauses (including 'firstprivate'), we seem to be missing
diagnostics, so I've placed XFAILed 'dg-warning's.

	gcc/testsuite/
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Document
	current '-Wuninitialized' diagnostics.
	* c-c++-common/goacc/mdc-1.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-kernels.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-parallel.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-routine.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-parallel.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-routine.c: Likewise.
	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
	* c-c++-common/goacc/uninit-firstprivate-clause.c: Likewise.
	* c-c++-common/goacc/uninit-if-clause.c: Likewise.
	* gfortran.dg/goacc/array-with-dt-1.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-2.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-1.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-3.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-4.f90: Likewise.
	* gfortran.dg/goacc/derived-classtypes-1.f95: Likewise.
	* gfortran.dg/goacc/derived-types-2.f90: Likewise.
	* gfortran.dg/goacc/host_data-tree.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	* gfortran.dg/goacc/modules.f95: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-parallel.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-parallel.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/pr93464.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
	Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
	* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-firstprivate-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-if-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-use-device-clause.f95: Likewise.
	* gfortran.dg/goacc/wait.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Document
	current '-Wuninitialized' diagnostics.
	* testsuite/libgomp.oacc-fortran/data-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gemm-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gemm.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/optional-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr70643.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr96628-part1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reference-reductions.f90:
	Likewise.
---
 .../goacc/builtin-goacc-parlevel-id-size.c|  8 +
 gcc/testsuite/c-c++-common/goacc/mdc-1.c  |  4 +++
 .../goacc/nested-reductions-1-kernels.c   | 11 ++
 .../goacc/nested-reductions-1-parallel.c  | 14 
 .../goacc/nested-reductions-1-routine.c   |  4 +++
 .../goacc/nested-reductions-2-kernels.c   | 11 ++
 .../goacc/nested-reductions-2-parallel.c  |

OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]

2022-01-13 Thread Thomas Schwinge
Hi!

On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>  - The "addressable" bit is set during the kernels conversion pass for
>variables that have "create" (alloc) clauses created for them in the
>synthesised outer data region (instead of in the front-end, etc.,
>where it can't be done accurately). Such variables actually have
>their address taken during transformations made in a later pass
>(omp-low, I think), but there's a phase-ordering problem that means
>the flag should be set earlier.

The actual issue is a bit different, but yes, there is a problem.
The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
"ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
didn't run into that with the OpenACC 'kernels' decomposition
originally.)  I've pushed to master branch
commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in synthesized data
clauses as addressable [PR100280]", see attached.


Grüße
 Thomas


From 9b32c1669aad5459dd053424f9967011348add83 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 16 Dec 2021 22:02:37 +0100
Subject: [PATCH] OpenACC 'kernels' decomposition: Mark variables used in
 synthesized data clauses as addressable [PR100280]

... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:

13073			else if (is_gimple_reg (var))
13074			  {
13075			gcc_assert (offloaded);
13076			tree avar = create_tmp_var (TREE_TYPE (var));
13077			mark_addressable (avar);

..., which (a) is only implemented for actually *offloaded* regions (but not
data regions), and (b) the subsequently synthesized code for writing to and
later reading back from the temporary fundamentally conflicts with OpenACC
'async' (as used by OpenACC 'kernels' decomposition).  That's all not trivial
to make work, so let's just avoid this case.

	gcc/
	PR middle-end/100280
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Mark variables used in synthesized data clauses as addressable.
	gcc/testsuite/
	PR middle-end/100280
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: New.
	* c-c++-common/goacc/classify-kernels-parloops.c: Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
	Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Test
	'--param openacc-kernels=decompose'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Update.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Remove.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/classify-kernels-parloops.f95: New.
	* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
	Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test
	'--param openacc-kernels=decompose'.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	libgomp/
	PR middle-end/100280
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Update.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.

Suggested-by: Julian Brown 
---
 gcc/omp-oacc-kernels-decompose.cc |   6 +-
 .../goacc/classify-kernels-parloops.c |  41 +++
 ...classify-kernels-unparallelized-parloops.c |  45 +++
 .../goacc/classify-kernels-unparallelized.c   |   5 +-
 .../c-c++-common/goacc/classify-kernels.c |   5 +-
 .../c-c++-common/goacc/kernels-decompose-2.c  |  16 ++-
 .../goacc/kernels-decompose-ice-1.c   | 114 --
 .../goacc/kernels-decompose-ice-2.c   |  22 
 .../goacc/kernels-decompose-pr100280-1.c  |  19 +++
 .../goacc/classify-kernels-parloops.f95   |  43 +++
 ...assify-kernels-unparallelized-parloops.f95 |  47 
 .../goacc/classify-kernels-unparallelized.f95 |   5 +-
 .../gfortran.dg/goacc/classify-kernels.f95|   5 +-
 .../declare-vla-kernels-decompose-ice-1.c |   2 +-
 .../libgomp.oacc-c-c++-common/f-asyncwait-1.c |  53 
 .../kernels-decompose-1.c |   6 +-
 16 files changed, 264 insertions(+), 170 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100280-1.c
 create mode 100644 gcc/testsui

Enhance OpenACC 'kernels' decomposition testing (was: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs)

2022-01-13 Thread Thomas Schwinge
Hi!

On 2020-11-13T23:22:30+0100, I wrote:
> I've pushed to master branch [...] commit
> e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
> constructs into parts, a sequence of compute constructs", see attached.
>
> On 2019-02-01T00:59:30+0100, I wrote:
>> There's more work to be done there, and we're aware of a number of TODO
>> items, but nevertheless: it's a good first step.
>
> That's still the case...  :-)

... and still is, but we're getting closer.

In preparation for a forthcoming ICE fix, I've pushed to master branch
commit 862e5f398b7e0a62460e8bc3fe4045e9da6cbf3b
"Enhance OpenACC 'kernels' decomposition testing", see attached.


Grüße
 Thomas


From 862e5f398b7e0a62460e8bc3fe4045e9da6cbf3b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 20 Dec 2021 16:14:46 +0100
Subject: [PATCH] Enhance OpenACC 'kernels' decomposition testing

	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-1.c: Enhance.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
---
 .../c-c++-common/goacc/kernels-decompose-1.c  |  29 ++--
 .../c-c++-common/goacc/kernels-decompose-2.c  |  82 +++
 .../goacc/kernels-decompose-ice-1.c   |   7 +-
 .../goacc/kernels-decompose-ice-2.c   |   6 +
 .../gfortran.dg/goacc/kernels-decompose-1.f95 |  29 ++--
 .../gfortran.dg/goacc/kernels-decompose-2.f95 |  68 ++---
 .../declare-vla-kernels-decompose-ice-1.c |  14 ++
 .../declare-vla-kernels-decompose.c   |  23 
 .../libgomp.oacc-c-c++-common/declare-vla.c   |  16 +++
 .../libgomp.oacc-c-c++-common/f-asyncwait-1.c | 129 +-
 .../libgomp.oacc-c-c++-common/f-asyncwait-2.c |  70 --
 .../libgomp.oacc-c-c++-common/f-asyncwait-3.c |  59 ++--
 .../kernels-decompose-1.c |  14 +-
 .../libgomp.oacc-fortran/asyncwait-1.f90  |  86 ++--
 .../libgomp.oacc-fortran/asyncwait-2.f90  |  47 ++-
 .../libgomp.oacc-fortran/asyncwait-3.f90  |  47 ++-
 .../libgomp.oacc-fortran/pr94358-1.f90|  20 ++-
 17 files changed, 593 insertions(+), 153 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
index f549cbadfa7..e58bc179f30 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -1,10 +1,16 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
+
 /* { dg-additional-options "-fdump-tree-gimple" } */
+
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
{ dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
 
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
aspects of that functionality.  */
 
@@ -14,7 +20,7 @@
passed to 'incr' may be unset, and in that case, it will be set to [...]",
so to maintain compatibility with earlier Tcl releases, we manually
initialize counter variables:
-   { dg-line l_dummy[variable c_loop_i 0] }
+   { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
{ dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
"WARNING: dg-line var l_dummy defined, but not used".  */
 
@@ -28,36

Adjust 'gfortran.dg/goacc/privatization-1-*' [PR103576, PR103697] (was: [Patch] Fortran: Handle compare in OpenMP atomic)

2021-12-13 Thread Thomas Schwinge
Hi Tobias!

On 2021-12-13T12:19:50+0100, Tobias Burnus  wrote:
> Implement the 'compare' part in trans-openmp of OpenMP 5.1's atomic changes
> plus a couple of bugfixes throughout.

Hmm, I wonder why you didn't see the few regressions in your testing?
Pushed to master branch commit 228d64af4e244faabab5c47506920a1bde85d74e
"Adjust 'gfortran.dg/goacc/privatization-1-*' [PR103576, PR103697]", see
attached.


Grüße
 Thomas


From 228d64af4e244faabab5c47506920a1bde85d74e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 14 Dec 2021 07:03:52 +0100
Subject: [PATCH] Adjust 'gfortran.dg/goacc/privatization-1-*' [PR103576,
 PR103697]

... for the recent commit 494ebfa7c9aacaeb6ec1fccc47a0e49f31eb2bb8
"Fortran: Handle compare in OpenMP atomic", which changes the GIMPLE IR
such that a temporary is no longer used; 'original' dump:

 x = *a;
-{
-  integer(kind=4) D.4237;
-
-  D.4237 = *a;
   #pragma omp atomic relaxed
- = D.4237;
-}
+   = *a;
   }

(I'm not familiar enough with this to comment on whether that's correct; but it
appears that the difference disappears again in later compiler passes.)

These OpenACC test cases verify behavior re OpenACC privatization levels, and
have to be adjusted accordingly.

	gcc/testsuite/
	PR fortran/103576
	PR testsuite/103697
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
	* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
	Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
---
 gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90 | 1 -
 gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90  | 1 -
 .../gfortran.dg/goacc/privatization-1-routine_gang-loop.f90  | 1 -
 gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang.f90 | 1 -
 4 files changed, 4 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90 b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90
index bcd7159ae5b..47ba5baf439 100644
--- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90
@@ -50,7 +50,6 @@ contains
 ! { dg-note {variable 'x' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
 ! { dg-note {variable 'y' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
 ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_loop$c_loop }
-! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
 ! { dg-note {variable 'y' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop }
 !$acc end parallel
   end subroutine f
diff --git a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90 b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90
index 31f998dfc92..4813e44a233 100644
--- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90
@@ -43,6 +43,5 @@ contains
 ! { dg-note {variable 'j' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO3" { xfail *-*-* } l_compute$c_compute }
 ! { dg-note {variable 'a' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO4" { xfail *-*-* } l_compute$c_compute }
 ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
-! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
   end subroutine f
 end module m
diff --git a/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang-loop.f90 b/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang-loop.f90
index db6d8226ed0..36f2a886e47 100644
--- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang-loop.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang-loop.f90
@@ -50,7 +50,6 @@ contains
 ! { dg-note {variable 'x' in 'private' clause isn't candidate for ad

Re: [gomp4] Make OpenACC orphan gang reductions errors

2021-11-30 Thread Thomas Schwinge
Hi!

On 2017-05-01T18:27:59-0700, Cesar Philippidis  wrote:
>   gcc/c/
>   * c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC
>   gang reductions.
>
>   gcc/cp/
>   * semantics.c (finish_omp_clauses): Emit an error on orphan OpenACC
>   gang reductions.
>
>   gcc/fortran/
>   * openmp.c (resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
>   gang reductions.

As a follow-up, I've pushed to master branch
commit 77d24d43644909852998043335b5a0e09d1e8f02
'Consolidate OpenACC "gang reduction on an orphan loop" checking',
see attached.


Grüße
 Thomas


From 77d24d43644909852998043335b5a0e09d1e8f02 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 26 Nov 2021 12:29:26 +0100
Subject: [PATCH] Consolidate OpenACC "gang reduction on an orphan loop"
 checking

No need to implement separately in all front ends what we may implement in the
middle end, once for all.

Follow-up to preceding commit 2b7dac2c0dcb087da9e4018943c023c0678234a3
"Make OpenACC orphan gang reductions errors".

	gcc/
	* omp-offload.c (oacc_loop_process): Implement "gang reduction on
	an orphan loop" checking.
	gcc/c/
	* c-typeck.c (c_finish_omp_clauses): Remove "gang reduction on an
	orphan loop" checking.
	gcc/cp/
	* semantics.c (finish_omp_clauses): Remove "gang reduction on an
	orphan loop" checking.
	gcc/fortran/
	* openmp.c (resolve_oacc_loop_blocks): Remove "gang reduction on
	an orphan loop" checking.
	(oacc_is_parallel, oacc_is_kernels, oacc_is_serial)
	(oacc_is_compute_construct): Remove.
	gcc/testsuite/
	* gfortran.dg/goacc/orphan-reductions-1.f90: Adjust.
---
 gcc/c/c-typeck.c  |  8 
 gcc/cp/semantics.c|  8 
 gcc/fortran/openmp.c  | 37 ---
 gcc/omp-offload.c | 20 --
 .../gfortran.dg/goacc/orphan-reductions-1.f90 |  8 ++--
 5 files changed, 20 insertions(+), 61 deletions(-)

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index a025740e618..7524304f2bd 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -14135,14 +14135,6 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  goto check_dup_generic;
 
 	case OMP_CLAUSE_REDUCTION:
-	  if (ort == C_ORT_ACC && oacc_get_fn_attrib (current_function_decl)
-	  && omp_find_clause (clauses, OMP_CLAUSE_GANG))
-	{
-	  error_at (OMP_CLAUSE_LOCATION (c),
-			"gang reduction on an orphan loop");
-	  remove = true;
-	  break;
-	}
 	  if (reduction_seen == 0)
 	reduction_seen = OMP_CLAUSE_REDUCTION_INSCAN (c) ? -1 : 1;
 	  else if (reduction_seen != -2
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index c84caf43251..cd1956497f8 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6667,14 +6667,6 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP);
 	  goto check_dup_generic;
 	case OMP_CLAUSE_REDUCTION:
-	  if (ort == C_ORT_ACC && oacc_get_fn_attrib (current_function_decl)
-	  && omp_find_clause (clauses, OMP_CLAUSE_GANG))
-	{
-	  error_at (OMP_CLAUSE_LOCATION (c),
-			"gang reduction on an orphan loop");
-	  remove = true;
-	  break;
-	}
 	  if (reduction_seen == 0)
 	reduction_seen = OMP_CLAUSE_REDUCTION_INSCAN (c) ? -1 : 1;
 	  else if (reduction_seen != -2
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 7950c7fb43d..d120be81467 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -8322,31 +8322,6 @@ resolve_omp_do (gfc_code *code)
 }
 }
 
-static bool
-oacc_is_parallel (gfc_code *code)
-{
-  return code->op == EXEC_OACC_PARALLEL || code->op == EXEC_OACC_PARALLEL_LOOP;
-}
-
-static bool
-oacc_is_kernels (gfc_code *code)
-{
-  return code->op == EXEC_OACC_KERNELS || code->op == EXEC_OACC_KERNELS_LOOP;
-}
-
-static bool
-oacc_is_serial (gfc_code *code)
-{
-  return code->op == EXEC_OACC_SERIAL || code->op == EXEC_OACC_SERIAL_LOOP;
-}
-
-static bool
-oacc_is_compute_construct (gfc_code *code)
-{
-  return (oacc_is_parallel (code)
-	  || oacc_is_kernels (code)
-	  || oacc_is_serial (code));
-}
 
 static gfc_statement
 omp_code_to_statement (gfc_code *code)
@@ -8650,18 +8625,6 @@ resolve_oacc_loop_blocks (gfc_code *code)
   if (!oacc_is_loop (code))
 return;
 
-  if (code->op == EXEC_OACC_LOOP
-  && code->ext.omp_clauses->lists[OMP_LIST_REDUCTION]
-  && code->ext.omp_clauses->gang)
-{
-  fortran_omp_context *c;
-  

Re: [PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message

2021-11-30 Thread Thomas Schwinge
Hi!

On 2020-07-20T12:26:48+0200, Frederik Harwath  wrote:
> Thomas Schwinge  writes:
>>> Can I include the patch in OG10?

> This has been delayed a bit by my vacation, but I have now committed
> the patch.

>> (Ideally, we'd also test 'serial' construct in addition to 'kernels',
>> 'parallel'

> I have included the test cases for the "serial construct".

I've adapted the remaining relevant changes and pushed to master branch
commit c4f4c60457d1657cbd72015de3d818eb6462a0e9
'Re OpenACC "gang reduction on an orphan loop" error message', see
attached.


Grüße
 Thomas


From c4f4c60457d1657cbd72015de3d818eb6462a0e9 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jul 2020 11:24:21 +0200
Subject: [PATCH] Re OpenACC "gang reduction on an orphan loop" error message

Follow-up to preceding commit 2b7dac2c0dcb087da9e4018943c023c0678234a3
"Make OpenACC orphan gang reductions errors".

	gcc/fortran/
	* openmp.c (oacc_is_parallel_or_serial): Evolve into...
	(oacc_is_compute_construct): ... this function.
	(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct"
	instead of "oacc_is_parallel_or_serial" for checking that a
	loop is not orphaned.
	gcc/testsuite/
	* gfortran.dg/goacc/orphan-reductions-3.f90: New test
	verifying that the "gang reduction on an orphan loop" error message
	is not emitted for non-orphaned loops.
	* c-c++-common/goacc/orphan-reductions-3.c: Likewise for C and C++.

Co-Authored-By: Thomas Schwinge 
---
 gcc/fortran/openmp.c  |   9 +-
 .../c-c++-common/goacc/orphan-reductions-3.c  | 102 ++
 .../gfortran.dg/goacc/orphan-reductions-3.f90 |  89 +++
 3 files changed, 196 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-3.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-3.f90

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index b4100577e51..7950c7fb43d 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -8341,9 +8341,11 @@ oacc_is_serial (gfc_code *code)
 }
 
 static bool
-oacc_is_parallel_or_serial (gfc_code *code)
+oacc_is_compute_construct (gfc_code *code)
 {
-  return oacc_is_parallel (code) || oacc_is_serial (code);
+  return (oacc_is_parallel (code)
+	  || oacc_is_kernels (code)
+	  || oacc_is_serial (code));
 }
 
 static gfc_statement
@@ -8656,8 +8658,7 @@ resolve_oacc_loop_blocks (gfc_code *code)
   for (c = omp_current_ctx; c; c = c->previous)
 	if (!oacc_is_loop (c->code))
 	  break;
-  if (c == NULL || !(oacc_is_parallel_or_serial (c->code)
-			 || oacc_is_kernels (c->code)))
+  if (c == NULL || !(oacc_is_compute_construct (c->code)))
 	gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
 }
 
diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-3.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-3.c
new file mode 100644
index 000..cd8ad274ebb
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-3.c
@@ -0,0 +1,102 @@
+/* Verify that the error message for gang reduction on orphaned OpenACC loops
+   is not reported for non-orphaned loops. */
+
+/* { dg-additional-options "-Wopenacc-parallelism" } */
+
+int
+kernels (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc kernels
+  {
+#pragma acc loop gang reduction(+:s1) /* { dg-bogus "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-bogus "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s2 = s2 + 2;
+  }
+  return s1 + s2;
+}
+
+int
+parallel (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc parallel
+  {
+#pragma acc loop gang reduction(+:s1) /* { dg-bogus "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-bogus "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s2 = s2 + 2;
+  }
+  return s1 + s2;
+}
+
+int
+serial (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc serial /* { dg-warning "region contains gang partitioned code but is not gang partitioned" } */
+  {
+#pragma acc loop gang reduction(+:s1) /* { dg-bogus "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-bogus "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s2 = s2 + 2;
+  }
+  return s1 + s2;
+}
+
+int
+serial_combined (int n)
+{
+  int i, s1 = 0, s2 = 

Re: [gomp4] Make OpenACC orphan gang reductions errors

2021-11-30 Thread Thomas Schwinge
Hi!

On 2017-05-01T18:27:59-0700, Cesar Philippidis  wrote:
> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -6090,6 +6090,18 @@ resolve_oacc_loop_blocks (gfc_code *code)

> +  if (code->op == EXEC_OACC_LOOP
> +  && code->ext.omp_clauses->lists[OMP_LIST_REDUCTION]
> +  && code->ext.omp_clauses->gang)
> +{
> +  for (c = omp_current_ctx; c; c = c->previous)
> + if (!oacc_is_loop (c->code))
> +   break;
> +  if (c == NULL || !(oacc_is_parallel (c->code)
> +  || oacc_is_kernels (c->code)))
> +  gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
> +}

To avoid erroneous diagnostics, we also need to handle the OpenACC
'serial' construct here.  I've adapted Kwok's relevant patch, and pushed
to master branch commit f1a58ab0db20c0862e8b5039bd448fc8c9799cac
"[OpenACC] Allow gang reductions inside serial constructs", see attached.

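For reference, the shape of code this is about, rendered in C rather than
Fortran for brevity (a sketch; the function name 'serial_sum' is made up).
The 'loop gang reduction' is lexically nested in a 'serial' construct, so it
is not an orphan loop and should not get the "gang reduction on an orphan
loop" error.  Without '-fopenacc' the pragmas are ignored and the loop simply
runs serially.

```c
/* Hypothetical example: gang reduction nested inside an OpenACC 'serial'
   construct.  The loop is not orphaned, so no "gang reduction on an orphan
   loop" diagnostic is expected for it.  */
int
serial_sum (int n)
{
  int i, s = 0;

#pragma acc serial copy(s)
  {
#pragma acc loop gang reduction(+:s)
    for (i = 0; i < n; i++)
      s = s + 2;
  }

  return s;
}
```

With '-Wopenacc-parallelism', GCC may additionally warn that the region
contains gang-partitioned code but is not gang partitioned; that warning is
separate from the orphan-loop error discussed here.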

Grüße
 Thomas


From f1a58ab0db20c0862e8b5039bd448fc8c9799cac Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 13 Mar 2020 11:13:49 -0700
Subject: [PATCH] [OpenACC] Allow gang reductions inside serial constructs

... fixing a regression introduced in the preceding
commit 2b7dac2c0dcb087da9e4018943c023c0678234a3
"Make OpenACC orphan gang reductions errors".

	gcc/fortran/
	* openmp.c (oacc_is_serial, oacc_is_parallel_or_serial): New.
	(resolve_oacc_loop_blocks): Use oacc_is_parallel_or_serial instead of
	oacc_is_parallel.
	libgomp/
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Remove
	temporary skip.

Co-Authored-By: Thomas Schwinge 
---
 gcc/fortran/openmp.c   | 14 +-
 .../libgomp.oacc-fortran/parallel-dims.f90 |  1 -
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 4fa38691c01..b4100577e51 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -8334,6 +8334,18 @@ oacc_is_kernels (gfc_code *code)
   return code->op == EXEC_OACC_KERNELS || code->op == EXEC_OACC_KERNELS_LOOP;
 }
 
+static bool
+oacc_is_serial (gfc_code *code)
+{
+  return code->op == EXEC_OACC_SERIAL || code->op == EXEC_OACC_SERIAL_LOOP;
+}
+
+static bool
+oacc_is_parallel_or_serial (gfc_code *code)
+{
+  return oacc_is_parallel (code) || oacc_is_serial (code);
+}
+
 static gfc_statement
 omp_code_to_statement (gfc_code *code)
 {
@@ -8644,7 +8656,7 @@ resolve_oacc_loop_blocks (gfc_code *code)
   for (c = omp_current_ctx; c; c = c->previous)
 	if (!oacc_is_loop (c->code))
 	  break;
-  if (c == NULL || !(oacc_is_parallel (c->code)
+  if (c == NULL || !(oacc_is_parallel_or_serial (c->code)
 			 || oacc_is_kernels (c->code)))
 	gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
 }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90
index 80d64030414..fad3d9d6a80 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90
@@ -3,7 +3,6 @@
 
 ! { dg-additional-sources parallel-dims-aux.c }
 ! { dg-do run }
-  ! { dg-skip-if TODO { *-*-* } }
 ! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
 
 ! { dg-additional-options "-fopt-info-note-omp" }
-- 
2.33.0



Re: [gomp4] Make OpenACC orphan gang reductions errors

2021-11-30 Thread Thomas Schwinge
Hi!

On 2017-05-01T18:27:59-0700, Cesar Philippidis  wrote:
> This patch promotes all OpenACC gang reductions on orphan loops to
> errors. According to the spec, orphan loops are those which are not
> lexically nested inside an OpenACC parallel or kernels region, i.e.,
> acc loops inside acc routines.
>
> At first I thought this could be a warning because the gang reduction
> finalizer uses an atomic update. However, because there is no
> synchronization between gangs, there is no way to guarantee that the
> reduction will have completed once a single gang entity returns from the
> acc routine call.
>
> I've applied this patch to gomp-4_0-branch.

... which I've now adapted (with several things to be fixed in follow-up
commits) and pushed to master branch in
commit 2b7dac2c0dcb087da9e4018943c023c0678234a3
"Make OpenACC orphan gang reductions errors", see attached.

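To make the distinction concrete, a small sketch in C (function names and the
'SHOW_REJECTED_FORM' macro are made up for this example).  The first function
has its gang reduction lexically nested in a 'parallel' construct and is
accepted.  The second one, only compiled if 'SHOW_REJECTED_FORM' is defined,
has the same loop directly inside an 'acc routine' and is the orphan case that
is now expected to be diagnosed when building with '-fopenacc'.

```c
/* Accepted: the gang reduction is nested inside an OpenACC 'parallel'
   construct.  */
int
sum_in_parallel (const int *a, int n)
{
  int i, s = 0;

#pragma acc parallel copyin(a[0:n]) copy(s)
  {
#pragma acc loop gang reduction(+:s)
    for (i = 0; i < n; i++)
      s += a[i];
  }

  return s;
}

#ifdef SHOW_REJECTED_FORM
/* Orphan loop: not lexically nested in a compute construct.  With
   '-fopenacc', this is expected to be rejected with
   "gang reduction on an orphan loop".  */
#pragma acc routine gang
int
sum_in_routine (const int *a, int n)
{
  int i, s = 0;

#pragma acc loop gang reduction(+:s)
  for (i = 0; i < n; i++)
    s += a[i];

  return s;
}
#endif
```

Keeping the rejected form behind a macro lets the file build either way;
without '-fopenacc' both functions are ordinary serial C.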

Grüße
 Thomas


From 2b7dac2c0dcb087da9e4018943c023c0678234a3 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Mon, 1 May 2017 18:27:59 -0700
Subject: [PATCH] Make OpenACC orphan gang reductions errors

This patch promotes all OpenACC gang reductions on orphan loops to
errors. According to the spec, orphan loops are those which are not
lexically nested inside an OpenACC parallel or kernels region, i.e.,
acc loops inside acc routines.

At first I thought this could be a warning because the gang reduction
finalizer uses an atomic update. However, because there is no
synchronization between gangs, there is no way to guarantee that the
reduction will have completed once a single gang entity returns from the
acc routine call.

	gcc/c/
	* c-typeck.c (c_finish_omp_clauses): Emit an error on orphan
	OpenACC gang reductions.
	gcc/cp/
	* semantics.c (finish_omp_clauses): Emit an error on orphan
	OpenACC gang reductions.
	gcc/fortran/
	* openmp.c (oacc_is_parallel, oacc_is_kernels): New 'static'
	functions.
	(resolve_oacc_loop_blocks): Emit an error on orphan OpenACC gang
	reductions.
	gcc/
	* omp-general.h (enum oacc_loop_flags): Add OLF_REDUCTION enum.
	* omp-low.c (lower_oacc_head_mark): Use it to mark OpenACC
	reductions.
	* omp-offload.c (oacc_loop_auto_partitions): Don't assign gang
	level parallelism to orphan reductions.
	gcc/testsuite/
	* c-c++-common/goacc/nested-reductions-1-routine.c: Adjust.
	* c-c++-common/goacc/nested-reductions-2-routine.c: Likewise.
	* gcc.dg/goacc/loop-processing-1.c: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise.
	* c-c++-common/goacc/orphan-reductions-1.c: New test.
	* c-c++-common/goacc/orphan-reductions-2.c: New test.
	* gfortran.dg/goacc/orphan-reductions-1.f90: New test.
	* gfortran.dg/goacc/orphan-reductions-2.f90: New test.
	libgomp/
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Temporarily
	skip.

Co-Authored-By: Thomas Schwinge 
---
 gcc/c/c-typeck.c  |   8 +
 gcc/cp/semantics.c|   8 +
 gcc/fortran/openmp.c  |  24 ++
 gcc/omp-general.h |   3 +-
 gcc/omp-low.c |   4 +
 gcc/omp-offload.c |   7 +
 .../goacc/nested-reductions-1-routine.c   |   3 +
 .../goacc/nested-reductions-2-routine.c   |   9 +
 .../c-c++-common/goacc/orphan-reductions-1.c  |  56 +
 .../c-c++-common/goacc/orphan-reductions-2.c  |  87 
 .../gcc.dg/goacc/loop-processing-1.c  |   2 +-
 .../goacc/nested-reductions-1-routine.f90 |   3 +
 .../goacc/nested-reductions-2-routine.f90 |   9 +
 .../gfortran.dg/goacc/orphan-reductions-1.f90 | 206 ++
 .../gfortran.dg/goacc/orphan-reductions-2.f90 |  89 
 .../libgomp.oacc-fortran/parallel-dims.f90|   1 +
 16 files changed, 517 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 7524304f2bd..a025740e618 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -14135,6 +14135,14 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  goto check_dup_generic;
 
 	case OMP_CLAUSE_REDUCTION:
+	  if (ort == C_ORT_ACC && oacc_get_fn_attrib (current_function_decl)
+	  && omp_find_clause (clauses, OMP_CLAUSE_GANG))
+	{
+	  error_at (OMP_CLAUSE_LOCATION (c),
+			"gang reduction on an orphan loop");
+	  remove = tru

Re: [Patch] Fortran: Fix assumed-size to assumed-rank passing [PR94070]

2021-09-28 Thread Thomas Schwinge
Hi!

On 2021-09-27T14:07:53+0200, Tobias Burnus  wrote:
> now committed r12-3897-g00f6de9c69119594f7dad3bd525937c94c8200d0


> Conclusion: Reviews are very helpful :-)

Ha!  :-) (... and I wasn't even involved here!)  ;-P


As testing showed here:

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c
> @@ -0,0 +1,68 @@
> +/* Called by assumed_rank_22.f90.  */

> +  if (num == 0)
> +assert (x->dim[2].extent == -1);
> +  else if (num == 20)
> +assert (x->dim[2].extent == 1);
> +  else if (num == 40)
> +{
> +  /* FIXME: - dg-output = 'c_assumed ... OK' checked in .f90 file. */
> +  /* assert (x->dim[2].extent == 0); */
> +  if (x->dim[2].extent == 0)
> + __builtin_printf ("c_assumed - 40 - OK\n");
> +  else
> + __builtin_printf ("ERROR: c_assumed num=%d: "
> +   "x->dim[2].extent = %d != 0\n",
> +   num, x->dim[2].extent);
> +}
> +  else if (num == 60)
> +assert (x->dim[2].extent == 2);
> +  else if (num == 80)
> +assert (x->dim[2].extent == 2);
> +  else if (num == 100)
> +{
> +  /* FIXME: - dg-output = 'c_assumed ... OK' checked in .f90 file. */
> +  /* assert (x->dim[2].extent == 0); */
> +  if (x->dim[2].extent == 0)
> + __builtin_printf ("c_assumed - 100 - OK\n");
> +  else
> + __builtin_printf ("ERROR: c_assumed num=%d: "
> +   "x->dim[2].extent = %d != 0\n",
> +   num, x->dim[2].extent);
> +}
> +  else
> +assert (0);

... the 'ERROR:' prefixes printed do confuse DejaGnu...  As obvious,
pushed to master branch commit 95540a6d1d7b29cdd3ed06fbcb07465804504cfd
"'gfortran.dg/assumed_rank_22_aux.c' messages printed vs. DejaGnu", see
attached.
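
A minimal sketch of the idea, separate from the actual test file: keep
test-program diagnostics in lower case so that log lines never start with the
upper-case 'ERROR:' that the DejaGnu log processing treats as a harness error.
The helper name 'format_extent_mismatch' is made up for this example; the
message text follows the one used in 'assumed_rank_22_aux.c'.

```c
#include <stdio.h>

/* Hypothetical helper: format an extent-mismatch message into BUF, using a
   lower-case "error:" prefix so that it is not mistaken for a DejaGnu
   harness 'ERROR:' line.  Returns what 'snprintf' returns.  */
int
format_extent_mismatch (char *buf, size_t size, int num, int extent)
{
  return snprintf (buf, size,
                   "error: c_assumed num=%d: x->dim[2].extent = %d != 0\n",
                   num, extent);
}
```

A caller would then print the buffer with 'fputs (buf, stdout)' or similar.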


Grüße
 Thomas


From 95540a6d1d7b29cdd3ed06fbcb07465804504cfd Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 28 Sep 2021 09:02:56 +0200
Subject: [PATCH] 'gfortran.dg/assumed_rank_22_aux.c' messages printed vs.
 DejaGnu

Print lower-case 'error: [...]' instead of upper-case 'ERROR: [...]', to not
confuse the DejaGnu log processing harness into thinking these are DejaGnu
harness ERRORs:

Running /scratch/tschwing/build2-trusty-cs/gcc/build/submit-big/source-gcc/gcc/testsuite/gfortran.dg/dg.exp ...
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
+ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
[...]

Fix-up for recent commit 00f6de9c69119594f7dad3bd525937c94c8200d0
"Fortran: Fix assumed-size to assumed-rank passing [PR94070]".

	gcc/testsuite/
	* gfortran.dg/assumed_rank_22_aux.c: Adjust messages printed.
---
 gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c b/gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c
index 2fbf83d649a..e5fe02135e9 100644
--- a/gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c
+++ b/gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c
@@ -29,7 +29,7 @@ c_assumed (CFI_cdesc_t *x, int num)
   if (x->dim[2].extent == 0)
 	__builtin_printf ("c_assumed - 40 - OK\n");
   else
-	__builtin_printf ("ERROR: c_assumed num=%d: "
+	__builtin_printf ("error: c_assumed num=%d: "
 		  "x->dim[2].extent = %d != 0\n",
 		  num, x->dim[2].extent);
 }
@@ -44,7 +44,7 @@ c_assumed (CFI_cdesc_t *x, int num)
   if (x->dim[2].extent == 0)
 	__builtin_printf ("c_assumed - 100 - OK\n");
   else
-	__builtin_printf ("ERROR: c_assumed num=%d: "
+	__builtin_printf ("error: c_assumed num=%d: "
 		  "x->dim[2].extent = %d != 0\n",
 		  num, x->dim[2].extent);
 }
-- 
2.33.0



Re: [committed] libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note (was: [Patch] Fortran: Fix assumed-size to assumed-rank passing [PR94070])

2021-09-28 Thread Thomas Schwinge
Hi!

On 2021-09-27T14:38:56+0200, Tobias Burnus  wrote:
> On 27.09.21 14:07, Tobias Burnus wrote:
>> now committed r12-3897-g00f6de9c69119594f7dad3bd525937c94c8200d0
>
> I accidentally changed dg-note to dg-message when updating the expected
> output, as the dump has changed. (Seemingly I copied the sorry line
> instead of the dg-note lines as a template.)

Strange.  ;-P

> Changed back to dg-note & committed as
> r12-3898-gda1f6391b7c255e4e2eea983832120eff4f7d3df.

As shown by offloading testing, a bit more is necessary here; I've
pushed to master branch commit a43ae03a053faad871e6f48099d21e64b8e316cf
'Further test case adjustment re "Fortran: Fix assumed-size to
assumed-rank passing"', see attached.


Regards
 Thomas


From a43ae03a053faad871e6f48099d21e64b8e316cf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 28 Sep 2021 08:05:28 +0200
Subject: [PATCH] Further test case adjustment re "Fortran: Fix assumed-size to
 assumed-rank passing"

Fix-up for recent commit 00f6de9c69119594f7dad3bd525937c94c8200d0
"Fortran: Fix assumed-size to assumed-rank passing [PR94070]",
and commit da1f6391b7c255e4e2eea983832120eff4f7d3df
"libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note".

Due to use of '#if !ACC_MEM_SHARED' conditionals in
'libgomp.oacc-fortran/if-1.f90', 'target { !  openacc_host_selected }'
needs some special care (ignoring the pre-existing mismatch of
'ACC_MEM_SHARED' vs. 'openacc_host_selected').

As seen with GCN offloading, we need to revert to another bit of the
original code in 'libgomp.oacc-fortran/privatized-ref-2.f90'.

	libgomp/
	* testsuite/libgomp.oacc-fortran/if-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
---
 libgomp/testsuite/libgomp.oacc-fortran/if-1.f90 | 6 ++++++
 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
index 3089d6a0c43..9eadfcf9738 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
@@ -394,6 +394,7 @@ program main
 
   !$acc data copyin (a(1:N)) copyout (b(1:N)) if (0 == 1)
   ! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target { ! openacc_host_selected } } .-1 }
+  ! { dg-note {variable 'parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} "" { target { ! openacc_host_selected } } .-2 }
 
 #if !ACC_MEM_SHARED
   if (acc_is_present (a) .eqv. .TRUE.) STOP 21
@@ -408,6 +409,7 @@ program main
   !$acc data copyin (a(1:N)) if (1 == 1)
   ! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 }
   ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-2 }
+  ! { dg-note {variable 'parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} "" { target { ! openacc_host_selected } } .-3 }
 
 #if !ACC_MEM_SHARED
 if (acc_is_present (a) .eqv. .FALSE.) STOP 23
@@ -416,6 +418,7 @@ program main
 !$acc data copyout (b(1:N)) if (0 == 1)
 ! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 }
 ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-2 }
+! { dg-note {variable 'parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} "" { target { ! openacc_host_selected } } .-3 }
 #if !ACC_MEM_SHARED
   if (acc_is_present (b) .eqv. .TRUE.) STOP 24
 #endif
@@ -864,6 +867,7 @@ program main
 
   !$acc data copyin (a(1:N)) copyout (b(1:N)) if (0 == 1)
   ! { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target { ! openacc_host_selected } } .-1 }
+  ! { dg-note {variable 'parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} "" { target { ! openacc_host_selected } } .-2 }
 
 #if !ACC_MEM_SHARED
   if (acc_is_present (a) .eqv. .TRUE.) STOP 56
@@ -878,6 +882,7 @@ program main
   !$acc data copyin (a(1:N)) if (1 == 1)
   ! { dg-note {variable 'D\.[0-9]+' declared 

Re: [Patch] Fortran: Improve -Wmissing-include-dirs warnings [PR55534]

2021-09-22 Thread Thomas Schwinge
Hi Tobias!

On 2021-09-21T21:22:44+0200, Tobias Burnus  wrote:
> While the previous patch fixed -Wno-missing-include-dirs and sorted
> out some inconsistencies with libcpp warnings, it had two issues:
>
> * Some superfluous warnings were printed, e.g. for
>     gfortran nonexisting/file.f90
>   there was a warning about include path "nonexisting" not existing
>   and twice the error that the "nonexisting/file.f90" could not be
>   read.
>
> * At least as invoked when building GCC or when running the GCC testsuite,
>   the passed -B -isystem etc. arguments lead to proper but pointless
>   diagnostics about 'finclude' or other directories not being found,
>   causing excess-error FAILs and -Werror build failures.

ACK, I too had seen those (with '-cpp' only?), but not yet reported.

> While the latter could be fixed by adding -Wno-missing-include-dirs,
> it still felt like the wrong approach.

ACK, and also for the ones you already had added.  ;-)

> While the testsuite does run for me, others reported that they do
> see missing-include-dirs warnings. Instead of adding a bunch of
> -Wno-missing-include-dirs to the test config, I now only warn for
> -I and -J by default (similar to the previous state) and only emit the full
> set of warnings when the user passes -Wmissing-include-dirs
> explicitly. The Fortran behavior is now also properly documented
> in the manual.

ACK, such an approach seems reasonable to me.  (But I can't formally
approve, of course.)
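
To make sure I read the described policy correctly, here is a small C sketch of the decision as I understand it from the description above. It is not the gfortran implementation: the enums and 'warn_missing_dir' are hypothetical names, and the handling of an explicit '-Wno-missing-include-dirs' (no warnings at all) is my assumption.

```c
#include <assert.h>

/* How the option appeared on the command line (hypothetical model).  */
enum warn_state { WARN_UNSET, WARN_ON, WARN_OFF };

/* Where a missing include directory came from (hypothetical model).
   DIR_OTHER stands for -B, -isystem, 'finclude' and similar.  */
enum dir_origin { DIR_USER_I, DIR_USER_J, DIR_OTHER };

/* Return nonzero if a missing directory of the given origin should be
   diagnosed.  Assumed policy: explicit -Wno-... silences everything,
   explicit -W... warns for everything, and by default only -I and -J
   directories are diagnosed.  */
static int
warn_missing_dir (enum warn_state state, enum dir_origin origin)
{
  if (state == WARN_OFF)
    return 0;
  if (state == WARN_ON)
    return 1;
  return origin == DIR_USER_I || origin == DIR_USER_J;
}
```

Under this reading, the '-B' and '-isystem' directories passed by the build system or the testsuite stay quiet unless '-Wmissing-include-dirs' is given explicitly.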


Regards
 Thomas


Re: [committed] Fortran: Add missing ST_OMP_END_SCOPE handling [PR102313]

2021-09-15 Thread Thomas Schwinge
[...]" }
> +
> +!$omp end SCOPE  ! { dg-error "Unexpected !.OMP END SCOPE" }
> +
> +!$omp end SECTIONS  ! { dg-error "Unexpected !.OMP END SECTIONS" }
> +
> +!$omp end SIMD  ! { dg-error "Unexpected !.OMP END SIMD" }
> +
> +!$omp end SINGLE  ! { dg-error "Unexpected !.OMP END SINGLE" }
> +
> +!$omp end TARGET  ! { dg-error "Unexpected !.OMP END TARGET" }
> +
> +!$omp end TARGET DATA  ! { dg-error "Unexpected !.OMP END TARGET DATA" }
> +
> +!$omp end TARGET PARALLEL  ! { dg-error "Unexpected !.OMP END TARGET PARALLEL" }
> +
> +!$omp end TARGET PARALLEL DO  ! { dg-error "Unexpected !.OMP END TARGET PARALLEL DO" }
> +
> +!$omp end TARGET PARALLEL DO SIMD  ! { dg-error "Unexpected !.OMP END TARGET PARALLEL DO SIMD" }
> +
> +!$omp end TARGET PARALLEL LOOP  ! { dg-error "Unexpected junk" }
> +
> +!$omp end TARGET SIMD  ! { dg-error "Unexpected !.OMP END TARGET SIMD" }
> +
> +!$omp end TARGET TEAMS  ! { dg-error "Unexpected !.OMP END TARGET TEAMS" }
> +
> +!$omp end TARGET TEAMS DISTRIBUTE  ! { dg-error "Unexpected !.OMP END TARGET TEAMS DISTRIBUTE" }
> +
> +!$omp end TARGET TEAMS DISTRIBUTE PARALLEL DO  ! { dg-error "Unexpected !.OMP END TARGET TEAMS DISTRIBUTE PARALLEL DO" }
> +
> +!$omp end TARGET TEAMS DISTRIBUTE PARALLEL DO SIMD  ! { dg-error "Unexpected !.OMP END TARGET TEAMS DISTRIBUTE PARALLEL DO SIMD" }
> +
> +!$omp end TARGET TEAMS DISTRIBUTE SIMD  ! { dg-error "Unexpected !.OMP END TARGET TEAMS DISTRIBUTE SIMD" }
> +
> +!$omp end TARGET TEAMS LOOP  ! { dg-error "Unexpected junk" }
> +
> +!$omp end TASK  ! { dg-error "Unexpected !.OMP END TASK" }
> +
> +!$omp end TASKGROUP  ! { dg-error "Unexpected !.OMP END TASKGROUP" }
> +
> +!$omp end TASKLOOP  ! { dg-error "Unexpected !.OMP END TASKLOOP" }
> +
> +!$omp end TASKLOOP SIMD  ! { dg-error "Unexpected !.OMP END TASKLOOP SIMD" }
> +
> +!$omp end TEAMS  ! { dg-error "Unexpected !.OMP END TEAMS" }
> +
> +!$omp end TEAMS DISTRIBUTE  ! { dg-error "Unexpected !.OMP END TEAMS DISTRIBUTE" }
> +
> +!$omp end TEAMS DISTRIBUTE PARALLEL DO  ! { dg-error "Unexpected !.OMP END TEAMS DISTRIBUTE PARALLEL DO" }
> +
> +!$omp end TEAMS DISTRIBUTE PARALLEL DO SIMD  ! { dg-error "Unexpected !.OMP END TEAMS DISTRIBUTE PARALLEL DO SIMD" }
> +
> +!$omp end TEAMS DISTRIBUTE SIMD  ! { dg-error "Unexpected !.OMP END TEAMS DISTRIBUTE SIMD" }
> +
> +!$omp end TEAMS LOOP  ! { dg-error "Unexpected junk" }
> +
> +!$omp end WORKSHARE  ! { dg-error "Unexpected !.OMP END WORKSHARE" }
> +
> +end  ! { dg-error "Unexpected END statement" }
> +
> +! { dg-excess-errors "Unexpected end of file" }


From 8b69c481fc86e04c6c83f3a49eef2760c175a8f2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 15 Sep 2021 10:25:53 +0200
Subject: [PATCH] Add OpenACC 'host_data' testing to
 'gfortran.dg/goacc/unexpected-end.f90'

Use underscore instead of space in 'host_data'.

Follow-up to recent commit 33fdbbe4ce6055eb858096d01720ccf94aa854ec
"Fortran: Add missing ST_OMP_END_SCOPE handling [PR102313]".

	gcc/testsuite/
	* gfortran.dg/goacc/unexpected-end.f90: Add OpenACC 'host_data'
	testing.
---
 gcc/testsuite/gfortran.dg/goacc/unexpected-end.f90 | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/unexpected-end.f90 b/gcc/testsuite/gfortran.dg/goacc/unexpected-end.f90
index 442724fea83..e9db47b3270 100644
--- a/gcc/testsuite/gfortran.dg/goacc/unexpected-end.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/unexpected-end.f90
@@ -4,7 +4,7 @@
 
 !$acc end DATA  ! { dg-error "Unexpected !.ACC END DATA" }
 
-!$acc end HOST DATA  ! { dg-error "Unclassifiable OpenACC directive" }
+!$acc end HOST_DATA  ! { dg-error "Unexpected !.ACC END HOST_DATA" }
 
 !$acc end KERNELS  ! { dg-error "Unexpected !.ACC END KERNELS" }
 
@@ -20,4 +20,6 @@
 
 !$acc end SERIAL LOOP  ! { dg-error "Unexpected !.ACC END SERIAL LOOP" }
 
+!$acc end EUPHORBIA LATHYRIS  ! { dg-error "Unclassifiable OpenACC directive" }
+
 end
-- 
2.33.0


