OpenMP Patch Ping – including "[13 Regression]" patches

2023-01-28 Thread Tobias Burnus

"[13 Regression]" OpenMP Fortran patches:

[Patch] OpenMP/Fortran: Fix loop-iter var privatization with !$OMP LOOP 
[PR108512]
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610531.html

[Patch][v2] OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610584.html


Additionally, there are several more patches pending, see below. Of those:

The first two small ones are very simple; especially the first one I
regard as obvious! The third one is a documentation patch.

The others are of varying complexity, but I think some would still be suitable
for the current stage, including some which have been pinged since October :-(

Tobias

PS: The mentioned patches:

On 10.01.23 12:37, Tobias Burnus wrote:

Hi all, hello Jakub,

Below is the updated list relative to the last ping,
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607178.html

NOTE on the list below: I have stopped checking older patches. I know
some more are pending review and others need to be revised. I will re-check
once the patches listed below have been reviewed. Cf. the old list.

Thanks for the reviews done in between the last ping and now!

 * * *

Small patches
=============

* [Patch] Fortran: Extend align-clause checks of OpenMP's allocate clause
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608401.html
  Tue Dec 13 16:38:22 GMT 2022

* [Patch] OpenMP: Parse align clause in allocate directive in C/C++
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608404.html
  Tue Dec 13 17:44:27 GMT 2022

* Re: [Patch] libgomp.texi: Reverse-offload updates (was: [Patch]
libgomp: Handle OpenMP's reverse offloads)
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608245.html
  Thu Nov 24 12:01:04 GMT 2022

(Side note: wwwdocs also needs to be updated for the latter patch and
some other patches committed in the meantime.)


Fortran allocat(e,ors) prep patch
=================================

* [Patch] Fortran/OpenMP: Add parsing support for allocators/allocate
directive (was: [Patch] Fortran/OpenMP: Add parsing support for
allocators directive)
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608904.html
  Wed Dec 21 15:51:25 GMT 2022

(Remark: While written from scratch, it is kind of a follow-up to
Abid's patch
   [PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0)
which you (Jakub) reviewed on Tue Oct 11 12:13:14 GMT 2022, i.e.
   https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603258.html
- For the actual implementation of 'allocators', we still have to solve
  the issues raised in the review of '[PATCH 2/5] [gfortran] Translate
  allocate directive (OpenMP 5.0).' at
  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603279.html
  (and earlier in the thread); implementing 'omp allocate' (Fortran/C/C++)
  seems to be easier, but no one has started implementing it so far - only
  parsing support exists.
- The USM patches on semi-USM systems run into a similar issue as
  'allocators'; for them, an ME omp_allocate is added.)


Mapping related patches
=======================
(Complex, but GCC badly needs this revision as it fixes several bugs and
adds missing functionality.)

* The complete patch set was just re-submitted by Julian; the overview patch is
  [PATCH v6 00/11] OpenMP: C/C++ lvalue parsing, C/C++/Fortran "declare mapper" support
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/thread.html#609031
  Fri Dec 23 12:12:53 GMT 2022
* Note: For 10/11 of the set, there was a follow-up this Monday:
  [PATCH v6 10/11] OpenMP: Support OpenMP 5.0 "declare mapper" directives for C
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609566.html

[As it relates to one patch in the series:
  '[Patch] Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr'
That one is mine; it needs to be updated (WIP) and fixes array-descriptor and
alloc-string-length variable issues: the descriptor or string length may need
to be handled explicitly when mapping data on entry, i.e. string
lengths/allocator may require 'to:' instead of 'alloc:', and on data-exit
mapping the current code might add a bogus 'alloc:'. The idea is to handle
this explicitly in fortran/trans-openmp.cc instead of auto-adding it in the ME.
Status: WIP - removed in ME but not all cases are handled yet in FE.)
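To illustrate why a string length (or descriptor) entered with 'alloc:' can go wrong, here is a minimal host-only simulation. It assumes only the standard OpenMP semantics that map(alloc:) reserves device storage without copying, while map(to:) also copies the host value; the enum, the "device" buffer and simulate_map_enter are hypothetical and no libgomp API is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for two OpenMP map types.  */
enum map_kind { MAP_ALLOC, MAP_TO };

/* Simulate entering a mapping of SIZE bytes from HOST to a separate
   "device" buffer DEV.  'alloc' only reserves storage, so the device
   copy keeps whatever happened to be there; 'to' also copies the value.  */
void simulate_map_enter (enum map_kind kind, const void *host, void *dev,
                         size_t size)
{
  assert (host != NULL && dev != NULL);
  /* Fill with a pattern standing in for uninitialized device memory.  */
  memset (dev, 0xAB, size);
  if (kind == MAP_TO)
    memcpy (dev, host, size);
}
```

Mapping a string length of 5 with MAP_TO leaves 5 in the simulated device copy; with MAP_ALLOC the copy holds the fill pattern, i.e. the kind of bogus value that choosing 'to:' explicitly in the front end is meant to avoid.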


Fortran deep mapping (allocatable components)
=============================================

(Old patch of March 2022; the first part has now been properly, if belatedly,
submitted - today):
[Patch][1/2] OpenMP: Add lang hooks + run-time filled map arrays for
Fortran deep mapping of DT
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609637.html

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[Patch] OpenMP/Fortran: Fix has_device_addr clause splitting [PR108558]

2023-01-27 Thread Tobias Burnus

Rather obvious fix. Hence, I intend to commit it later as obvious,
unless there are any comments.

Tobias

PS: Thanks go to Thomas for finding and reporting the issue.
OpenMP/Fortran: Fix has_device_addr clause splitting [PR108558]

gcc/fortran/ChangeLog:

	PR fortran/108558
	* trans-openmp.cc (gfc_split_omp_clauses): Handle has_device_addr.

libgomp/ChangeLog:

	PR fortran/108558
	* testsuite/libgomp.fortran/has_device_addr.f90: New test.

 gcc/fortran/trans-openmp.cc|  2 +
 .../testsuite/libgomp.fortran/has_device_addr.f90  | 59 ++
 2 files changed, 61 insertions(+)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 87213de0918..5283d0ce5f3 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -6205,6 +6205,8 @@ gfc_split_omp_clauses (gfc_code *code,
 	= code->ext.omp_clauses->lists[OMP_LIST_MAP];
 	  clausesa[GFC_OMP_SPLIT_TARGET].lists[OMP_LIST_IS_DEVICE_PTR]
 	= code->ext.omp_clauses->lists[OMP_LIST_IS_DEVICE_PTR];
+	  clausesa[GFC_OMP_SPLIT_TARGET].lists[OMP_LIST_HAS_DEVICE_ADDR]
+	= code->ext.omp_clauses->lists[OMP_LIST_HAS_DEVICE_ADDR];
 	  clausesa[GFC_OMP_SPLIT_TARGET].device
 	= code->ext.omp_clauses->device;
 	  clausesa[GFC_OMP_SPLIT_TARGET].thread_limit
diff --git a/libgomp/testsuite/libgomp.fortran/has_device_addr.f90 b/libgomp/testsuite/libgomp.fortran/has_device_addr.f90
new file mode 100644
index 000..95cc7788f2d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/has_device_addr.f90
@@ -0,0 +1,59 @@
+! { dg-additional-options "-fdump-tree-original" }
+
+!
+! PR fortran/108558
+!
+
+! { dg-final { scan-tree-dump-times "#pragma omp target has_device_addr\\(x\\) has_device_addr\\(y\\)" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data map\\(tofrom:x\\) map\\(tofrom:y\\)" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data use_device_addr\\(x\\) use_device_addr\\(y\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target update from\\(y\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data map\\(tofrom:x\\) map\\(tofrom:y\\) use_device_addr\\(x\\) use_device_addr\\(y\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp teams" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp distribute" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp parallel" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp for nowait" 2 "original" } }
+
+module m
+contains
+subroutine vectorAdd(x, y, N)
+  implicit none
+  integer :: N
+  integer(4) :: x(N), y(N)
+  integer :: i
+
+  !$omp target teams distribute parallel do has_device_addr(x, y)
+  do i = 1, N
+y(i) = x(i) + y(i)
+  end do
+end subroutine vectorAdd
+end module m
+
+program main
+  use m
+  implicit none
+  integer, parameter :: N = 9876
+  integer(4) :: x(N), y(N)
+  integer :: i
+
+  x(:) = 1
+  y(:) = 2
+
+  !$omp target data map(x, y)
+!$omp target data use_device_addr(x, y)
+  call vectorAdd(x, y, N)
+!$omp end target data
+!$omp target update from(y)
+if (any (y /= 3)) error stop
+  !$omp end target data
+
+  x = 1
+  y = 2
+  !$omp target data map(x, y) use_device_addr(x, y)
+!$omp target teams distribute parallel do has_device_addr(x, y)
+do i = 1, N
+  y(i) = x(i) + y(i)
+end do
+ !$omp end target data
+ if (any (y /= 3)) error stop
+end program
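The bug pattern is easy to see in a toy model: when a combined construct is split into its leaf constructs, every clause that belongs to 'target' has to be copied explicitly, and a missing copy makes the clause vanish without any diagnostic. The sketch below is hypothetical C, not the actual gfc_split_omp_clauses code; only the idea of per-leaf clause lists is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of distributing the clauses of
     !$omp target teams distribute parallel do has_device_addr(x, y)
   over its leaf constructs.  The list names echo the OMP_LIST_* indices
   in the diff; the types are illustrative only.  */
enum clause_list { LIST_MAP, LIST_IS_DEVICE_PTR, LIST_HAS_DEVICE_ADDR,
                   LIST_NUM };
enum leaf { LEAF_TARGET, LEAF_TEAMS, LEAF_NUM };

struct clauses
{
  const char *lists[LIST_NUM];  /* e.g. "x, y"; NULL if absent */
};

/* Give the 'target' leaf the clause lists that belong to it; all other
   leaves start out empty in this model.  */
void split_target_clauses (const struct clauses *combined,
                           struct clauses split[LEAF_NUM])
{
  const struct clauses empty = { { NULL } };
  assert (combined != NULL && split != NULL);
  for (int l = 0; l < LEAF_NUM; l++)
    split[l] = empty;
  split[LEAF_TARGET].lists[LIST_MAP] = combined->lists[LIST_MAP];
  split[LEAF_TARGET].lists[LIST_IS_DEVICE_PTR]
    = combined->lists[LIST_IS_DEVICE_PTR];
  /* Counterpart of the two lines added by the patch: without this
     assignment, has_device_addr would silently be dropped.  */
  split[LEAF_TARGET].lists[LIST_HAS_DEVICE_ADDR]
    = combined->lists[LIST_HAS_DEVICE_ADDR];
}
```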


[committed] gomp/declare-variant-1*.f90: Update for Windows

2023-01-27 Thread Tobias Burnus

Tested on x86_64-gnu-linux with -m32 and -m64. It was discussed on
#gfortran IRC and tested with MinGW64 with/by nightstrike.

Committed to mainline.

Tobias
commit d1e0575fdc9216f96c4f88f9f41a25b854300c0b
Author: Tobias Burnus 
Date:   Fri Jan 27 09:13:16 2023 +0100

gomp/declare-variant-1*.f90: Update for Windows

Replace target selector 'lp64' by '! ilp32' to handle
Windows which uses 32bit long (and vice versa for '! lp64').

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/declare-variant-10.f90: Update scan-tree's
target selector to handle Windows.
* gfortran.dg/gomp/declare-variant-11.f90: Likewise.
* gfortran.dg/gomp/declare-variant-12.f90: Likewise.

diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90
index d6d2c8c262b..2f09146a10d 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90
@@ -72,2 +72,2 @@ contains
-  call f04 ()	! { dg-final { scan-tree-dump-times "f03 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && lp64 } } } }
-			! { dg-final { scan-tree-dump-times "f04 \\\(\\\);" 1 "gimple" { target { { ! lp64 } || { ! { i?86-*-* x86_64-*-* } } } } } }
+  call f04 () ! { dg-final { scan-tree-dump-times "f03 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! ilp32 } } } } }
+  ! { dg-final { scan-tree-dump-times "f04 \\\(\\\);" 1 "gimple" { target { { ilp32 } || { ! { i?86-*-* x86_64-*-* } } } } } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90
index 60aa0fcb3b0..3593c9a5bb3 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90
@@ -129,2 +129,2 @@ contains
-call f27 ()	! { dg-final { scan-tree-dump-times "f25 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && lp64 } } } }
-		! { dg-final { scan-tree-dump-times "f24 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } }
+call f27 () ! { dg-final { scan-tree-dump-times "f25 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! ilp32 } } } } }
+! { dg-final { scan-tree-dump-times "f24 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ilp32 } } } } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90
index 610693e9807..2fd8abd0dc7 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90
@@ -136,2 +136,2 @@ contains
-	  call f13 ()	! { dg-final { scan-tree-dump-times "f09 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && lp64 } } } }
-			! { dg-final { scan-tree-dump-times "f11 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } }
+  call f13 ()   ! { dg-final { scan-tree-dump-times "f09 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! ilp32 } } } } }
+! { dg-final { scan-tree-dump-times "f11 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ilp32 } } } } }
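The selector change rests on the C data models behind the effective-target keywords: 'ilp32' means 32-bit int, long and pointers, 'lp64' means 32-bit int with 64-bit long and pointers. 64-bit Windows uses LLP64 (32-bit int and long, 64-bit pointers), which matches neither keyword, so 'lp64' is false there while '! ilp32' is true. The small C sketch below spells out that classification; the enum and function names are made up for illustration and are not the DejaGnu implementation.

```c
#include <assert.h>
#include <stddef.h>

enum data_model { MODEL_ILP32, MODEL_LP64, MODEL_LLP64, MODEL_OTHER };

/* Classify a data model from the sizes (in bytes) of int, long and a
   data pointer.  */
enum data_model classify (size_t int_size, size_t long_size, size_t ptr_size)
{
  assert (int_size > 0 && long_size > 0 && ptr_size > 0);
  if (int_size == 4 && long_size == 4 && ptr_size == 4)
    return MODEL_ILP32;
  if (int_size == 4 && long_size == 8 && ptr_size == 8)
    return MODEL_LP64;
  if (int_size == 4 && long_size == 4 && ptr_size == 8)
    return MODEL_LLP64;
  return MODEL_OTHER;
}

/* Data model of the compiler building this file.  */
enum data_model host_model (void)
{
  return classify (sizeof (int), sizeof (long), sizeof (void *));
}

/* The '! ilp32' selector: true for LP64 and LLP64 alike.  */
int selector_not_ilp32 (enum data_model m)
{
  return m != MODEL_ILP32;
}
```

Built with a 64-bit MinGW compiler, host_model () should report MODEL_LLP64, for which selector_not_ilp32 is true even though the model is not LP64.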


[Patch][v2] OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]

2023-01-25 Thread Tobias Burnus

Hi Jakub, hi all,

Updated patch included, i.e. it avoids 'count' for 'j' when a 'j.0' would
do (i.e. only a local var without the different step calculation). I now
also reject a non-unit step on a loop that uses an outer var.

Eventually still to be done: replace the 'sorry' by working code, i.e.
implement the suggestions to handle some/all non-unit iteration steps as
proposed in this thread.

On 20.01.23 18:39, Jakub Jelinek wrote:

> I think instead of non-unity etc. it is better to talk about constant
> step 1 or -1.

I concur.

> The actual problem with non-simple loops for non-rectangular loops is
> both in case it is an inner loop which uses some outer loop's iterator,
> or if it is outer loop whose iterator is used, both of those cases
> will not be handled properly.


I have now added a check for the other case as well.

Just to confirm, the following is fine, isn't it?

!$omp simd collapse(4)
do i = 1, 10, 2
  do outer_var = 1, 10  ! step = +1
    do j = 1, 10, 2
      do inner_var = 1, outer_var  ! step = 1

i.e. both the inner_var and outer_var have 'step = 1',
even if other loops in the 'collapse' have step != 1.
I think it should be fine.
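As a quick check of the iteration counts in that example: the Fortran standard defines the trip count of DO var = m1, m2, m3 as MAX(INT((m2 - m1 + m3) / m3), 0). The small C sketch below (function names are hypothetical) reproduces that formula and the counter-based value var = m1 + k*m3. With m3 equal to 1 or -1 the loop variable moves in lockstep with the counter, which is presumably why a local loop variable can replace 'count' in that case.

```c
#include <assert.h>

/* Trip count of the Fortran loop  DO var = m1, m2, m3  following the
   standard's formula MAX(INT((m2 - m1 + m3) / m3), 0).  C integer
   division truncates toward zero, just like INT.  */
long fortran_trip_count (long m1, long m2, long m3)
{
  assert (m3 != 0);  /* a zero step is invalid Fortran */
  long n = (m2 - m1 + m3) / m3;
  return n > 0 ? n : 0;
}

/* Value of the loop variable in iteration K (K = 0, 1, ...) when the
   loop is driven by a separate iteration counter.  */
long fortran_var_at (long m1, long m3, long k)
{
  return m1 + k * m3;
}
```

For the nest above, 'i' and 'j' (1, 3, ..., 9) run 5 times each and 'outer_var' runs 10 times.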

OK mainline?

Tobias
OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]

This patch ensures that loop bounds depending on outer loop vars use the
proper TREE_VEC format. It additionally gives a sorry if such an outer
var has a non-one/non-minus-one increment as currently a count variable
is used in this case (see PR).

Finally, it avoids 'count' and just uses a local loop variable if the
step increment is +/-1.
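The canonical format mentioned above can be pictured as follows: a bound that depends on an outer loop variable is stored as the triple [var, multiplier, offset], i.e. bound = multiplier * var + offset (the patch starts with multiplier 1 and offset 0). The C sketch below evaluates such bounds for a two-level nest with unit steps and counts its iterations; the struct and functions are hypothetical and only mirror the idea, not GCC's TREE_VEC handling.

```c
#include <assert.h>

/* bound = multiplier * outer + offset; a constant bound has
   multiplier 0.  */
struct linear_bound
{
  long multiplier;
  long offset;
};

static long eval_bound (struct linear_bound b, long outer)
{
  return b.multiplier * outer + b.offset;
}

/* Iterations of
     do outer = outer_lb, outer_ub        ! step +1
       do inner = lb(outer), ub(outer)    ! step +1
   with lb and ub given in the linear form above.  This sketch expects
   a non-empty outer loop.  */
long count_nonrect (long outer_lb, long outer_ub,
                    struct linear_bound lb, struct linear_bound ub)
{
  assert (outer_lb <= outer_ub);
  long total = 0;
  for (long o = outer_lb; o <= outer_ub; o++)
    {
      long lo = eval_bound (lb, o), hi = eval_bound (ub, o);
      if (hi >= lo)
        total += hi - lo + 1;
    }
  return total;
}
```

For do outer_var = 1, 10 / do inner_var = 1, outer_var (lower bound multiplier 0, offset 1; upper bound multiplier 1, offset 0) this yields 55 iterations; together with the two 5-iteration loops of the collapse(4) example from the mail above, the collapsed nest has 5 * 5 * 55 = 1375 iterations.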

	PR fortran/107424

gcc/fortran/ChangeLog:

	* trans-openmp.cc (struct dovar_init_d): Add 'sym' and
	'non_unit_iter' members.
	(gfc_nonrect_loop_expr): New.
	(gfc_trans_omp_do): Call it; use normal loop bounds
	for unit stride - and only create local loop var.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-2.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-3.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-4.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-5.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Update dg-note.
	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90: Likewise.

 gcc/fortran/trans-openmp.cc| 238 ++--
 .../goacc/privatization-1-compute-loop.f90 |   6 +-
 .../goacc/privatization-1-routine_gang-loop.f90|   3 +-
 .../libgomp.fortran/non-rectangular-loop-1.f90 | 637 +
 .../libgomp.fortran/non-rectangular-loop-1a.f90| 374 
 .../libgomp.fortran/non-rectangular-loop-2.f90 | 243 
 .../libgomp.fortran/non-rectangular-loop-3.f90 | 186 ++
 .../libgomp.fortran/non-rectangular-loop-4.f90 | 188 ++
 .../libgomp.fortran/non-rectangular-loop-5.f90 |  28 +
 9 files changed, 1854 insertions(+), 49 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 87213de0918..ccee9e16648 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5116,10 +5116,135 @@ gfc_trans_omp_critical (gfc_code *code)
 }
 
 typedef struct dovar_init_d {
+  gfc_symbol *sym;
   tree var;
   tree init;
+  bool non_unit_iter;
 } dovar_init;
 
+static bool
+gfc_nonrect_loop_expr (stmtblock_t *pblock, gfc_se *sep, int loop_n,
+		   gfc_code *code, gfc_expr *expr, vec<dovar_init> *inits,
+		   int simple, gfc_expr *curr_loop_var)
+{
+  int i;
+  for (i = 0; i < loop_n; i++)
+{
+  gcc_assert (code->ext.iterator->var->expr_type == EXPR_VARIABLE);
+  if (gfc_find_sym_in_expr (code->ext.iterator->var->symtree->n.sym, expr))
+	break;
+  code = code->block->next;
+}
+  if (i >= loop_n)
+return false;
+
+  /* Canonic format: TREE_VEC with [var, multiplier, offset].  */
+  gfc_symbol *var = code->ext.iterator->var->symtree->n.sym;
+
+  tree tree_var = NULL_TREE;
+  tree a1 = integer_one_node;
+  tree a2 = integer_zero_node;
+
+  if (!simple)
+{
+  /* FIXME: Handle non-unit iter steps, cf. PR fortran/107424.  */
+  sorry_at (gfc_get_location (&curr_loop_var->where),
+		"non-rectangular loop nest with step other than constant 1 "
+		"or -1 for %qs", curr_loop_var->symtree->n.sym->name);
+  return false;
+}
+
+  dovar_init *di;
+  unsigned ix;
+  FOR_EACH_VEC_ELT (*inits, ix, di)
+if (di->sym == var && !di->non_unit_iter)
+  {
+	tree_var = di->init;
+	gcc_assert (DECL_P (tree_var));
+	break;
+  }
+else if (di->sym == var)
+  {
+	/* FIXME: 

[Patch] OpenMP/Fortran: Fix loop-iter var privatization with !$OMP LOOP [PR108512]

2023-01-24 Thread Tobias Burnus

I stumbled over a new FAIL (regression) in sollve_vv today, which was due to an
odd corner case (see commit log for a description).

The mentioned in-scan error is tested for in gomp/loop-2.f90 ("'inscan'
REDUCTION clause on construct other than DO, SIMD, DO SIMD, PARALLEL DO,
PARALLEL DO SIMD").

I hope that this patch covers all cases and no other surprises exist...

OK for mainline?

 * * *

The ICE is new in GCC 13 due to the duplicate diagnostic (cf. PR); the
original issue existed before but seemingly did not affect the code; at
least the sollve_vv testcase passed before.

Still, it could be backported to GCC 12. (Fortran '!$omp loop' support was
added with r12-1206.)
Thoughts?

Tobias
OpenMP/Fortran: Fix loop-iter var privatization with !$OMP LOOP [PR108512]

For 'parallel', loop-iteration variables are marked as 'private',
unless they either appear in an omp do/simd loop or a data-sharing clause
already exists for them on 'parallel'. 'omp loop' wasn't handled, leading
to (potentially) multiple data-sharing clauses in gfc_resolve_do_iterator
as omp_current_ctx pointed to the 'parallel' directive, ignoring the
in-between 'loop' directive.

The latter led to a bogus diagnostic - or rather an ICE, as the source
location var contained only '\0'.
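The decision described in this commit log can be modeled roughly as follows. This is a hypothetical C sketch, not gfortran's data structures: it assumes a stack of enclosing directives and shows that the result depends on which entry is treated as the innermost context. Looking only at 'parallel' (the old behavior with '!$omp loop') adds an implicit PRIVATE clause, while looking at 'loop' does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the directives relevant here.  */
enum directive { DIR_PARALLEL, DIR_DO, DIR_SIMD, DIR_LOOP };

/* Loop-associated directives take care of the iteration variables of
   their own loops.  */
static bool is_loop_directive (enum directive d)
{
  return d == DIR_DO || d == DIR_SIMD || d == DIR_LOOP;
}

/* CTX[0..DEPTH-1] lists the enclosing directives, outermost first.
   Return true if an implicit PRIVATE clause should be added for the
   iteration variable of a DO loop directly inside CTX[DEPTH-1];
   ALREADY_LISTED says whether that directive already has a
   data-sharing clause for the variable.  */
bool needs_implicit_private (const enum directive *ctx, size_t depth,
                             bool already_listed)
{
  assert (ctx != NULL && depth > 0);
  enum directive innermost = ctx[depth - 1];
  if (is_loop_directive (innermost))
    return false;
  return !already_listed;
}
```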

gcc/fortran/ChangeLog:

	PR fortran/108512
	* openmp.cc (gfc_resolve_omp_do_blocks): Don't check 'inscan'
	restrictions for loop as rejected elsewhere.
	(gfc_resolve_do_iterator): Set a source location for added
	'private'-clause arguments.
	* resolve.cc (gfc_resolve_code): Call gfc_resolve_omp_do_blocks
	also for EXEC_OMP_LOOP.

gcc/testsuite/ChangeLog:

	PR fortran/108512
	* gfortran.dg/gomp/loop-5.f90: New test.

 gcc/fortran/openmp.cc |  5 +-
 gcc/fortran/resolve.cc|  1 +
 gcc/testsuite/gfortran.dg/gomp/loop-5.f90 | 84 +++
 3 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index cc1eab90b8c..7673a52249f 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -9056,7 +9056,9 @@ gfc_resolve_omp_do_blocks (gfc_code *code, gfc_namespace *ns)
 	}
   if (i < omp_current_do_collapse || omp_current_do_collapse <= 0)
 	omp_current_do_collapse = 1;
-  if (code->ext.omp_clauses->lists[OMP_LIST_REDUCTION_INSCAN])
+  if (code->op == EXEC_OMP_LOOP)
+	;  /* Already rejected in resolve_omp_clauses.  */
+  else if (code->ext.omp_clauses->lists[OMP_LIST_REDUCTION_INSCAN])
 	{
 	  locus *loc
 	= &code->ext.omp_clauses->lists[OMP_LIST_REDUCTION_INSCAN]->where;
@@ -9224,6 +9226,7 @@ gfc_resolve_do_iterator (gfc_code *code, gfc_symbol *sym, bool add_clause)
 
   p = gfc_get_omp_namelist ();
   p->sym = sym;
+  p->where = omp_current_ctx->code->loc;
   p->next = omp_clauses->lists[OMP_LIST_PRIVATE];
   omp_clauses->lists[OMP_LIST_PRIVATE] = p;
 }
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 94213cd3cd4..bd2a749776d 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -11950,6 +11950,7 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns)
 	case EXEC_OMP_DISTRIBUTE_SIMD:
 	case EXEC_OMP_DO:
 	case EXEC_OMP_DO_SIMD:
+	case EXEC_OMP_LOOP:
 	case EXEC_OMP_SIMD:
 	case EXEC_OMP_TARGET_SIMD:
 	  gfc_resolve_omp_do_blocks (code, ns);
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-5.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-5.f90
new file mode 100644
index 000..1948e782653
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-5.f90
@@ -0,0 +1,84 @@
+! { dg-additional-options "-fdump-tree-original" }
+!
+! PR fortran/108512
+
+! The problem was that the context wasn't reset for the 'LOOP'
+! such that the clauses of the loops weren't seen when adding
+! PRIVATE clauses.
+!
+! In the following, only the loop variable of the non-OpenMP loop
+! in 'subroutine four' should get a front-end added PRIVATE clause
+
+implicit none
+integer :: x, a(10), b(10), n
+n = 10
+a = -42
+b = [(2*x, x=1,10)]
+
+! { dg-final { scan-tree-dump-times "#pragma omp target map\\(tofrom:a\\) map\\(tofrom:b\\) map\\(tofrom:x\\)\[\r\n\]" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp parallel\[\r\n\]" 2 "original" } }
+!  ^- shows up twice; checked only here.
+! { dg-final { scan-tree-dump-times "#pragma omp loop lastprivate\\(x\\)\[\r\n\]" 1 "original" } }
+
+!$omp target parallel map(tofrom: a, b, x)
+!$omp loop lastprivate(x)
+DO x = 1, n
+  a(x) = a(x) + b(x)
+END DO
+!$omp end loop
+!$omp end target parallel
+if (x /= 11) error stop
+if (any (a /= [(2*x - 42, x=1,10)])) error stop
+call two()
+call three()
+

Re: [Patch] install.texi: Bump newlib version for nvptx + gcn

2023-01-23 Thread Tobias Burnus



On 22.01.23 02:45, Gerald Pfeifer wrote:

>> Maybe, but the question is what to use? The project's webpage has on the
>> first page: "patch submissions to Newlib" and "automate the testing of
>> newlib".
>
> I also dug into the newlib web page and other sources and - while my
> personal preference slightly leans towards Newlib - believe newlib is
> more established overall.
>
> For the web pages, it's clearer than for our *.texi ones you dug into:
>
>    ~/src/wwwdocs/htdocs> grep -r newlib . | wc -l
>    15
>    ~/src/wwwdocs/htdocs> grep -r Newlib . | wc -l
>    3


You need to be careful with those counts as there is not only 'the [nN]ewlib
library' but also flags/configure arguments etc.:

gcc/doc/install.texi:@item --with-newlib

gcc/doc/install.texi-@item --with-nds32-lib=@var{library}
gcc/doc/install.texi:Currently, the valid @var{library} is @samp{newlib} or @samp{mculib}.

gcc/doc/install.texi:to nvptx-newlib's @file{newlib} directory to the directory containing
gcc/doc/install.texi:@option{--enable-newlib-io-long-long} options when configuring.

gcc/doc/invoke.texi:@samp{--enable-newlib-nano-formatted-io}.
gcc/doc/invoke.texi:@item -mnewlib
gcc/doc/invoke.texi:@opindex mnewlib

(and a few more).

In the libstdc++-v3/doc/xml/, there are two 'newlib' and one 'Newlib'
(plus a bunch of newlib as filename/argument/config option).

Still, I agree that 'newlib' is used a bit more often than 'Newlib'.

 * * *

In any case, I concur that it would be nice to unify the .texi/.xml and
diagnostic output (twice in config/or1k/elf.opt) - and likewise the wwwdocs
pages. (That elf.opt file has 'newlib' twice and 'Newlib' once.)

-> Added to my to-do list.

Tobias



Re: [wwwdocs] gcc-13/changes.html + projects/gomp/: OpenMP update

2023-01-23 Thread Tobias Burnus

Now committed with the suggestions taken into account.

That is: for non-rect loop-nest support, I added 'some' / set it back to
partial. I also changed the already-in-GCC-11 wording, as it was a bit
unclear which word/topic the "which" in the original patch referred to -
and the "some" made it even worse.

Tobias
commit a18af43b161b6ff4ea6e3aaf08dd72cbacb53a89
Author: Tobias Burnus 
Date:   Mon Jan 23 09:55:18 2023 +0100

OpenMP: Update gcc-13/changes + projects/gomp

* htdocs/gcc-13/changes.html: Improve wording; mention nvptx reverse
  offload; add 'some' to Fortran non-rect-loop support.
* htdocs/projects/gomp/index.html: Split clause/directive entry
  for 'allocate' and mark the clause variant as fully implemented.
  Set Fortran non-rect-loop support back to partial.
---
 htdocs/gcc-13/changes.html  | 19 +--
 htdocs/projects/gomp/index.html | 13 +
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index ba42170c..6cd5dd64 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -59,12 +59,19 @@ a work-in-progress.
   https://gcc.gnu.org/projects/gomp/;>OpenMP
   
 
-  Reverse offload is now supported and the all clauses to the
-  requires directive are now accepted. However, the
-  requires_offload, unified_address
-  and unified_shared_memory clauses imply the initial
-  device (= the host) as the only available device. Fortran now
-  supports non-rectangular loop nests, which were added for C/C++ in GCC 11.
+  Reverse offload is now supported with nvptx devices. Additionally, the
+  requires handling has been improved and all clauses are
+  now accepted. If a requirement cannot be fulfilled for an accessible
+  device, this device is excluded from the list of available devices. This
+  may imply that the only device left is the host (the initial device).
+  In particular, requires_offload is currently unsupported on
+  AMD GCN devices while unified_address and
+  unified_shared_memory are unsupported by all non-host
+  devices.
+
+
+  OpenMP 5.0: Fortran now supports some non-rectangular loop nests; for
+  C/C++, the support was added in GCC 11.
 
 
   The following OpenMP 5.1 features have been added: the
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 19ff3c7d..17cf1ad9 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -372,8 +372,8 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Non-rectangular loop nests
-GCC11GCC13
-C/C++Fortran
+GCC11GCC13
+C/C++ (full)Fortran (partial)
   
   
 Nested-parallel changes to max-active-levels-var ICV
@@ -547,9 +547,14 @@ than listed, depending on resolved corner cases and optimizations.
 
   
   
-align clause/modifier in allocate directive/clause and allocator directive
+align clause in allocate directive
+No
+
+  
+  
+align modifier in allocate clause
 GCC12
-C/C++ on clause only
+
   
   
 thread_limit clause to target construct
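The new 'requires' wording in changes.html boils down to a filter: a device that cannot fulfill every requirement is removed from the list of available devices, possibly leaving only the host (the initial device). The C sketch below models this with hypothetical capability bits; the sample entries merely echo the text above (reverse offload available with nvptx but not AMD GCN, unified address/shared memory on no non-host device) and are not libgomp's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit flags for the 'requires' clauses.  */
enum requirement
{
  REQ_REVERSE_OFFLOAD = 1u << 0,
  REQ_UNIFIED_ADDRESS = 1u << 1,
  REQ_UNIFIED_SHARED_MEMORY = 1u << 2
};

struct device
{
  const char *name;
  unsigned supported;  /* bitwise OR of enum requirement values */
};

/* Keep, in place, only the devices that support every bit in REQUIRED
   and return how many remain.  The host is not part of this list and
   stays available in any case.  */
size_t filter_devices (struct device *devs, size_t n, unsigned required)
{
  assert (devs != NULL || n == 0);
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if ((devs[i].supported & required) == required)
      devs[kept++] = devs[i];
  return kept;
}
```

Requiring REQ_REVERSE_OFFLOAD keeps only the nvptx entry in this model; requiring REQ_UNIFIED_SHARED_MEMORY empties the list, so only the host remains.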


[committed] libgomp.texi: Impl. status - non-rect loop nest only partial

2023-01-23 Thread Tobias Burnus

As discussed in the thread
  Re: [wwwdocs] gcc-13/changes.html + projects/gomp/: OpenMP update
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610324.html
in https://gcc.gnu.org/PR107424 and the thread starting at
  OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610240.html
the Fortran support is still incomplete.

As suggested in the wwwdocs thread (see link), the implementation status
for Fortran needs to be 'P'.

(Short version of the issue: currently, there are many issues with
non-rectangular loop nests; the nearly ready patches will fix those
for stride == -1 and 1. Ideas exist for other strides, but this may
take a few more days to get resolved.)

Committed as r13-5287-g20552407ae11b61fccb46b3e96a8814e790254e7

Tobias
commit 20552407ae11b61fccb46b3e96a8814e790254e7
Author: Tobias Burnus 
Date:   Mon Jan 23 09:40:41 2023 +0100

libgomp.texi: Impl. status - non-rect loop nest only partial

libgomp/
* libgomp.texi (OpenMP 5.0): Set non-rectangular
loop nest back to 'P' as Fortran support is incomplete.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 1267c2304a5..67a05111289 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -195,7 +195,7 @@ The OpenMP 4.5 specification is fully supported.
   @tab complete but no non-host devices provides @code{unified_address},
   @code{unified_shared_memory} or @code{reverse_offload}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
-@item Non-rectangular loop nests @tab Y @tab
+@item Non-rectangular loop nests @tab P @tab Full support for C/C++, partial for Fortran
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
 @item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
   constructs @tab Y @tab


Re: [Patch] install.texi: Bump newlib version for nvptx + gcn

2023-01-21 Thread Tobias Burnus

Hi Gerald,

On 21.01.23 12:58, Gerald Pfeifer wrote:

> Is it maybe a little tough to bump the minimal requirement to something
> only released yesterday? Or is this not an issue looking at the use cases?
> (Genuine question. Maybe nothing to worry at all.)


On the technical side, the newer newlib version is not yet required. But
it looks as if it will soon make a lot of sense to have it:

For the AMDGCN stack builtins: they currently expand to the same registers
and offset calculations as hard-coded in newlib (older version or if the
builtin is not available). If the stack allocation is changed to
non-threadprivate, this will change the location. With the builtins, just
recompiling newlib (+libgomp) will work (API preserved but not ABI).
[Andrew to provide the stack patch; then me for the 2-line patch to enable
OpenMP's reverse offload.]

(Chicken-and-egg problem in terms of compilation, as newlib is compiled by
GCC. Probably only detectable by running it on the offload device and
checking whether it fails - not practical for a cross-compiler build.)


For AMDGCN's vectorization functions: those can lead to a significant
performance advantage. I know that newlib only uses some builtins if they
are available. I think AMDGCN will emit code using the new libm functions,
which in turn newlib only generates if GCC supports certain new builtins
(a chicken-and-egg problem, if my assumptions are correct).
[I think Kwok will provide this patch - he did implement the funcs in newlib.]

nvptx: Thomas' patch for libgfortran(*) effectively requires the newer
newlib - albeit one could imagine that there could be a configure check.

[(*) "nvptx, libgfortran: Switch out of "minimal" mode",
approved but awaiting approval of another patch)]

Thus:

As nvptx/amdgcn is (mostly) about offloading code, newlib is usually compiled
alongside GCC (e.g. in SUSE, Debian/Ubuntu, ...); additionally, there is
static linking such that mixing old vs. new libraries is less likely. Hence,
requiring the newest version of newlib together with the newest compiler
shouldn't be a problem in my opinion.

And if it is documented now, it cannot be forgotten by the time the pending
patches get committed... ;-)


> And, this predates your patch, in one instance we refer to Newlib (upper
> case), in the other to newlib (lower case). Would it make sense to
> converge to one?


Maybe, but the question is what to use? The project's webpage has on the
first page:
"patch submissions to Newlib" and "automate the testing of newlib".

As uppercase, we have:

gcc/d/implement-d.texi:@code{CRuntime_Newlib} is set when Newlib is the default C library.
gcc/doc/install.texi:Use Newlib (4.3.0 or newer).
gcc/doc/invoke.texi:This option requires Newlib Nano IO, so GCC must be configured with
gcc/doc/invoke.texi:Newlib.
gcc/doc/invoke.texi:Specify the PRU MCU variant to use.  Check Newlib for the exact list of
gcc/doc/sourcebuild.texi:Target supports Newlib.
gcc/doc/sourcebuild.texi:the code size of Newlib formatted I/O functions.

gcc/po/gcc.pot:"Newlib Nano IO."
(Add a missing "Requires " to complete the sentence.)

and as lowercase:

gcc/doc/install.texi:Specifies that @samp{newlib} is
gcc/doc/install.texi:@samp{newlib}.
gcc/doc/install.texi:RTEMS configurations, which currently use newlib.  The option is
denotes a configure argument.)
gcc/doc/invoke.texi:newlib board library linking.  The default is @code{or1ksim}.
gcc/doc/invoke.texi:select linker and preprocessor options for use with newlib.
gcc/doc/sourcebuild.texi:@item newlib

(Side remark: While some @samp{newlib} in install.texi refer to a value of
a configure argument, in the quote above it refers to the library itself.)

gcc/po/gcc.pot:msgid "Configure the newlib board specific runtime.  The default is or1ksim."
gcc/po/gcc.pot:"This used to select linker and preprocessor options for use with newlib."

libstdc++-v3/doc/xml/manual/configure.xml:  vintage (2.3 and newer), 'gnu' is automatically selected. On newlib-based
libstdc++-v3/doc/xml/manual/configure.xml:  systems ('--with_newlib=yes') and OpenBSD, 'newlib' is
libstdc++-v3/doc/xml/manual/evolution.xml:A new clocale model for newlib is available.

Thoughts?

Thanks for the comments!

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [wwwdocs] gcc-13/changes.html + projects/gomp/: OpenMP update

2023-01-21 Thread Tobias Burnus

On 21.01.23 13:48, Gerald Pfeifer wrote:

Just one question: Does "all clauses are now accepted" refer to
  - all (as in 100% of possible clauses), or
  - all (as in a special kind of clause)?


The former – besides the listed 'unified_shared_memory',
'unified_address' and 'reverse_offload' clauses, there are
'dynamic_allocators' and 'atomic_default_mem_order', which are handled in
the compiler front end (by being ignored, as it is always fulfilled, and
by using its argument as the default value, respectively).
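A minimal C sketch of how 'atomic_default_mem_order' takes effect, assuming an OpenMP 5.0 compiler such as GCC with -fopenmp (without it, the pragmas are ignored and the code runs serially with the same result). 'counter' and 'increment' are hypothetical names; the atomic construct carries no memory-order clause, so the order given on the requires directive applies.

```c
/* Must appear lexically before any 'atomic' construct that lacks a
   memory-order clause in this translation unit.  */
#pragma omp requires atomic_default_mem_order(seq_cst)

static int counter;

/* No memory-order clause here, so the default from 'requires' is used,
   i.e. this behaves like '#pragma omp atomic update seq_cst'.  */
static void
increment (void)
{
#pragma omp atomic update
  counter++;
}
```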

Thanks for the review. (I will commit/update the patch later.)

Tobias



[Patch] install.texi: Bump newlib version for nvptx + gcn

2023-01-21 Thread Tobias Burnus

A new newlib version was released yesterday: newlib-4.3.0 (yearly snapshot)
https://sourceware.org/pipermail/newlib/2023/020141.html
https://sourceware.org/ftp/newlib/index.html → 2023-01-20: newlib-4.3.0.20230120.tar.gz (8.8 MB)

For both nvptx and GCN, the new version is recommended - mostly because of
upcoming changes and not because GCC mainline already needs them. But soon
it will, hence:

The attached patch bumps the minimal version instead of keeping the old
version and only recommending the newer one.

Comments? Suggestions? – If there are none, I intend to commit the patch as
obvious.

Tobias

PS: For AMDGCN, newlib uses (if available) some new builtins: one is provided
by GCC 13 and currently yields the same value as the hard-coded registers that
are used if the builtin is not available - to permit a change to non-private
stack variables (required for reverse offload; will require recompilation of
newlib). Others support vectorized math functions. (The gcn builtins still
have to be added to GCC 13; if the builtins aren't available, newlib won't use
them - hence, newlib will also later need to be rebuilt.)
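The "use the builtin only if the compiler provides it, otherwise fall back" approach can be sketched in C with the preprocessor's __has_builtin (available in GCC 10 and later, and in Clang). This is a minimal sketch, not newlib's code: __builtin_popcount merely stands in for the new GCN builtins, and portable_popcount is a hypothetical name.

```c
/* Decide at preprocessing time whether the compiler offers the builtin.
   The nested #if is needed because older compilers do not know
   __has_builtin at all.  */
#ifdef __has_builtin
# if __has_builtin(__builtin_popcount)
#  define HAVE_POPCOUNT_BUILTIN 1
# endif
#endif

/* Counts the set bits in X.  */
static int
portable_popcount (unsigned int x)
{
#ifdef HAVE_POPCOUNT_BUILTIN
  return __builtin_popcount (x);   /* compiler-provided builtin */
#else
  int n = 0;                        /* generic fallback */
  while (x)
    {
      x &= x - 1;                   /* clear the lowest set bit */
      n++;
    }
  return n;
#endif
}
```

Because the choice is frozen when the library is compiled, a library built with a compiler that lacked the builtin keeps the fallback until it is recompiled, which is why newlib has to be rebuilt once GCC gains the builtins.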

For nvptx, newlib added some features to permit building a non-minimal version
of libgfortran, which also permits I/O. The libgfortran changes have been
approved but the GCC nvptx patches still have to be reviewed (and would also
require a pending nvptx-tools pull request).

BTW: The gcn vect math and the nvptx changes went into newlib in the last few
days. Thus, if you use the 'git' version, it won't have the changes unless you
updated it yesterday or later.
install.texi: Bump newlib version for nvptx + gcn

Before, newlib 3.2 was required for amdgcn and 3.1 for nvptx.
Now recommended is 4.3.0 which was just released on 2023-01-20.

While currently the old versions would work fine, upcoming GCC 
changes depend on a newer newlib. Thus, the minimal version is
bumped instead of just recommending the new version.

For GCN, the bump is in preparation for permitting non-threadlocal
stack variables and vectorized math functions - both scheduled for
GCC 13 and added to newlib in 4.3.0.

For nvptx, this includes an emulated clock (commit 6bb96d13a),
a calloc fix (5fca4e0f1) and changes to permit libgfortran to be
compiled with I/O support instead of only in minimal mode.
(Patch approved for GCC 13 but pending on an nvptx patch.)

gcc/ChangeLog:

	* doc/install.texi (amdgcn, nvptx): Require newlib 4.3.0.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index ccc8d15fd08..b1861a6a437 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3855,7 +3855,7 @@ Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, and cop
 @file{bin/llvm-ar} to both @file{bin/amdgcn-amdhsa-ar} and
 @file{bin/amdgcn-amdhsa-ranlib}.
 
-Use Newlib (3.2.0, or newer).
+Use Newlib (4.3.0 or newer).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.github.io,,ROCm Platform}, and use
@@ -4672,7 +4672,7 @@ Instead of GNU binutils, you will need to install
 Tell GCC where to find it:
 @option{--with-build-time-tools=[install-nvptx-tools]/nvptx-none/bin}.
 
-You will need newlib 3.1.0 or later.  It can be
+You will need newlib 4.3.0 or later.  It can be
 automatically built together with GCC@.  For this, add a symbolic link
 to nvptx-newlib's @file{newlib} directory to the directory containing
 the GCC sources.


[Patch] OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]

2023-01-19 Thread Tobias Burnus

This is all about non-rectangular loop nests in OpenMP.

The attached patch depends on the obvious fix for https://gcc.gnu.org/PR108459,
which is included, together with a nice testcase, in Jakub's WIP patch attached
to the PR; without it, gfortran.dg/gomp/canonical-loop-1.f90 fails with an ICE
(segfault).

My patch fixes some of the Fortran issues found. Namely, it ensures that a
"regular" non-rectangular loop nest actually works by passing the outer-loop
var, the multiplier and the offset in a TREE_VEC to the middle end. It
additionally avoids pointlessly creating a temporary variable for a VAR_DECL
(main advantage: the dump looks cleaner and some dependency analysis is
avoided) - and likewise for 'step', given that 'step' was evaluated before.
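To illustrate the kind of nest involved, a minimal C sketch (C/C++ have supported non-rectangular loop nests since GCC 11). The inner lower bound 'j = i' has the form multiplier * outer_var + offset with the triple [i, 1, 0], and 'j = 2 * i' corresponds to [i, 2, 0]. The function names are hypothetical; without -fopenmp the pragmas are ignored and the loops run serially with the same result.

```c
/* Triangular nest: inner lower bound 1 * i + 0, i.e. [i, 1, 0].
   Returns n * (n + 1) / 2 for n >= 0.  */
static long
count_triangle (long n)
{
  long count = 0;
#pragma omp parallel for collapse(2) reduction(+:count)
  for (long i = 0; i < n; i++)
    for (long j = i; j < n; j++)
      count++;
  return count;
}

/* Inner lower bound 2 * i + 0, i.e. [i, 2, 0].
   Returns n * (n + 1) for n >= 0.  */
static long
count_scaled (long n)
{
  long count = 0;
#pragma omp parallel for collapse(2) reduction(+:count)
  for (long i = 0; i < n; i++)
    for (long j = 2 * i; j < 2 * n; j++)
      count++;
  return count;
}
```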

There is an additional issue - not quite addressed in this patch: There are
cases when a loop variable is replaced by another variable ('count') and then,
at the beginning of the loop body, the original variable gets its value from
the count variable. Obviously, this no longer works with non-rectangular loop
nests. The 'count' appears in two cases: (a) when the iteration step is not 1
or -1 and (b) if the iteration variable is a pointer (a scalar that is
allocatable, a pointer, an optional argument or just a dummy argument; oddly,
even if it has the value attribute).
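A minimal C sketch of the 'count' rewrite described above, for a positive non-unit step. The names are hypothetical and this models the idea only, not the code trans-openmp.cc actually emits: the loop runs over a zero-based counter and the user's variable is recomputed at the top of each iteration, so 'v' is no longer the loop's iteration variable, and an inner bound written in terms of 'v' no longer refers to an outer loop variable.

```c
/* Direct form: v = lb, lb + step, ... while v <= ub (assumes step > 0).  */
static long
sum_direct (long lb, long ub, long step)
{
  long s = 0;
  for (long v = lb; v <= ub; v += step)
    s += v;
  return s;
}

/* Rewritten form: iterate over a counter and derive v from it.  */
static long
sum_via_count (long lb, long ub, long step)
{
  long s = 0;
  /* Trip count for step > 0; zero if the range is empty.  */
  long niters = ub >= lb ? (ub - lb) / step + 1 : 0;
  for (long count = 0; count < niters; count++)
    {
      long v = lb + count * step;   /* original variable set from 'count' */
      s += v;
    }
  return s;
}
```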

There is pending work to be done in this case, as mentioned in comments 6 and 8
of the PR. This patch adds some 'sorry' messages for them. I hope and think
that I have caught all, or at least most, of the cases where 'count' is used.

OK for mainline, once the other patch has been committed?

Tobias

PS: I still need to verify that everything is fine once the other patch has
been committed. A flaky mainboard on the laptop causes multiple random freezes
per day, which makes testing and patch writing a bit harder. (At least the
mainboard replacement is scheduled for tomorrow :-) )
OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]

This patch ensures that loop bounds depending on outer loop vars use the
proper TREE_VEC format. It additionally gives a sorry if such an outer
var has a non-one/non-minus-one increment as currently a count variable
is used in this case (see PR).

gcc/fortran/ChangeLog:

	PR fortran/107424
	* trans-openmp.cc (gfc_nonrect_loop_expr): New.
	(gfc_trans_omp_do): Call it for start/end loop bound
	for non-rectangular loop nests.

gcc/testsuite/

	PR fortran/107424
	* gfortran.dg/gomp/non-rectangular-loop-3.f90: New test.

libgomp/ChangeLog:

	PR fortran/107424
	* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-2.f90: New test.

 gcc/fortran/trans-openmp.cc| 167 +-
 .../gfortran.dg/gomp/non-rectangular-loop-3.f90|  85 +++
 .../libgomp.fortran/non-rectangular-loop-1.f90 | 637 +
 .../libgomp.fortran/non-rectangular-loop-1a.f90| 374 
 .../libgomp.fortran/non-rectangular-loop-2.f90 | 243 
 5 files changed, 1495 insertions(+), 11 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 87213de0918..73376894316 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5120,6 +5120,136 @@ typedef struct dovar_init_d {
   tree init;
 } dovar_init;
 
+static bool
+gfc_nonrect_loop_expr (stmtblock_t *pblock, gfc_se *sep, int loop_n,
+		       gfc_code *code, gfc_expr *expr, vec<dovar_init> *inits)
+{
+  int i;
+  for (i = 0; i < loop_n; i++)
+    {
+      gcc_assert (code->ext.iterator->var->expr_type == EXPR_VARIABLE);
+      if (gfc_find_sym_in_expr (code->ext.iterator->var->symtree->n.sym, expr))
+	break;
+      code = code->block->next;
+    }
+  if (i >= loop_n)
+    return false;
+
+  /* Canonic format: TREE_VEC with [var, multiplier, offset].  */
+  gfc_symbol *var = code->ext.iterator->var->symtree->n.sym;
+
+  gfc_se se;
+  tree tree_var, a1, a2;
+  a1 = integer_one_node;
+  a2 = integer_zero_node;
+
+  gfc_init_se (&se, NULL);
+  gfc_conv_expr_lhs (&se, code->ext.iterator->var);
+  gfc_add_block_to_block (pblock, &se.pre);
+  tree_var = se.expr;
+
+  {
+    /* FIXME: Handle non-unity iterations, cf. PR fortran/107424.
+       The issue is that for those a 'count' variable is used.  */
+    dovar_init *di;
+    unsigned ix;
+    tree t = tree_var;
+    while (TREE_CODE (t) == INDIRECT_REF)
+      t = TREE_OPERAND (t, 0);
+    FOR_EACH_VEC_ELT (*inits, ix, di)
+      {
+	tree t2 = di->var;
+	while (TREE_CODE (t2) == INDIRECT_REF)
+	  t2 = TREE_OPERAND (t2, 0);
+	if (t == t2)
+	  {
+	    HOST_WIDE_INT intval;
+	    if (gfc_extract_hwi 

[Patch] libfortran: Fix execute_command_line for Windows

2023-01-18 Thread Tobias Burnus

Reported by nightstrike, who also tested this patch.

On Windows, we call system() which works as described at
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/system-wsystem?view=msvc-170

Namely, it only fails with "-1" if the command interpreter
could not be started. Otherwise, it returns the value returned by the command interpreter.
(Same on Linux.) On POSIX systems, 'sh' calls exit(127) or
_exit(127) if it cannot execute the program of the passed string,
as documented. Cf. https://www.unix.com/man-page/posix/3p/system/

Thus, the question is what happens on Windows. Our experiments, several
webpages (like stackoverflow) and the source code of WINE for cmd.exe indicate
that Windows returns 9009 in that case. See for instance
https://github.com/wine-mirror/wine/blob/master/programs/cmd/wcmdmain.c#L1262-L1269

Thus, we now do likewise. The code is for MinGW (__MINGW32__); Cygwin does not
define that macro and is likely to use return values closer to POSIX.

OK for mainline?

Tobias
libfortran: Fix execute_command_line for Windows

On Windows, 'system' is called - that fails with -1 if the command
interpreter could not be started; on POSIX systems, if the child
process could not be started by the shell, exit(127)/_exit(127) is
called/returned. On Windows, cmd.exe (and also the PowerShell) return
errorlevel 9009.

libgfortran/ChangeLog:

	* intrinsics/execute_command_line.c (execute_command_line): On
	Windows, regard system()'s return value of 9009 as EXEC_INVALIDCOMMAND.

diff --git a/libgfortran/intrinsics/execute_command_line.c b/libgfortran/intrinsics/execute_command_line.c
index 305f067d973..0d1688400c2 100644
--- a/libgfortran/intrinsics/execute_command_line.c
+++ b/libgfortran/intrinsics/execute_command_line.c
@@ -142,10 +142,15 @@ execute_command_line (const char *command, bool wait, int *exitstat,
 #endif
   else if (res == 127 || res == 126
 #if defined(WEXITSTATUS) && defined(WIFEXITED)
 	   || (WIFEXITED(res) && WEXITSTATUS(res) == 127)
 	   || (WIFEXITED(res) && WEXITSTATUS(res) == 126)
+#endif
+#ifdef __MINGW32__
+		  /* cmd.exe sets the errorlevel to 9009,
+		 if the command could not be executed.  */
+		|| res == 9009
 #endif
 	   )
 	/* Shell return codes 126 and 127 mean that the command line could
 	   not be executed for various reasons.  */
 	set_cmdstat (cmdstat, EXEC_INVALIDCOMMAND);
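Stripped of the wait-status decoding, the decision the patched code makes can be sketched as a small C predicate. This is a simplified model with a hypothetical name, not libgfortran's code; in the actual patch the 9009 check is only compiled in under __MINGW32__.

```c
#include <stdbool.h>

/* Hypothetical helper: true if an already-decoded exit code signals that
   the command itself could not be executed.  */
static bool
command_not_executed (int exit_code, bool via_cmd_exe)
{
  /* POSIX shells: 126 = found but not executable, 127 = not found.  */
  if (exit_code == 126 || exit_code == 127)
    return true;
  /* cmd.exe: errorlevel 9009 = command could not be executed.  */
  return via_cmd_exe && exit_code == 9009;
}
```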


Re: [wwwdocs] gcc-13/changes.html + projects/gomp/: OpenMP update

2023-01-18 Thread Tobias Burnus

Hi Gerald,

On 16.01.23 23:16, Gerald Pfeifer wrote:

On Mon, 16 Jan 2023, Tobias Burnus wrote:

 requires_offload, unified_address
-  and unified_shared_memory clauses cause that the
-  only available device is the initial device (the host). Fortran now
+  and unified_shared_memory clauses imply the initial
+  device (= the host) as the only available device. Fortran now

I really stumble over the "as" – that sounds wrong and I fail to parse this
part. I think it should be "is".

happy to make this change. Or do you have an idea to reframe the
sentence (or paragraph) altogether?


Actually, thinking about it again, the "imply" is also misleading – by
themselves, the restrictions do not imply that accelerators/GPUs are not
supported; that's only implied in GCC as the libgomp plugins for nvptx
and amdgcn don't handle them yet.

How about the following? I put the other change into its own bullet
point to be less confusing, completely reworded the remaining item and
mentioned reverse offload support.

(Reverse offload is: while in a target region ('omp target', i.e.
running code targeted for an offload device), it is possible to execute
code on the host. If there is no available non-host device, the
target region will run on the host (host fallback); in that case,
reverse offload is trivial (as host code calls host code).)
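A minimal C sketch of reverse offload, assuming OpenMP 5.0 syntax and an offload-enabled GCC 13 built with nvptx support (-fopenmp); it has not been verified on an actual offload configuration, and the function and variable names are hypothetical. The nested 'target device(ancestor: 1)' region runs on the host even when the enclosing target region runs on a device; without offloading (or without -fopenmp), everything runs on the host and the result is the same.

```c
/* Must precede the target constructs that rely on it.  */
#pragma omp requires reverse_offload

static int host_calls;

/* Host function invoked from inside the device region.  */
static void
note_on_host (void)
{
  host_calls++;
}

static int
run (void)
{
  int result = 0;
#pragma omp target map(tofrom: result)
  {
    result = 41;
    /* Reverse offload: this nested region executes on the host.  */
#pragma omp target device(ancestor: 1)
    note_on_host ();
    result += 1;
  }
  return result;
}
```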


BTW: Before the release, further updates to changes.html are required.

Keep them coming! :-)


Actually, I think only one change was missing (looking at
libgomp/libgomp.texi), unless some more pending patches are accepted. –
I have now included that change in the attached patch.

Tobias
OpenMP: Update gcc-13/changes + projects/gomp

* htdocs/gcc-13/changes.html: Improve wording; mention nvptx reverse
  offload.
* htdocs/projects/gomp/index.html: Split clause/directive entry
  for 'allocate' and mark the clause variant as fully implemented.

 htdocs/gcc-13/changes.html  | 19 +--
 htdocs/projects/gomp/index.html |  9 +++--
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index ca9cd2da..6deb445f 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -53,12 +53,19 @@ a work-in-progress.
   https://gcc.gnu.org/projects/gomp/;>OpenMP
   
 
-  Reverse offload is now supported and the all clauses to the
-  requires directive are now accepted. However, the
-  requires_offload, unified_address
-  and unified_shared_memory clauses imply the initial
-  device (= the host) as the only available device. Fortran now
-  supports non-rectangular loop nests, which were added for C/C++ in GCC 11.
+  Reverse offload is now supported with nvptx devices. Additionally, the
+  requires handling has been improved and all clauses are
+  now accepted. If a requirement cannot be fulfilled for an accessible
+  device, this device is excluded from the list of available devices. This
+  may imply that the only device left is the host (the initial device).
+  In particular, reverse_offload is currently unsupported on
+  AMD GCN devices while unified_address and
+  unified_shared_memory are unsupported by all non-host
+  devices.
+
+
+  OpenMP 5.0: Fortran now supports non-rectangular loop nests, which were
+  added for C/C++ in GCC 11.
 
 
   The following OpenMP 5.1 features have been added: the
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 19ff3c7d..dc9c88e7 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -547,9 +547,14 @@ than listed, depending on resolved corner cases and optimizations.
 
   
   
-align clause/modifier in allocate directive/clause and allocator directive
+align clause in allocate directive
+No
+
+  
+  
+align modifier in allocate clause
 GCC12
-C/C++ on clause only
+
   
   
 thread_limit clause to target construct


Re: [wwwdocs] gcc-13/changes.html + projects/gomp/: OpenMP update

2023-01-16 Thread Tobias Burnus

Hi Gerald,

On 14.01.23 22:47, Gerald Pfeifer wrote:


I made a couple of incremental edits. See below for what I just pushed
(and please speak up if you see any issues).

commit 2f870cba58c81449beb618a9030824360a25


...


--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -54,10 +54,10 @@ a work-in-progress.


...


+  requires directive are now accepted. However, the
requires_offload, unified_address
-  and unified_shared_memory clauses cause that the
-  only available device is the initial device (the host). Fortran now
+  and unified_shared_memory clauses imply the initial
+  device (= the host) as the only available device. Fortran now


I really stumble over the "as" – that sounds wrong and I fail to parse this part.
I think it should be "is".

On the technical side, in principle, available devices are the host (aka "initial
device") – and all installed** (non-host) devices – in our case nvptx and (amd)gcn GPUs.

However, when using 'requires', all installed devices which do not fulfill
the requirement(s) are removed from the list of available devices. In case of
'dynamic_allocators', all devices support it; in case of 'reverse_offload', all
installed amdgcn devices are filtered out; and, for unified-shared memory,*
neither nvptx nor amdgcn supports it – and they are removed from the list – such
that in the end, only the host remains. (Hence, device code ('target regions')
will run on the host → host fallback.)
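The filtering described above can be modelled in a few lines of C. This is a minimal sketch with hypothetical names and bit flags, not libgomp's implementation; the per-device capabilities simply encode the GCC 13 state described in this mail.

```c
#include <stddef.h>

/* Hypothetical bit flags, one per 'requires' clause (not libgomp's).  */
enum
{
  REQ_REVERSE_OFFLOAD = 1 << 0,
  REQ_UNIFIED_ADDRESS = 1 << 1,
  REQ_UNIFIED_SHARED_MEMORY = 1 << 2,
  REQ_DYNAMIC_ALLOCATORS = 1 << 3
};

struct device
{
  const char *name;
  unsigned supported;   /* requirement bits the device can fulfill */
};

/* Capabilities as described above: all devices support dynamic_allocators,
   reverse_offload works on the host and nvptx, and unified address/shared
   memory only on the host.  */
static const struct device installed[] = {
  { "host", REQ_REVERSE_OFFLOAD | REQ_UNIFIED_ADDRESS
	    | REQ_UNIFIED_SHARED_MEMORY | REQ_DYNAMIC_ALLOCATORS },
  { "nvptx", REQ_REVERSE_OFFLOAD | REQ_DYNAMIC_ALLOCATORS },
  { "amdgcn", REQ_DYNAMIC_ALLOCATORS },
};

/* Stores pointers to all installed devices fulfilling every bit in
   REQUIRED into OUT (in order) and returns how many there are.  */
static size_t
filter_devices (unsigned required, const struct device **out)
{
  size_t n = 0;
  for (size_t i = 0; i < sizeof installed / sizeof installed[0]; i++)
    if ((installed[i].supported & required) == required)
      out[n++] = &installed[i];
  return n;
}
```

With a unified-shared-memory requirement, only the host entry survives, which is the host-fallback situation described above.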

BTW: Before the release, further updates to changes.html are required. – For
instance, as alluded to in the previous paragraph, 'reverse offload' is (now)
supported for nvptx. (But not yet with amdgcn.)

Tobias

(*) There is support for unified-shared memory for both nvptx and gcn,
but the existing patches either have to be reviewed or to be revised.

(**) I coined the term 'installed device'. Since TR11, OpenMP contains some
definitions for 'available devices' – which consist of the devices that are both
supported and accessible (possibly after sorting and further filtering). Namely:

accessible devices – The host device and all non-host devices accessible for execution.

supported devices – The host device and all non-host devices supported by the
implementation for execution of target code for which the device-related
requirements of the requires directive are fulfilled.

The available-devices-var is in turn by default "*" – where "* expands to all
accessible and supported devices". (The device list can be further filtered and
sorted via the environment variable OMP_AVAILABLE_DEVICES.)



Re: [Patch] Fortran/OpenMP: Reject non-scalar 'holds' expr in 'omp assume(s)' [PR107706] (was: [PR107424])

2023-01-12 Thread Tobias Burnus

First, I messed up the PR number – it should be PR107706.

On 12.01.23 11:39, Jakub Jelinek wrote:

On Thu, Jan 12, 2023 at 11:22:40AM +0100, Tobias Burnus wrote:

Rather obvious fix for that ICE.

Comments? If there are none, I will commit it later as obvious.

I think the spec should be clarified, unlike clauses like if, novariants,
nocontext, indirect, final clause operands where we specify the argument
to be expression of logical type and glossary term says that OpenMP logical
expression [...] But for the holds clause, all we say is that holds clause
isn't inarguable and [...] that the listed expression evaluates to true in
the assumption scope. [...]
so I think making it clear that holds argument is expression of logical type
would be useful.


Actually, the spec does have (internally) hold-expr = "OpenMP logical
expression" in a JSON file but that does not show up in the generated
PDF. I have now filed an OpenMP spec issue for it (#3453).


That said, the patch is ok, a rank > 1 expression can't be considered to
evaluate to true...


Thanks! Committed as r13-5118-g2ce55247a8bf32985a96ed63a7a92d36746723dc
(with the fixed PR number).

Thanks.



[Patch] Fortran/OpenMP: Reject non-scalar 'holds' expr in 'omp assume(s)' [PR107424]

2023-01-12 Thread Tobias Burnus

Rather obvious fix for that ICE.

Comments? If there are none, I will commit it later as obvious.

Tobias
Fortran/OpenMP: Reject non-scalar 'holds' expr in 'omp assume(s)' [PR107424]

gcc/fortran/ChangeLog:

	PR fortran/107424
	* openmp.cc (gfc_resolve_omp_assumptions): Reject nonscalars.

gcc/testsuite/ChangeLog:

	PR fortran/107424
	* gfortran.dg/gomp/assume-2.f90: Update dg-error.
	* gfortran.dg/gomp/assumes-2.f90: Likewise.
	* gfortran.dg/gomp/assume-5.f90: New test.

 gcc/fortran/openmp.cc|  8 +---
 gcc/testsuite/gfortran.dg/gomp/assume-2.f90  |  2 +-
 gcc/testsuite/gfortran.dg/gomp/assume-5.f90  | 20 
 gcc/testsuite/gfortran.dg/gomp/assumes-2.f90 |  2 +-
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index b71ee467c01..916daeb1aa5 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -6911,9 +6911,11 @@ void
 gfc_resolve_omp_assumptions (gfc_omp_assumptions *assume)
 {
   for (gfc_expr_list *el = assume->holds; el; el = el->next)
-    if (!gfc_resolve_expr (el->expr) || el->expr->ts.type != BT_LOGICAL)
-	gfc_error ("HOLDS expression at %L must be a logical expression",
-		   &el->expr->where);
+    if (!gfc_resolve_expr (el->expr)
+	|| el->expr->ts.type != BT_LOGICAL
+	|| el->expr->rank != 0)
+      gfc_error ("HOLDS expression at %L must be a scalar logical expression",
+		 &el->expr->where);
 }
 
 
diff --git a/gcc/testsuite/gfortran.dg/gomp/assume-2.f90 b/gcc/testsuite/gfortran.dg/gomp/assume-2.f90
index ca3e04dfe95..dc306a9088a 100644
--- a/gcc/testsuite/gfortran.dg/gomp/assume-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/assume-2.f90
@@ -22,6 +22,6 @@ subroutine foo (i, a)
   end if
 !  !$omp end assume  - silence: 'Unexpected !$OMP END ASSUME statement'
 
-  !$omp assume holds (1.0)  ! { dg-error "HOLDS expression at .1. must be a logical expression" }
+  !$omp assume holds (1.0)  ! { dg-error "HOLDS expression at .1. must be a scalar logical expression" }
   !$omp end assume
 end
diff --git a/gcc/testsuite/gfortran.dg/gomp/assume-5.f90 b/gcc/testsuite/gfortran.dg/gomp/assume-5.f90
new file mode 100644
index 000..5c6c00750dd
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/assume-5.f90
@@ -0,0 +1,20 @@
+! PR fortran/107424
+!
+! Contributed by G. Steinmetz
+!
+
+integer function f(i)
+   implicit none
+   !$omp assumes holds(i < g())  ! { dg-error "HOLDS expression at .1. must be a scalar logical expression" }
+   integer, value :: i
+
+   !$omp assume holds(i < g())  ! { dg-error "HOLDS expression at .1. must be a scalar logical expression" }
+   block
+   end block
+   f = 3
+contains
+   function g()
+  integer :: g(2)
+  g = 4
+   end
+end
diff --git a/gcc/testsuite/gfortran.dg/gomp/assumes-2.f90 b/gcc/testsuite/gfortran.dg/gomp/assumes-2.f90
index 729c9737a1c..c8719a86a94 100644
--- a/gcc/testsuite/gfortran.dg/gomp/assumes-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/assumes-2.f90
@@ -4,7 +4,7 @@ module m
   !$omp assumes contains(target) holds(x > 0.0)
   !$omp assumes absent(target)
   !$omp assumes holds(0.0)
-! { dg-error "HOLDS expression at .1. must be a logical expression" "" { target *-*-* } .-1 }
+! { dg-error "HOLDS expression at .1. must be a scalar logical expression" "" { target *-*-* } .-1 }
 end module
 
 module m2


Re: [PATCH] fortran: Fix up function types for realloc and sincos{,f,l} builtins [PR108349]

2023-01-11 Thread Tobias Burnus

Hi,

On 11.01.23 10:18, Jakub Jelinek via Gcc-patches wrote:

As reported in the PR, the FUNCTION_TYPE for __builtin_realloc in the
Fortran FE is wrong since r0-100026-gb64fca63690ad [...]
I went through all other changes from that commit and found that
__builtin_sincos{,f,l} got broken as well, [...]

The following patch fixes that, plus some formatting issues around
the spots I've changed.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK. Thanks for the patch!

Tobias


2023-01-11  Jakub Jelinek  

  PR fortran/108349
  * f95-lang.cc (gfc_init_builtin_function): Fix up function types
  for BUILT_IN_REALLOC and BUILT_IN_SINCOS{F,,L}.  Formatting fixes.

--- gcc/fortran/f95-lang.cc.jj2022-11-15 22:57:18.247210671 +0100
+++ gcc/fortran/f95-lang.cc   2023-01-10 11:31:43.787266346 +0100
@@ -714,31 +714,34 @@ gfc_init_builtin_functions (void)
  float_type_node, NULL_TREE);

func_cdouble_double = build_function_type_list (double_type_node,
-  complex_double_type_node,
-  NULL_TREE);
+   complex_double_type_node,
+   NULL_TREE);

func_double_cdouble = build_function_type_list (complex_double_type_node,
-  double_type_node, NULL_TREE);
+   double_type_node, NULL_TREE);

-  func_clongdouble_longdouble =
-build_function_type_list (long_double_type_node,
-  complex_long_double_type_node, NULL_TREE);
-
-  func_longdouble_clongdouble =
-build_function_type_list (complex_long_double_type_node,
-  long_double_type_node, NULL_TREE);
+  func_clongdouble_longdouble
+= build_function_type_list (long_double_type_node,
+ complex_long_double_type_node, NULL_TREE);
+
+  func_longdouble_clongdouble
+= build_function_type_list (complex_long_double_type_node,
+ long_double_type_node, NULL_TREE);

ptype = build_pointer_type (float_type_node);
-  func_float_floatp_floatp =
-build_function_type_list (void_type_node, ptype, ptype, NULL_TREE);
+  func_float_floatp_floatp
+= build_function_type_list (void_type_node, float_type_node, ptype, ptype,
+ NULL_TREE);

ptype = build_pointer_type (double_type_node);
-  func_double_doublep_doublep =
-build_function_type_list (void_type_node, ptype, ptype, NULL_TREE);
+  func_double_doublep_doublep
+= build_function_type_list (void_type_node, double_type_node, ptype,
+ ptype, NULL_TREE);

ptype = build_pointer_type (long_double_type_node);
-  func_longdouble_longdoublep_longdoublep =
-build_function_type_list (void_type_node, ptype, ptype, NULL_TREE);
+  func_longdouble_longdoublep_longdoublep
+= build_function_type_list (void_type_node, long_double_type_node, ptype,
+ ptype, NULL_TREE);

  /* Non-math builtins are defined manually, so they're not included here.  */
  #define OTHER_BUILTIN(ID,NAME,TYPE,CONST)
@@ -992,9 +995,8 @@ gfc_init_builtin_functions (void)
"calloc", ATTR_NOTHROW_LEAF_MALLOC_LIST);
DECL_IS_MALLOC (builtin_decl_explicit (BUILT_IN_CALLOC)) = 1;

-  ftype = build_function_type_list (pvoid_type_node,
-size_type_node, pvoid_type_node,
-NULL_TREE);
+  ftype = build_function_type_list (pvoid_type_node, pvoid_type_node,
+ size_type_node, NULL_TREE);
gfc_define_builtin ("__builtin_realloc", ftype, BUILT_IN_REALLOC,
"realloc", ATTR_NOTHROW_LEAF_LIST);


  Jakub
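For reference, the C prototypes the corrected FUNCTION_TYPEs mirror are void *realloc (void *, size_t) and the GNU extension void sincos (double, double *, double *). The sketch below (hypothetical names; not part of the patch) checks realloc's parameter order at compile time with C11 _Generic and only documents the sincos signature as a typedef, since sincos is not available in every C library.

```c
#include <stddef.h>
#include <stdlib.h>

/* realloc: the pointer comes first, then the new size.  (The old Fortran
   FE type had these two parameters swapped.)  */
typedef void *(*realloc_fn) (void *, size_t);

/* GNU sincos, documented only: the input value comes first, followed by
   the two result pointers.  (The old type lacked the leading double.)  */
typedef void (*sincos_fn) (double, double *, double *);

/* Compile-time check that the C library's realloc has that signature.  */
_Static_assert (_Generic (&realloc, realloc_fn: 1, default: 0),
		"realloc should be void *(void *, size_t)");

/* Hypothetical helper: grows BUF to NEW_SIZE bytes (pointer, then size).  */
static void *
grow (void *buf, size_t new_size)
{
  return realloc (buf, new_size);
}
```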




OpenMP Patch Ping

2023-01-10 Thread Tobias Burnus

Hi all, hello Jakub,

Below is the updated list to last ping,
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607178.html

NOTE to the list below: I have stopped checking older patches. I know
some more are pending review, others need to be revised. I will re-check,
once the below listed patches have been reviewed. Cf. old list.

Thanks for the reviews done in between the last ping and now!

 * * *

Small patches
=

* [Patch] Fortran: Extend align-clause checks of OpenMP's allocate clause
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608401.html
  Tue Dec 13 16:38:22 GMT 2022

* [Patch] OpenMP: Parse align clause in allocate directive in C/C++
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608404.html
  Tue Dec 13 17:44:27 GMT 2022

* Re: [Patch] libgomp.texi: Reverse-offload updates (was: [Patch] libgomp: Handle OpenMP's reverse offloads)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608245.html
  Thu Nov 24 12:01:04 GMT 2022

(Side note: wwwdocs also needs to be updated for the latter patch and
some other patches done in the meanwhile.)


Fortran allocat(e,ors) prep patch
=

* [Patch] Fortran/OpenMP: Add parsing support for allocators/allocate directive (was: [Patch] Fortran/OpenMP: Add parsing support for allocators directive)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608904.html
  Wed Dec 21 15:51:25 GMT 2022

(Remark: While written from scratch, it is kind of a follow-up to Abid's patch
   [PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0)
you/Jakub reviewed on Tue Oct 11 12:13:14 GMT 2022, i.e.
 https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603258.html
- For the actual implementation of 'allocators', we still have to solve the issues
  raised in the review for '[PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).'
  at https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603279.html (and earlier in the thread);
  implementing 'omp allocate' (Fortran/C/C++) seems to be easier, but no one has started
  implementing it so far - only parsing support exists.
- The USM patches on semi-USM systems run into a similar issue as 'allocators', and for
  them, some ME omp_allocate is added.)


Mapping related patches
=======================
(Complex, but a review is badly needed, as it fixes several bugs and adds
missing functionality.)

* The complete patch set was just re-submitted by Julian; the overview patch is
  [PATCH v6 00/11] OpenMP: C/C++ lvalue parsing, C/C++/Fortran "declare mapper" support
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/thread.html#609031
  Fri Dec 23 12:12:53 GMT 2022
* Note: For patch 10/11 of the set, there was a follow-up this Monday:
  [PATCH v6 10/11] OpenMP: Support OpenMP 5.0 "declare mapper" directives for C
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609566.html

[As it relates to one patch in the series:
  '[Patch] Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr'
That one is mine and needs to be updated (WIP). It fixes issues with array
descriptors and variables with allocatable string length, where the
descriptor or string length may need to be handled explicitly: on
data-entering maps, string lengths/allocators may require 'to:' instead of
'alloc:', and on data-exit maps, the current code might add a bogus 'alloc:'.
The idea is to handle this explicitly in fortran/trans-openmp.cc instead of
auto-adding it in the ME.
Status: WIP - removed in the ME, but not all cases are handled yet in the FE.]


Fortran deep mapping (allocatable components)
=============================================

(Old patch of March 2022; the first part has now, belatedly, been properly submitted - today):
[Patch][1/2] OpenMP: Add lang hooks + run-time filled map arrays for Fortran 
deep mapping of DT
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609637.html

Tobias



[Patch][1/2] OpenMP: Add lang hooks + run-time filled map arrays for Fortran deep mapping of DT

2023-01-10 Thread Tobias Burnus

This patch is the ME part to support OpenMP 5.0's deep-mapping
feature, i.e. automatically mapping allocatable components of Fortran's
derived types. [Not the lang hooks, but the run-time-allocated-array part
will probably also be useful when later adding 'iterator'-modifier support
to the 'map'/'to'/'from' clauses.]

This is a belated real submission of the patch sent in March 2022,
  https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591144.html
(with FE fixes at
  https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593562.html
  (note to self: Bernhard did send some comment fixes off list)
+ https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593704.html )
+ ME fix for OpenACC at
  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603906.html
[which is included in the attached patch]

As written, attached is the ME part. Below is a description of how
it is supposed to be used; the patch links above show how it looks
in the actual FE code.


==========
BACKGROUND
==========

Fortran permits

type t
  integer, allocatable :: x, y(:)
end type t
type t2
  type(t2), allocatable :: previous_stack  ! Not valid in OMP 5.0
  integer, allocatable :: a
  type(t) :: b
  type(t), allocatable :: c(:)
end type t2
type(t2) :: var1
type(t2), allocatable :: var2(:)

!$omp target enter data map(to: var1, var2)

Here, all allocatable components need to be mapped alongside. The number
of mappings is only known at run time: e.g. for 'var2', the array size is
only known at run time, and then each allocatable component of each element
of 'var2' needs to be mapped. Those components can in turn contain
allocatable components, which have to be mapped as well - but of course only
if the parent component is actually allocated.
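To make the run-time counting concrete, here is a minimal C sketch. It assumes a hypothetical C model of the two types above in which an unallocated allocatable component is a NULL pointer; the struct layout and the functions count_t, count_t2 and count_t2_array are illustrative only and are not the FE code from the linked patches. Each allocated component counts as one extra mapping, and its own components are only visited if it is allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of 'type t' and 'type t2' above: an allocatable
   component is a pointer that is NULL while unallocated.  */
struct t
{
  int *x;          /* integer, allocatable :: x     */
  int *y;          /* integer, allocatable :: y(:)  */
  size_t y_len;
};

struct t2
{
  struct t2 *previous_stack;  /* type(t2), allocatable :: previous_stack */
  int *a;                     /* integer, allocatable :: a */
  struct t b;                 /* type(t) :: b (not allocatable) */
  struct t *c;                /* type(t), allocatable :: c(:) */
  size_t c_len;
};

/* Extra mappings needed for the allocated components of V.  */
static size_t
count_t (const struct t *v)
{
  return (size_t) (v->x != NULL) + (size_t) (v->y != NULL);
}

static size_t
count_t2 (const struct t2 *v)
{
  size_t n = (size_t) (v->a != NULL) + count_t (&v->b);

  if (v->c != NULL)
    {
      n += 1;  /* The array c itself ...  */
      for (size_t i = 0; i < v->c_len; i++)
        n += count_t (&v->c[i]);  /* ... plus each element's components.  */
    }
  /* Descend into a component only if it is actually allocated.  */
  if (v->previous_stack != NULL)
    n += 1 + count_t2 (v->previous_stack);
  return n;
}

/* For an array like var2(:), the length is only known at run time.  */
static size_t
count_t2_array (const struct t2 *arr, size_t len)
{
  assert (arr != NULL || len == 0);
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    n += count_t2 (&arr[i]);
  return n;
}
```

The same recursion that decides whether to descend (the NULL checks) is what makes the total a run-time value rather than a compile-time constant.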

 * * *

The current code puts 'kinds' with const values into an array, 'sizes' in
a fixed-size stack array (either with const or dynamic values) and 'addrs'
is a struct.

To support deep mapping, all of those have to be dynamic; hence, the arrays
'sizes' and 'kinds' are turned into pointers, and the 'struct' gets a
trailing variable-size array, which is then filled with the dynamic content.
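A minimal C sketch of that layout, under stated assumptions: the names struct addrs, struct map_arrays, alloc_map_arrays and set_dyn_map are hypothetical and do not appear in GCC, which builds the equivalent data while lowering in omp-low.cc. The statically known entries stay as named struct fields, the run-time entries go into a trailing flexible array member, and 'sizes'/'kinds' become heap-allocated pointers sized at run time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout: statically known mappings keep their named
   fields; run-time mappings go into a trailing flexible array member.  */
struct addrs
{
  void *var1;
  void *var2;
  size_t n_dyn;    /* Number of entries in dyn[], known only at run time.  */
  void *dyn[];     /* Variable-size tail, filled at run time.  */
};

struct map_arrays
{
  struct addrs *addrs;
  size_t *sizes;            /* Formerly a fixed-size stack array.  */
  unsigned short *kinds;    /* Formerly an array of constants.  */
  size_t total;             /* Static plus dynamic entries.  */
};

static void
free_map_arrays (struct map_arrays *m)
{
  free (m->addrs);
  free (m->sizes);
  free (m->kinds);
}

/* Allocate room for N_STATIC fixed plus N_DYN run-time mappings.
   Returns 0 on success and -1 on allocation failure.  */
static int
alloc_map_arrays (struct map_arrays *m, size_t n_static, size_t n_dyn)
{
  m->total = n_static + n_dyn;
  m->addrs = malloc (sizeof (struct addrs) + n_dyn * sizeof (void *));
  m->sizes = malloc (m->total * sizeof *m->sizes);
  m->kinds = malloc (m->total * sizeof *m->kinds);
  if (m->addrs == NULL
      || (m->total > 0 && (m->sizes == NULL || m->kinds == NULL)))
    {
      free_map_arrays (m);
      return -1;
    }
  m->addrs->n_dyn = n_dyn;
  return 0;
}

/* Store the I-th run-time mapping: it is placed after the fixed entries
   in sizes/kinds and into dyn[] of the struct.  */
static void
set_dyn_map (struct map_arrays *m, size_t i, void *addr, size_t size,
             unsigned short kind)
{
  size_t n_static = m->total - m->addrs->n_dyn;

  assert (i < m->addrs->n_dyn);
  m->addrs->dyn[i] = addr;
  m->sizes[n_static + i] = size;
  m->kinds[n_static + i] = kind;
}
```

Appending the dynamic entries at the end is what lets the fixed part keep its 'struct' type; the array size has to travel alongside (here the n_dyn field), as it can no longer be derived from the type.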

For this purpose, three lang hooks are added. All are called rather late,
i.e. in omp-low.cc, such that all previous operations (implicit mapping,
explicit mapping, OpenMP mapper) are already done.

* The first one checks whether there is any allocatable component for a
  map-clause element (explicitly or implicitly added). If not, the current
  code is used; otherwise, dynamically allocated arrays are used.
  (Side note: As the size is now only known at run time, the TREE_VEC now
  has an additional element, the array size; hence the change to
  expand_omp_target. Before, the size was known statically from the type.)

* The second one counts how many elements are needed, which is required
  for the allocation.

* The third one actually fills the arrays.
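The count-then-allocate-then-fill protocol of the three hooks can be sketched in C as follows. This is a simplified model under assumptions: in GCC the hooks are invoked while lowering (omp-low.cc) and the counting and filling happen in the generated code, whereas here they are plain function pointers called directly; struct clause, struct deep_map_hooks, lower_clause and the three sample hook implementations are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for one map-clause element: its allocatable
   components, each NULL while unallocated.  */
struct clause
{
  void **comps;
  size_t n_comps;
};

/* Hypothetical function-pointer versions of the three hooks.  */
struct deep_map_hooks
{
  bool (*deep_mapping_p) (const struct clause *);
  size_t (*deep_mapping_cnt) (const struct clause *);
  size_t (*deep_mapping) (const struct clause *, void **addrs);
};

/* Hook 1: does the element have any allocatable component at all?  */
static bool
has_alloc_comps (const struct clause *c)
{
  return c->n_comps > 0;
}

/* Hook 2: how many entries are needed (only allocated components).  */
static size_t
count_allocated (const struct clause *c)
{
  size_t n = 0;
  for (size_t i = 0; i < c->n_comps; i++)
    n += c->comps[i] != NULL;
  return n;
}

/* Hook 3: fill ADDRS; returns the number of entries written.  */
static size_t
fill_allocated (const struct clause *c, void **addrs)
{
  size_t n = 0;
  for (size_t i = 0; i < c->n_comps; i++)
    if (c->comps[i] != NULL)
      addrs[n++] = c->comps[i];
  return n;
}

/* Driver: returns the number of run-time entries and stores the
   malloc'd array in *OUT (NULL if the static path is used).  */
static size_t
lower_clause (const struct deep_map_hooks *h, const struct clause *c,
              void ***out)
{
  *out = NULL;
  if (!h->deep_mapping_p (c))
    return 0;  /* No allocatable components: keep the existing code.  */

  size_t cnt = h->deep_mapping_cnt (c);
  if (cnt == 0)
    return 0;  /* Nothing allocated right now.  */

  void **addrs = malloc (cnt * sizeof *addrs);
  if (addrs == NULL)
    abort ();
  size_t filled = h->deep_mapping (c, addrs);
  assert (filled == cnt);  /* Counting and filling must agree.  */
  *out = addrs;
  return cnt;
}
```

Splitting counting from filling means the arrays can be allocated once with the exact size before any entry is written.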


Comments? Remarks?

Tobias

PS: There are two things to watch out for in the future:
- 'mapper': I think it should work when a mapper is present, as the hooks
  come rather late in the flow, but I have not checked with Julian's
  patches (pending review).
- Order: the dynamic items are added last to 'addrs' to permit keeping the
  'struct' type. I think that's fine for allocatable components, as they
  are added rather late and accessing them via 'is_device_ptr' is not
  possible. But there might be some issues with 'iterator' in the future;
  something to watch out for. If so, we may need to partially or fully
  give up on still putting all other mappings into the struct.
OpenMP: Add lang hooks + run-time filled map arrays for Fortran deep mapping of DT

This patch adds middle end support for mapping Fortran derived-types with
allocatable components. If those are present, the kinds/sizes arrays will be
allocated at run time and the addrs struct gets a variable-sized array at
the end. The newly added hooks are:
  * lhd_omp_deep_mapping_p: If true, use the new code.
  * lhd_omp_deep_mapping_cnt: Count the elements, needed for allocation.
  * lhd_omp_deep_mapping: Fill the allocated arrays.

gcc/ChangeLog:

	* langhooks-def.h (lhd_omp_deep_mapping_p,
	lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New.
	(LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT,
	LANG_HOOKS_OMP_DEEP_MAPPING): Define.
	(LANG_HOOKS_DECLS): Use it.
	* langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt,
	lhd_omp_deep_mapping): New stubs.
	* langhooks.h (struct lang_hooks_for_decls): Add new hooks.
	* omp-expand.cc (expand_omp_target): Handle dynamic-size
	addr/sizes/kinds arrays.
	* omp-low.cc (build_sender_ref, fixup_child_record_type,
	scan_sharing_clauses, lower_omp_target): Update to handle
	new hooks and dynamic-size addr/sizes/kinds arrays.

 gcc/langhooks-def.h |  10 +++
 gcc/langhooks.cc|  24 ++
 

Re: [PATCH] [OpenMP] GC unused SIMD clones

2023-01-02 Thread Tobias Burnus

On 25.11.22 03:13, Sandra Loosemore wrote:

This patch is a followup to my not-yet-reviewed patch
[PATCH v4] OpenMP: Generate SIMD clones for functions with "declare
target"


That patch got reviewed and went into mainline on Nov 15, 2022 as
https://gcc.gnu.org/r13-4309-g309e2d95e3b930c6f15c8a5346b913158404c76d



In comments on a previous iteration of that patch, I was asked to do
something to delete unused SIMD clones to avoid code bloat; this is it.

I've implemented something like a simple mark-and-sweep algorithm.
Clones that are used are marked at the point where the call is
generated in the vectorizer.  The loop that iterates over functions to
apply the passes after IPA is modified to defer processing of unmarked
clones, and anything left over is deleted.
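The mark-and-sweep idea described above can be sketched in C. This is a toy model under assumptions: struct fn, apply_passes and process_functions are hypothetical stand-ins, not GCC's cgraph API, and the retry-until-no-progress loop is a choice made for this sketch rather than the structure of the actual pass loop. Running the passes on a function marks the clones its vectorized code calls; unmarked clones are deferred, and whatever is still unmarked at the end is the sweep set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a function node; GCC's cgraph_node differs.  */
struct fn
{
  const char *name;
  bool is_simd_clone;
  struct fn *vec_call;  /* Clone the vectorizer emits a call to, or NULL.  */
  bool used;            /* Mark bit.  */
  bool emitted;         /* Post-IPA passes have been applied.  */
};

/* Stand-in for the post-IPA passes; the "vectorizer" marks a clone at
   the point where it generates a call to it.  */
static void
apply_passes (struct fn *f)
{
  f->emitted = true;
  if (f->vec_call != NULL)
    f->vec_call->used = true;
}

/* Apply the passes, deferring unmarked clones until no more progress is
   made; returns how many clones are left over, i.e. would be deleted.  */
static size_t
process_functions (struct fn *fns, size_t n)
{
  bool progress = true;
  while (progress)
    {
      progress = false;
      for (size_t i = 0; i < n; i++)
        if (!fns[i].emitted && (!fns[i].is_simd_clone || fns[i].used))
          {
            apply_passes (&fns[i]);
            progress = true;
          }
    }

  /* Sweep: only clones that were never marked can remain.  */
  size_t deleted = 0;
  for (size_t i = 0; i < n; i++)
    if (!fns[i].emitted)
      {
        assert (fns[i].is_simd_clone && !fns[i].used);
        deleted++;
      }
  return deleted;
}
```

Deferring instead of deleting immediately matters because a clone may only become used when a later function in the list is vectorized.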



Jakub referred the review to Honza, who wrote yesterday off list (to
me and Sandra):


I am really sorry for taking so long time.  It was busy month for me
and I was not very keen about the idea, since we had such logic
implemented many years ago but removed it to be able to determine
functions to be output early and optimize code layout.

I see that this is not possible with current organization where
vectorization is run late, so I guess it does make sense to do what you
are doing.

Patch is OK,
Honza


Thanks for the review! (And to Sandra: thanks for the patch.)

I leave it to Sandra to commit her patch and only want to update the
gcc-patches@ list. However, I think we can expect a commit tomorrow.
(Today is a holiday where she lives, as New Year's Day fell on a Sunday.)

Thanks and happy new year!

Tobias



[Patch] Fortran/OpenMP: Add parsing support for allocators/allocate directive (was: [Patch] Fortran/OpenMP: Add parsing support for allocators directive)

2022-12-21 Thread Tobias Burnus

Related pending (simple) patches - aka *Patch Ping*:

* [Patch] Fortran: Extend align-clause checks of OpenMP's allocate clause
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608401.html

* [Patch] OpenMP: Parse align clause in allocate directive in C/C++
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608404.html

On 14.12.22 11:47, Tobias Burnus wrote:


This patch adds parsing/argument-checking support for
  '!$omp allocators allocate([align(int),allocator(a) :] list)'


This follow-up patch additionally adds parsing support for both
declarative and allocate-stmt-associated '!$omp allocate' directives –
and replaces my previous patch.

OK for mainline?

 * * *

In line with OpenMP 5.1, the code requires that an executable statement
comes before an '!$omp allocate' that is associated with a Fortran
ALLOCATE statement; a violation is diagnosed.

Note: There is a spec change/regression related to permitting structure
elements: while OpenMP 5.0/5.1 permitted them in the
allocate-stmt-associated "!$omp allocate", OpenMP 5.2 stopped doing so,
and '!$omp allocators' never permitted them. For 'allocate', that seems
to be the accidental result of switching from "permitted unless stated
otherwise" to "rejected unless stated otherwise". For 'allocators', it is
the result of the original 'allocate' clause, which should (or should
not) have been extended for 'allocators'.

In any case, that's tracked now in OpenMP's spec issue #3437.

Thoughts? The code rejects var%comp, var(1)%comp etc. for now. Besides
the unclear spec status, I admittedly also did this to make checking
easier (e.g. for duplicated entries, or an entry that is the same as in
ALLOCATE except for the trailing array spec).

 * * *

This patch replaces both my previous patch in this thread and
Abid's patch


"[PATCH 1/5] [gfortran] Add parsing support for allocate directive
(OpenMP 5.0)."
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603258.html


In his patch set, later patches actually add allocator support, but only
for allocatables/pointers - and there are issues with regard to the
allocator used (see the patches and the patch review).

As my attached patch only raises a 'sorry', it neither addresses that
issue nor is affected by it.

Tobias
Fortran/OpenMP: Add parsing support for allocators/allocate directive

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_namelist): Update allocator, fix
	align dump.
	(show_omp_node, show_code_node): Handle EXEC_OMP_ALLOCATE.
	* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE and ..._EXEC.
	(enum gfc_exec_op): Add EXEC_OMP_ALLOCATE.
	(struct gfc_omp_namelist): Add 'allocator' to 'u2' union.
	(struct gfc_namespace): Add omp_allocate.
	(gfc_resolve_omp_allocate): New.
	* match.cc (gfc_free_omp_namelist): Free 'u2.allocator'.
	* match.h (gfc_match_omp_allocate, gfc_match_omp_allocators): New.
	* openmp.cc (gfc_omp_directives): Uncomment allocate/allocators.
	(gfc_match_omp_variable_list): Add bool arg for
	rejecting listing common-block vars separately.
	(gfc_match_omp_clauses): Update for u2.allocators.
	(OMP_ALLOCATORS_CLAUSES, gfc_match_omp_allocate,
	gfc_match_omp_allocators, is_predefined_allocator,
	gfc_resolve_omp_allocate): New.
	(resolve_omp_clauses): Update 'allocate' clause checks.
	(omp_code_to_statement, gfc_resolve_omp_directive): Handle
	OMP ALLOCATE/ALLOCATORS.
	* parse.cc (in_exec_part): New global var.
	(check_omp_allocate_stmt, parse_openmp_allocate_block): New.
	(decode_omp_directive, case_exec_markers, case_omp_decl,
	gfc_ascii_statement, parse_omp_structured_block): Handle
	OMP allocate/allocators.
	(verify_st_order, parse_executable): Set in_exec_part.
	* resolve.cc (gfc_resolve_blocks, resolve_codes): Handle
	allocate/allocators.
	* st.cc (gfc_free_statement): Likewise.
	* trans.cc (trans_code): Likewise.
	* trans-openmp.cc (gfc_trans_omp_directive): Likewise.
	(gfc_trans_omp_clauses, gfc_split_omp_clauses): Update for
	u2.allocator, fix for u.align.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/allocate-3.f90: Update dg-error.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-2.f90: Update dg-error.
	* gfortran.dg/gomp/allocate-4.f90: New test.
	* gfortran.dg/gomp/allocate-5.f90: New test.
	* gfortran.dg/gomp/allocate-6.f90: New test.
	* gfortran.dg/gomp/allocate-7.f90: New test.
	* gfortran.dg/gomp/allocators-1.f90: New test.
	* gfortran.dg/gomp/allocators-2.f90: New test.

 gcc/fortran/dump-parse-tree.cc   |   8 +-
 gcc/fortran/gfortran.h   |   9 +-
 gcc/fortran/match.cc |   7 +-
 gcc/fortran/match.h  |   2 +
 gcc/fortran/openmp.cc| 328 +

Re: [Patch] gfortran.dg/read_dir.f90: Make PASS on Windows

2022-12-20 Thread Tobias Burnus

On 19.12.22 11:51, Tobias Burnus wrote:

On 19.12.22 10:26, Tobias Burnus wrote:

And here is a more lightweight variant, suggested by Nightstrike:

Using '.' instead of creating a new directory - and checking for
__WIN32__ instead of __MINGW32__.

[...]

I have now updated the heavy version. The #if check moved to C as those
macros aren't set in Fortran. (That's now https://gcc.gnu.org/PR108175 -
I thought that there was a PR before, but I couldn't find any.)


This variant has now been committed as
https://gcc.gnu.org/r13-4818-g18fc70aa9c753d17c00211cea9fa5bd843fe94fd

Tobias



Re: [Patch] gfortran.dg/read_dir.f90: Make PASS on Windows

2022-12-19 Thread Tobias Burnus

On 19.12.22 10:26, Tobias Burnus wrote:

And here is a more lightweight variant, suggested by Nightstrike:

Using '.' instead of creating a new directory - and checking for
__WIN32__ instead of __MINGW32__.

The only downside of this variant is that it does not check whether
"close(10,status='delete')" will delete a directory without failing with
an error. – If the latter makes sense, I think a follow-up check should
be added to ensure the directory has indeed been removed by 'close'.


I have now updated the heavy version. The #if check moved to C as those
macros aren't set in Fortran. (That's now https://gcc.gnu.org/PR108175 -
I thought that there was a PR before, but I couldn't find any.)

Additionally, on Windows the '.' directory is now opened, avoiding
issues with POSIX functions (and the requirement to use '#include
' etc.). As OPEN already fails there, there is no point in
checking the rest.

On the non-Windows side, there is now a check that 'CLOSE' with
status='delete' indeed has deleted the directory.


Thoughts about which variant is better? Other suggestions or comments?

^- comments?

PS: On my x86-64 Linux, OPEN works but READ fails with EISDIR/errno == 21.


And thanks to Nightstrike for testing, for the suggestions and for
reporting the issue in the first place.



On 19.12.22 10:09, Tobias Burnus wrote:

As discussed in #gfortran IRC, on Windows opening a directory fails
with EACCES.
(It works under Cygwin - nightstrike was so kind to test this.)

Additionally, '[ -d dir ] || mkdir dir' is also not very portable.

Hence, I use an auxiliary C file calling the POSIX functions and
expect a failure on non-Cygwin Windows.

Comments? Suggestions? - If there aren't any, I plan to commit it
as obvious tomorrow.


I don't have a strong preference for the one-file/'.'/smaller solution
vs. the two-file/mkdir/close-'delete' solution, but I am slightly
inclined towards the one that tests more.

Tobias

gfortran.dg/read_dir.f90: Make PASS on Windows

On non-Cygwin Windows, use '.' and expect the documented failure when
opening a directory (EACCES).  As gfortran does not set __WIN32__, this
check is done on the C side. (With __CYGWIN__, __WIN32__ is not set - but
to make it clear, !__CYGWIN__ is used in the #if.)

On non-Windows, replace the 'call system' shell call with the POSIX
functions stat/mkdir/rmdir for better compatibility, especially on
embedded systems; additionally, add some more checks. In particular,
confirm that 'close' with status='delete' indeed deleted the directory.

gcc/testsuite/ChangeLog:

	* gfortran.dg/read_dir-aux.c: New; provides my_mkdir, my_rmdir,
	my_verify_not_exists and expect_open_to_fail.
	* gfortran.dg/read_dir.f90: Call those; expect that opening a
	directory fails on Windows.

 gcc/testsuite/gfortran.dg/read_dir-aux.c | 68 
 gcc/testsuite/gfortran.dg/read_dir.f90   | 54 ++---
 2 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/read_dir-aux.c b/gcc/testsuite/gfortran.dg/read_dir-aux.c
new file mode 100644
index 000..307b44472af
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/read_dir-aux.c
@@ -0,0 +1,68 @@
+#if defined(__WIN32__) && !defined(__CYGWIN__)
+  /* Mostly skip on Windows, cf. main file why. */
+
+int expect_open_to_fail () { return 1; }
+
+void my_verify_not_exists (const char *dir) { }
+void my_mkdir (const char *dir) { }
+void my_rmdir (const char *dir) { }
+
+#else
+
+#include <sys/stat.h>  /* For mkdir + permission bits.  */
+#include <unistd.h>  /* For rmdir.  */
+#include <errno.h>  /* For errno.  */
+#include <stdio.h>  /* For perror.  */
+#include <stdlib.h>  /* For abort.  */
+ 
+
+int expect_open_to_fail () { return 0; }
+
+void
+my_verify_not_exists (const char *dir)
+{
+  struct stat path_stat;
+  int err = stat (dir, &path_stat);
+  if (err && errno == ENOENT)
+    return;  /* OK */
+  if (err)
+    perror ("my_verify_not_exists");
+  else
+    printf ("my_verify_not_exists: pathname %s still exists\n", dir);
+  abort ();
+ }
+
+void
+my_mkdir (const char *dir)
+{
+  int err;
+  struct stat path_stat;
+
+  /* Check whether 'dir' exists and is a directory.  */
+  err = stat (dir, &path_stat);
+  if (err && errno != ENOENT)
+    {
+      perror ("my_mkdir: failed to call stat for directory");
+      abort ();
+    }
+  if (err == 0 && !S_ISDIR (path_stat.st_mode))
+    {
+      printf ("my_mkdir: pathname %s is not a directory\n", dir);
+      abort ();
+    }
+
+  err = mkdir (dir, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
+  if (err != 0)
+    {
+      perror ("my_mkdir: failed to create directory");
+      abort ();
+    }
+}
+
+void
+my_rmdir (const char *dir)
+{
+  

[Patch] gfortran.dg/read_dir.f90: Make PASS on Windows

2022-12-19 Thread Tobias Burnus

As discussed in #gfortran IRC, on Windows opening a directory fails with
EACCES.
(It works under Cygwin - nightstrike was so kind to test this.)

Additionally, '[ -d dir ] || mkdir dir' is also not very portable.

Hence, I use an auxiliary C file calling the POSIX functions and
expect a failure on non-Cygwin Windows.

Comments? Suggestions? - If there aren't any, I plan to commit it
as obvious tomorrow.

Tobias
gfortran.dg/read_dir.f90: Make PASS on Windows

Call POSIX's stat/mkdir/rmdir instead of using the shell via 'call system'.
Additionally, expect EACCES on non-Cygwin Windows, as documented for
trying to open a directory.

gcc/testsuite/ChangeLog:

	* gfortran.dg/read_dir-aux.c: New; provides my_mkdir and my_rmdir.
	* gfortran.dg/read_dir.f90: Call my_mkdir/my_rmdir; expect
	error on Windows when opening a directory.

 gcc/testsuite/gfortran.dg/read_dir-aux.c | 39 +
 gcc/testsuite/gfortran.dg/read_dir.f90   | 43 
 2 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/read_dir-aux.c b/gcc/testsuite/gfortran.dg/read_dir-aux.c
new file mode 100644
index 000..e8404478517
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/read_dir-aux.c
@@ -0,0 +1,39 @@
+#include <sys/stat.h>  /* For mkdir + permission bits.  */
+#include <unistd.h>  /* For rmdir.  */
+#include <errno.h>  /* For errno.  */
+#include <stdio.h>  /* For perror.  */
+#include <stdlib.h>  /* For abort.  */
+ 
+
+void
+my_mkdir (const char *dir)
+{
+  int err;
+  struct stat path_stat;
+
+  /* Check whether 'dir' exists and is a directory.  */
+  err = stat (dir, &path_stat);
+  if (err && errno != ENOENT)
+    {
+      perror ("my_mkdir: failed to call stat for directory");
+      abort ();
+    }
+  if (err == 0 && !S_ISDIR (path_stat.st_mode))
+    {
+      printf ("my_mkdir: pathname %s is not a directory\n", dir);
+      abort ();
+    }
+
+  err = mkdir (dir, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
+  if (err != 0)
+    {
+      perror ("my_mkdir: failed to create directory");
+      abort ();
+    }
+}
+
+void
+my_rmdir (const char *dir)
+{
+  rmdir (dir);
+}
diff --git a/gcc/testsuite/gfortran.dg/read_dir.f90 b/gcc/testsuite/gfortran.dg/read_dir.f90
index c7ddc51fb90..3a8ff6adbc7 100644
--- a/gcc/testsuite/gfortran.dg/read_dir.f90
+++ b/gcc/testsuite/gfortran.dg/read_dir.f90
@@ -1,18 +1,51 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp" }
+! { dg-additional-sources read_dir-aux.c }
+!
 ! PR67367
+
 program bug
+   use iso_c_binding
    implicit none
+
+   interface
+      subroutine my_mkdir(s) bind(C)
+         ! Call POSIX's mkdir - and ignore fails due to
+         ! existing directories but fail otherwise
+         import
+         character(len=1,kind=c_char) :: s(*)
+      end subroutine
+      subroutine my_rmdir(s) bind(C)
+         ! Call POSIX's rmdir - and ignore fails
+         import
+         character(len=1,kind=c_char) :: s(*)
+      end subroutine
+   end interface
+
+   character(len=*), parameter :: sdir = "junko.dir"
+   character(len=*,kind=c_char), parameter :: c_sdir = sdir // c_null_char
+
    character(len=1) :: c
-   character(len=256) :: message
    integer ios
-   call system('[ -d junko.dir ] || mkdir junko.dir')
-   open(unit=10, file='junko.dir',iostat=ios,action='read',access='stream')
+
+   call my_mkdir(c_sdir)
+   open(unit=10, file=sdir,iostat=ios,action='read',access='stream')
+
+#if defined(__MINGW32__)
+   ! Windows is documented to fail with EACCESS when trying to open a directory
+   ! Note: Testing showed that __CYGWIN__ does permit opening directories
+   call my_rmdir(c_sdir)
+   if (ios == 0) &
+      stop 3  ! Expected EACCESS
+   stop 0  ! OK
+#endif   
+
    if (ios.ne.0) then
-      call system('rmdir junko.dir')
+      call my_rmdir(c_sdir)
       STOP 1
    end if
    read(10, iostat=ios) c
-   if (ios.ne.21.and.ios.ne.0) then 
+   if (ios.ne.21.and.ios.ne.0) then  ! EISDIR has often the value 21
       close(10, status='delete')
       STOP 2
    end if


Re: [Patch] gfortran.dg/read_dir.f90: Make PASS on Windows

2022-12-19 Thread Tobias Burnus

And here is a more lightweight variant, suggested by Nightstrike:

Using '.' instead of creating a new directory - and checking for
__WIN32__ instead of __MINGW32__.

The only downside of this variant is that it does not check whether
"close(10,status='delete')" will delete a directory without failing with
an error. – If the latter makes sense, I think a follow-up check should
be added to ensure the directory has indeed been removed by 'close'.

Thoughts about which variant is better? Other suggestions or comments?

Tobias

PS: On my x86-64 Linux, OPEN works but READ fails with EISDIR/errno == 21.
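A minimal C sketch of the behavior described in the PS, assuming a POSIX system such as Linux: there, open() with O_RDONLY on a directory succeeds, while the subsequent read() fails with EISDIR (21 on Linux, which is why the test also accepts iostat 21). The helper try_open_and_read is hypothetical and not part of the patch; on non-Cygwin Windows, the documented behavior is that already opening the directory fails.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open PATH read-only and read one byte.  Returns 0 if both calls
   succeed, otherwise the errno of the failing call; *OPEN_OK is set to
   1 if open() itself succeeded.  */
static int
try_open_and_read (const char *path, int *open_ok)
{
  char c;
  int err = 0;

  assert (path != NULL && open_ok != NULL);
  int fd = open (path, O_RDONLY);
  *open_ok = fd >= 0;
  if (fd < 0)
    return errno;
  if (read (fd, &c, 1) < 0)
    err = errno;  /* Save it before close() can change errno.  */
  close (fd);
  return err;
}
```

Calling it on "." should report a successful open and EISDIR from the read on Linux, mirroring the OPEN-works-but-READ-fails result above.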

On 19.12.22 10:09, Tobias Burnus wrote:

As discussed in #gfortran IRC, on Windows opening a directory fails
with EACCES.
(It works under Cygwin - nightstrike was so kind to test this.)

Additionally, '[ -d dir ] || mkdir dir' is also not very portable.

Hence, I use an auxiliary C file calling the POSIX functions and
expect a failure on non-Cygwin Windows.

Comments? Suggestions? - If there aren't any, I plan to commit it
as obvious tomorrow.

gfortran.dg/read_dir.f90: Make PASS on Windows

Avoid calling the shell with POSIX syntax and use '.' instead.
Additionally, expect a failure on non-Cygwin Windows, as opening a
directory is documented to fail with EACCES.

gcc/testsuite/ChangeLog:

	* gfortran.dg/read_dir.f90: Open '.' instead of a freshly created
	directory; expect error on Windows when opening a directory.

 gcc/testsuite/gfortran.dg/read_dir.f90 | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/read_dir.f90 b/gcc/testsuite/gfortran.dg/read_dir.f90
index c7ddc51fb90..c91d0f78413 100644
--- a/gcc/testsuite/gfortran.dg/read_dir.f90
+++ b/gcc/testsuite/gfortran.dg/read_dir.f90
@@ -1,20 +1,27 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp" }
+!
 ! PR67367
+
 program bug
    implicit none
    character(len=1) :: c
-   character(len=256) :: message
    integer ios
-   call system('[ -d junko.dir ] || mkdir junko.dir')
-   open(unit=10, file='junko.dir',iostat=ios,action='read',access='stream')
+   open(unit=10, file='.',iostat=ios,action='read',access='stream')
+
+#if defined(__WIN32__) && !defined(__CYGWIN__)
+   ! Windows is documented to fail with EACCESS when trying to open a directory
+   if (ios == 0) &
+      stop 3  ! Expected EACCESS
+   stop 0  ! OK
+#endif   
+
    if (ios.ne.0) then
-      call system('rmdir junko.dir')
       STOP 1
    end if
    read(10, iostat=ios) c
-   if (ios.ne.21.and.ios.ne.0) then 
-      close(10, status='delete')
+   close(10)
+   if (ios.ne.21.and.ios.ne.0) then  ! EISDIR has often the value 21
       STOP 2
    end if
-   close(10, status='delete')
 end program bug


[Patch] nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

2022-12-16 Thread Tobias Burnus

Seems to be a CUDA JIT issue, which is fixed by adding a dummy procedure.

Lightly tested on 4 systems at hand, where 2 failed before. One had CUDA 10.2
and the other had some ancient CUDA version where 'nvidia-smi' did not print a
CUDA version and which required -mptx=3.1.
(I did check that offloading indeed happened and no host fallback was done.)

OK for mainline?

Tobias
nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

Seemingly, the PTX JIT of CUDA <= 10.2 replaces function pointers in global
variables by NULL if a translation unit does not contain any executable code.
It works with CUDA 11.1.  This commit is about the reverse-offload function
table; having NULL values there disables reverse offload during image load.

The solution is the same as the one found by Thomas for a related issue:
adding a dummy procedure. Cf. the PR of this issue and Thomas' patch
"nvptx: Support global constructors/destructors via 'collect2'"
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html

As that approach also works here:

Co-authored-by: Thomas Schwinge 

gcc/
	PR libgomp/108098

	* config/nvptx/mkoffload.cc (process): Emit dummy procedure
	alongside reverse-offload function table to prevent NULL values
	of the function addresses.

---
 gcc/config/nvptx/mkoffload.cc | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 5d89ba8..8306aa0 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -357,6 +357,20 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	fputc (sm_ver2[i], out);
   fprintf (out, "\"\n\t\".file 1 \\\"\\\"\"\n");
 
+  /* WORKAROUND - see PR 108098
+	 It seems as if older CUDA JIT compiler optimizes the function pointers
+	 in offload_func_table to NULL, which can be prevented by adding a
+	 dummy procedure. With CUDA 11.1, it seems to work fine without
+	 workaround while CUDA 10.2 as some ancient version have need the
+	 workaround. Assuming CUDA 11.0 fixes it, emitting it could be
+	 restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
+	 PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
+	 PTX ISA 7.1.  */
+  fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
+  fprintf (out, "\t\".func __dummy$func ( )\"\n");
+  fprintf (out, "\t\"{\"\n");
+  fprintf (out, "\t\"}\"\n");
+
   size_t fidx = 0;
   for (id = func_ids; id; id = id->next)
 	{


[Patch] gcc-changelog: Add warning for auto-added files

2022-12-16 Thread Tobias Burnus
_level_prs = []
@@ -706,6 +707,7 @@ class GitCommit:
 msg += f' (did you mean "{candidates[0]}"?)'
 details = '\n'.join(difflib.Differ().compare([file], [candidates[0]])).rstrip()
 self.errors.append(Error(msg, file, details))
+auto_add_warnings = {}
 for file in sorted(changed_files - mentioned_files):
 if not self.in_ignored_location(file):
 if file in self.new_files:
@@ -738,6 +740,10 @@ class GitCommit:
 file = file[len(entry.folder):].lstrip('/')
 entry.lines.append('\t* %s: New file.' % file)
 entry.files.append(file)
+if entry.folder not in auto_add_warnings:
+auto_add_warnings[entry.folder] = [file]
+else:
+auto_add_warnings[entry.folder].append(file)
 else:
 msg = 'new file in the top-level folder not mentioned in a ChangeLog'
 self.errors.append(Error(msg, file))
@@ -755,6 +761,13 @@ class GitCommit:
 if pattern not in used_patterns:
 error = "pattern doesn't match any changed files"
 self.errors.append(Error(error, pattern))
+for entry, val in auto_add_warnings.items():
+if len(val) == 1:
+self.warnings.append('Auto-added new file \'%s/%s\''
+ % (entry, val[0]))
+else:
+self.warnings.append('Auto-added %d new files in \'%s\''
+ % (len(val), entry))
 
 def check_for_correct_changelog(self):
 for entry in self.changelog_entries:
@@ -830,6 +843,12 @@ class GitCommit:
         for error in self.errors:
             print(error)

+    def print_warnings(self):
+        if self.warnings:
+            print('Warnings:')
+            for warning in self.warnings:
+                print(warning)
+
     def check_commit_email(self):
         # Parse 'Martin Liska  '
         email = self.info.author.split(' ')[-1].strip('<>')
diff --git a/contrib/gcc-changelog/git_email.py b/contrib/gcc-changelog/git_email.py
index f3773f178ea..5468efcd0d5 100755
--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -119,11 +119,13 @@ if __name__ == '__main__':
 
 success = 0
 for full in sorted(allfiles):
-email = GitEmail(full, False)
+email = GitEmail(full)
 print(email.filename)
 if email.success:
 success += 1
 print('  OK')
+for warning in email.warnings:
+print('  WARN: %s' % warning)
 else:
 for error in email.errors:
 print('  ERR: %s' % error)
@@ -135,6 +137,7 @@ if __name__ == '__main__':
 if email.success:
 print('OK')
 email.print_output()
+email.print_warnings()
 else:
 if not email.info.lines:
 print('Error: patch contains no parsed lines', file=sys.stderr)
diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py
index 89960d307c9..79f8e0b8604 100755
--- a/contrib/gcc-changelog/test_email.py
+++ b/contrib/gcc-changelog/test_email.py
@@ -461,3 +461,17 @@ class TestGccChangelog(unittest.TestCase):
 def test_CR_in_patch(self):
 email = self.from_patch_glob('0001-Add-M-character.patch')
 assert (email.errors[0].message == 'cannot find a ChangeLog location in message')
+
+def test_auto_add_file_1(self):
+email = self.from_patch_glob('0001-Auto-Add-File.patch')
+assert not email.errors
+assert (len(email.warnings) == 1)
+assert (email.warnings[0]
+== "Auto-added new file 'libgomp/testsuite/libgomp.fortran/allocate-4.f90'")
+
+def test_auto_add_file_2(self):
+email = self.from_patch_glob('0002-Auto-Add-File.patch')
+assert not email.errors
+assert (len(email.warnings) == 2)
+assert (email.warnings[0] == "Auto-added new file 'gcc/doc/gm2.texi'")
+assert (email.warnings[1] == "Auto-added 2 new files in 'gcc/m2'")
diff --git a/contrib/gcc-changelog/test_patches.txt b/contrib/gcc-changelog/test_patches.txt
index c378c32423a..6004608a8f9 100644
--- a/contrib/gcc-changelog/test_patches.txt
+++ b/contrib/gcc-changelog/test_patches.txt
@@ -3636,3 +3636,99 @@ index 000..d75da75
 -- 
 2.38.1
 
+=== 0001-Auto-Add-File.patch 
+From e205ec03f0794aeac3e8a89e947c12624d5a274e Mon Sep 17 00:00:00 2001
+From: Tobias Burnus 
+Date: Thu, 15 Dec 2022 12:25:07 +0100
+Subject: [PATCH] libgfortran's ISO_Fortran_binding.c: Use GCC11 version for
+ backward-only code [PR108056]
+
+libgfortran/ChangeLog:
+
+	PR libfortran/108056
+	* runtime/I

[Patch] gcc-changelog/git_email.py: Support older unidiff.PatchSet

2022-12-16 Thread Tobias Burnus

Another backward-compatibility issue: this failed here on Ubuntu 20.04, which
is old but not ancient.

OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
gcc-changelog/git_email.py: Support older unidiff.PatchSet

Commit "unidiff: use newline='\n' argument",
r13-4603-gb045179973161115c7ea029b2788f5156fc55cda, added support for CR
characters within a line, but that broke support for older unidiff.PatchSet.

This patch adds a fallback to git_email.py: if the newline= argument is
not available (TypeError exception), from_filename is called without it.
test_email.py keeps using the argument unconditionally.

contrib/ChangeLog:

	* gcc-changelog/git_email.py (GitEmail:__init__): Support older
	unidiff.PatchSet that do not have a newline= argument
	of from_filename.

diff --git a/contrib/gcc-changelog/git_email.py b/contrib/gcc-changelog/git_email.py
index ef50ebfb7fd..093c887ba4c 100755
--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -39,7 +39,11 @@ unidiff_supports_renaming = hasattr(PatchedFile(), 'is_rename')
 class GitEmail(GitCommit):
 def __init__(self, filename):
 self.filename = filename
-diff = PatchSet.from_filename(filename, newline='\n')
+try:
+  diff = PatchSet.from_filename(filename, newline='\n')
+except TypeError:
+  # Older versions don't have the newline argument
+  diff = PatchSet.from_filename(filename)
 date = None
 author = None
 subject = ''
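The fallback in the hunk above can be sketched in isolation. This is a minimal model, not the committed code: `load_patch` and the two stand-in loaders are hypothetical, and the only assumption about unidiff is the one the patch itself makes, namely that an older `PatchSet.from_filename` rejects the unknown `newline=` keyword with a TypeError.

```python
def load_patch(from_filename, filename):
    r"""Call a from_filename-style loader, passing newline='\n' when supported.

    Hypothetical helper mirroring the try/except fallback in git_email.py:
    older loaders reject the unknown keyword with TypeError, so we retry
    without it.
    """
    try:
        return from_filename(filename, newline='\n')
    except TypeError:
        # Older versions don't have the newline argument.
        return from_filename(filename)


# Stand-ins for a new and an old PatchSet.from_filename (hypothetical).
def new_loader(filename, newline=None):
    return ('new', filename, newline)


def old_loader(filename):
    return ('old', filename)


print(load_patch(new_loader, 'a.patch'))  # ('new', 'a.patch', '\n')
print(load_patch(old_loader, 'a.patch'))  # ('old', 'a.patch')
```

One caveat of catching TypeError: a TypeError raised for an unrelated reason inside the loader would also trigger the retry. Checking the loader's signature once with `inspect.signature` would avoid that, at the cost of a little more code.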


Re: [Patch] libgomp: Handle OpenMP's reverse offloads

2022-12-15 Thread Tobias Burnus

Hi,

On 15.12.22 20:42, Tobias Burnus wrote:

If the libgomp plugin doesn't request special
'host_to_dev_cpy'/'dev_to_host_cpy' for 'gomp_target_rev', then standard
'gomp_copy_host2dev'/'gomp_copy_dev2host' are used, which use
'gomp_device_copy', which expects the device to be locked.  (As can be
told by the unconditional 'gomp_mutex_unlock (&devicep->lock);' before
'gomp_fatal'.)  However, in a number of the
'gomp_copy_host2dev'/'gomp_copy_dev2host' calls from 'gomp_target_rev',
the device definitely is not locked; see


Actually, rereading it and the source code, I think it makes sense to
return a boolean, similar to devicep->host2dev_func and
devicep->dev2host_func, and possibly to wrap it in a convenience
function similar to gomp_device_copy. At least, a bare exit() without
further diagnostic does not seem too user-friendly.
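A rough model of this proposal: the plugin's copy hook returns a boolean, and a small wrapper in the style of gomp_device_copy releases the device lock before reporting the failure. The sketch is Python for brevity and is not libgomp code; `device_copy`, `FatalError` and the lambda hooks are hypothetical stand-ins for the C functions, and raising an exception stands in for gomp_fatal.

```python
import threading


class FatalError(RuntimeError):
    """Hypothetical stand-in for gomp_fatal, which aborts in libgomp."""


def device_copy(lock, copy_func, dst, src, size):
    """Model of a gomp_device_copy-style wrapper (hypothetical).

    The caller must hold `lock`. copy_func returns True on success and
    False on failure; on failure the lock is released *before* the error
    is reported, so the error path does not leave the device locked.
    """
    if not copy_func(dst, src, size):
        lock.release()
        raise FatalError(f"Copying of {size} bytes from {src} to {dst} failed")


lock = threading.Lock()
lock.acquire()
device_copy(lock, lambda d, s, n: True, "dev", "host", 8)
print(lock.locked())  # True: success leaves the lock held for the caller
```

The design point is that the plugin only reports success or failure, while the diagnostic and the unlock live in one place in libgomp instead of a bare exit() in the plugin.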

BTW: In line with the other code, you could use CUDA_CALL instead of
CUDA_CALL_ERET; the former already calls the latter with 'false' as first
argument and is used elsewhere.

Regarding the lock: It seems the problem is the copying of
devaddrs/sizes/kinds. This does not need a lock: the stack variables
are on the device and only used for this reverse offload, so there are
no races.

However, as the existing gomp_copy_dev2host releases the lock, we could
simply keep this lock held (and probably should move it down to just before
the user-function call), removing all (non-error) locks and unlocks on
the way. I mean something like the attached patch.

Finally, I think we need to find a solution for the issue Andrew tried
to address. The current code invokes CUDA_CALL_ASSERT, which calls
GOMP_PLUGIN_fatal.

Tobias
diff --git a/libgomp/target.c b/libgomp/target.c
index e38cc3b6f1c..4b7233307cd 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -3319,5 +3319,6 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
   gomp_mutex_lock (&devicep->lock);
   n = gomp_map_lookup_rev (&devicep->mem_map_rev, );
-  gomp_mutex_unlock (&devicep->lock);
+  if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    gomp_mutex_unlock (&devicep->lock);
 
   if (n == NULL)
@@ -3409,5 +3410,4 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
   cdata = gomp_alloca (sizeof (*cdata) * mapnum);
   memset (cdata, '\0', sizeof (*cdata) * mapnum);
-  gomp_mutex_lock (&devicep->lock);
   for (uint64_t i = 0; i < mapnum; i++)
 	{
@@ -3643,4 +3643,5 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
   uint64_t struct_cpy = 0;
   bool clean_struct = false;
+  gomp_mutex_lock (&devicep->lock);
   for (uint64_t i = 0; i < mapnum; i++)
 	{
@@ -3695,5 +3696,5 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
 	  gomp_aligned_free ((void *) (uintptr_t) devaddrs[i]);
 	}
-
+  gomp_mutex_unlock (&devicep->lock);
   free (devaddrs);
   free (sizes);


Re: [Patch] libgomp: Handle OpenMP's reverse offloads

2022-12-15 Thread Tobias Burnus

Hi,

I have not yet fully tried to understand it.

(a) Regarding the issue of stalling, see also Andrew's patch and the
discussion about it in

"[PATCH] libgomp: fix hang on fatal error",
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603616.html

and in particular Jakub's two replies.

(b) I think you want to remove this:

On 15.12.22 18:34, Thomas Schwinge wrote:

--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1,3 +1,5 @@
+#pragma GCC optimize "O0"
+
  /* Plugin for NVPTX execution.


(c)


If the libgomp plugin doesn't request special
'host_to_dev_cpy'/'dev_to_host_cpy' for 'gomp_target_rev', then standard
'gomp_copy_host2dev'/'gomp_copy_dev2host' are used, which use
'gomp_device_copy', which expects the device to be locked.  (As can be
told by the unconditional 'gomp_mutex_unlock (&devicep->lock);' before
'gomp_fatal'.)  However, in a number of the
'gomp_copy_host2dev'/'gomp_copy_dev2host' calls from 'gomp_target_rev',
the device definitely is not locked; see the calls adjacent to the TODO


The question is what unlocks the device; it is surely locked in
gomp_target_rev by:

  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM))
...
  gomp_mutex_lock (&devicep->lock);
  for (uint64_t i = 0; i < mapnum; i++)
...
}
  gomp_mutex_unlock (&devicep->lock);
}

Except for code like:
gomp_mutex_unlock (&devicep->lock);
gomp_fatal ("gomp_target_rev unhandled kind 0x%.4x", kinds[i]);

The only functions that know about the pointer and get called are those behind
dev_to_host_cpy and host_to_dev_cpy; thus, they seemingly mess with the
unlocking?!?

 * * *

Regarding your patch, I do not understand why you call unlock twice and
have three 'TODO unlock' markers; that does not seem to make any sense.

I think it is worthwhile to understand why plugin-nvptx.c unlocks the lock in
the non-error case, given that you observe that it is not locked in the error case.

Additionally, it seems to make more sense to look into a revised version of
Andrew's patch; your patch looks like a rather bad band-aid.

Tobias



[Patch] Fortran/OpenMP: Add parsing support for allocators directive

2022-12-14 Thread Tobias Burnus

This patch adds parsing/argument-checking support for
  '!$omp allocators allocate([align(int),allocator(a) :] list)'

This is kind of a logical follow-up to, and a prep patch for, the
  '!$omp allocate(list) [align(v) allocator(a)]'
support that was submitted as part of a larger patchset by Abid; cf.
review at
  "[PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0)."
  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603258.html

My follow-up patch will add parsing support for declarative/executable
'!$omp allocate'.

OK for mainline?

Tobias
Fortran/OpenMP: Add parsing support for allocators directive

gcc/fortran/ChangeLog:

	* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATORS and
	ST_OMP_END_ALLOCATORS.
	(enum gfc_exec_op): Add EXEC_OMP_ALLOCATORS.
	* dump-parse-tree.cc (show_omp_node, show_code_node): Handle
	OpenMP's ALLOCATORS directive.
	* match.h (gfc_match_omp_allocators): New prototype.
	* openmp.cc (OMP_ALLOCATORS_CLAUSES): Define.
	(gfc_match_omp_allocators): New.
	(resolve_omp_clauses, omp_code_to_statement,
	gfc_resolve_omp_directive): Handle EXEC_OMP_ALLOCATORS.
	* parse.cc (parse_openmp_allocate_block): New.
	(case_exec_markers): Add ST_OMP_ALLOCATORS.
	(decode_omp_directive, gfc_ascii_statement,
	parse_executable): Parse OpenMP allocators directive.
	* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_ALLOCATORS.
	* st.cc (gfc_free_statement): Likewise.
	* trans.cc (trans_code): Likewise.
	* trans-openmp.cc (gfc_trans_omp_directive): Show 'sorry' for
	EXEC_OMP_ALLOCATORS.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocators-1.f90: New test.
	* gfortran.dg/gomp/allocators-2.f90: New test.

 gcc/fortran/dump-parse-tree.cc  |  2 +
 gcc/fortran/gfortran.h  |  3 +-
 gcc/fortran/match.h |  1 +
 gcc/fortran/openmp.cc   | 31 ++-
 gcc/fortran/parse.cc| 50 -
 gcc/fortran/resolve.cc  |  2 +
 gcc/fortran/st.cc   |  1 +
 gcc/fortran/trans-openmp.cc |  3 ++
 gcc/fortran/trans.cc|  1 +
 gcc/testsuite/gfortran.dg/gomp/allocators-1.f90 | 28 ++
 gcc/testsuite/gfortran.dg/gomp/allocators-2.f90 | 22 +++
 11 files changed, 140 insertions(+), 4 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 5ae72dc1cac..4565b71c758 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -2081,6 +2081,7 @@ show_omp_node (int level, gfc_code *c)
 case EXEC_OACC_CACHE: name = "CACHE"; is_oacc = true; break;
 case EXEC_OACC_ENTER_DATA: name = "ENTER DATA"; is_oacc = true; break;
 case EXEC_OACC_EXIT_DATA: name = "EXIT DATA"; is_oacc = true; break;
+case EXEC_OMP_ALLOCATORS: name = "ALLOCATORS"; break;
 case EXEC_OMP_ASSUME: name = "ASSUME"; break;
 case EXEC_OMP_ATOMIC: name = "ATOMIC"; break;
 case EXEC_OMP_BARRIER: name = "BARRIER"; break;
@@ -3409,6 +3410,7 @@ show_code_node (int level, gfc_code *c)
 case EXEC_OACC_CACHE:
 case EXEC_OACC_ENTER_DATA:
 case EXEC_OACC_EXIT_DATA:
+case EXEC_OMP_ALLOCATORS:
 case EXEC_OMP_ASSUME:
 case EXEC_OMP_ATOMIC:
 case EXEC_OMP_CANCEL:
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 5f8a81ae4a1..63f38d2 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -318,6 +318,7 @@ enum gfc_statement
   ST_OMP_END_MASKED_TASKLOOP, ST_OMP_MASKED_TASKLOOP_SIMD,
   ST_OMP_END_MASKED_TASKLOOP_SIMD, ST_OMP_SCOPE, ST_OMP_END_SCOPE,
   ST_OMP_ERROR, ST_OMP_ASSUME, ST_OMP_END_ASSUME, ST_OMP_ASSUMES,
+  ST_OMP_ALLOCATORS, ST_OMP_END_ALLOCATORS,
   /* Note: gfc_match_omp_nothing returns ST_NONE. */
   ST_OMP_NOTHING, ST_NONE
 };
@@ -2959,7 +2960,7 @@ enum gfc_exec_op
   EXEC_OMP_TARGET_TEAMS_LOOP, EXEC_OMP_MASKED, EXEC_OMP_PARALLEL_MASKED,
   EXEC_OMP_PARALLEL_MASKED_TASKLOOP, EXEC_OMP_PARALLEL_MASKED_TASKLOOP_SIMD,
   EXEC_OMP_MASKED_TASKLOOP, EXEC_OMP_MASKED_TASKLOOP_SIMD, EXEC_OMP_SCOPE,
-  EXEC_OMP_ERROR
+  EXEC_OMP_ERROR, EXEC_OMP_ALLOCATORS
 };
 
 typedef struct gfc_code
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 2a805815d9c..b1f5db80125 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -149,6 +149,7 @@ match gfc_match_oacc_routine (void);
 
 /* OpenMP directive matchers.  */
 match gfc_match_omp_eos_error (void);
+match gfc_match_omp_allocators (void);
 match gfc_match_omp_assume (void);
 match gfc_match_omp_assumes (void);
 match gfc_match_omp_atomic (void);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 

[Patch] mklog: only do is_binary_file check if available

2022-12-14 Thread Tobias Burnus

Ubuntu 20.04.5 LTS (focal) unfortunately has a unidiff.PatchSet that is too
old for the feature added on Monday.

Solution: use is_binary_file only when it is available.
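One way to express such a check, as a minimal sketch: test for the attribute rather than for a unidiff version number. `is_binary` and the SimpleNamespace stand-ins are hypothetical; the only assumption is the one stated above, that newer unidiff PatchedFile objects provide is_binary_file and older ones do not. (git_email.py already uses the same idea for renames via `hasattr(PatchedFile(), 'is_rename')`.)

```python
from types import SimpleNamespace


def is_binary(patched_file):
    """Return True only if the file object reports itself as binary.

    Objects from older unidiff versions lack is_binary_file; treat them
    as non-binary instead of failing with AttributeError.
    """
    return bool(getattr(patched_file, 'is_binary_file', False))


new_style = SimpleNamespace(path='logo.png', is_binary_file=True)
old_style = SimpleNamespace(path='logo.png')  # no is_binary_file attribute
print(is_binary(new_style), is_binary(old_style))  # True False
```

With an old unidiff, the binary-file special-casing is then silently skipped instead of crashing, which matches "use it only when it is available".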

OK for mainline?

Tobias
#!/usr/bin/env python3

# Copyright (C) 2020-2022 Free Software Foundation, Inc.
#
# This file is part of GCC.
#
# GCC is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# GCC is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with GCC; see the file COPYING.  If not, write to
# the Free Software Foundation, 51 Franklin Street, Fifth Floor,
# Boston, MA 02110-1301, USA.

# This script parses a .diff file generated with 'diff -up' or 'diff -cp'
# and adds a skeleton ChangeLog file to the file. It does not try to be
# too smart when parsing function names, but it produces a reasonable
# approximation.
#
# Author: Martin Liska 

import argparse
import datetime
import json
import os
import re
import subprocess
import sys
from itertools import takewhile

import requests

from unidiff import PatchSet

LINE_LIMIT = 100
TAB_WIDTH = 8
CO_AUTHORED_BY_PREFIX = 'co-authored-by: '

pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PPR [a-z+-]+\/[0-9]+)')
prnum_regex = re.compile(r'PR (?P[a-z+-]+)/(?P[0-9]+)')
dr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PDR [0-9]+)')
dg_regex = re.compile(r'{\s+dg-(error|warning)')
pr_filename_regex = re.compile(r'(^|[\W_])[Pp][Rr](?P\d{4,})')
identifier_regex = re.compile(r'^([a-zA-Z0-9_#].*)')
comment_regex = re.compile(r'^\/\*')
struct_regex = re.compile(r'^(class|struct|union|enum)\s+'
  r'(GTY\(.*\)\s+)?([a-zA-Z0-9_]+)')
macro_regex = re.compile(r'#\s*(define|undef)\s+([a-zA-Z0-9_]+)')
super_macro_regex = re.compile(r'^DEF[A-Z0-9_]+\s*\(([a-zA-Z0-9_]+)')
fn_regex = re.compile(r'([a-zA-Z_][^()\s]*)\s*\([^*]')
template_and_param_regex = re.compile(r'<[^<>]*>')
md_def_regex = re.compile(r'\(define.*\s+"(.*)"')
bugzilla_url = 'https://gcc.gnu.org/bugzilla/rest.cgi/bug?id=%s;' \
   'include_fields=summary,component'

function_extensions = {'.c', '.cpp', '.C', '.cc', '.h', '.inc', '.def', '.md'}

# NB: Makefile.in isn't listed as it's not always generated.
generated_files = {'aclocal.m4', 'config.h.in', 'configure'}

help_message = """\
Generate ChangeLog template for PATCH.
PATCH must be generated using diff(1)'s -up or -cp options
(or their equivalent in git).
"""

script_folder = os.path.realpath(__file__)
root = os.path.dirname(os.path.dirname(script_folder))


def find_changelog(path):
    folder = os.path.split(path)[0]
    while True:
        if os.path.exists(os.path.join(root, folder, 'ChangeLog')):
            return folder
        folder = os.path.dirname(folder)
        if folder == '':
            return folder
    raise AssertionError()


def extract_function_name(line):
    if comment_regex.match(line):
        return None
    m = struct_regex.search(line)
    if m:
        # Struct declaration
        return m.group(1) + ' ' + m.group(3)
    m = macro_regex.search(line)
    if m:
        # Macro definition
        return m.group(2)
    m = super_macro_regex.search(line)
    if m:
        # Supermacro
        return m.group(1)
    m = fn_regex.search(line)
    if m:
        # Discard template and function parameters.
        fn = m.group(1)
        fn = re.sub(template_and_param_regex, '', fn)
        return fn.rstrip()
    return None


def try_add_function(functions, line):
    fn = extract_function_name(line)
    if fn and fn not in functions:
        functions.append(fn)
    return bool(fn)


def sort_changelog_files(changed_file):
    return (changed_file.is_added_file, changed_file.is_removed_file)


def get_pr_titles(prs):
    output = []
    for idx, pr in enumerate(prs):
        pr_id = pr.split('/')[-1]
        r = requests.get(bugzilla_url % pr_id)
        bugs = r.json()['bugs']
        if len(bugs) == 1:
            prs[idx] = 'PR %s/%s' % (bugs[0]['component'], pr_id)
            out = '%s - %s\n' % (prs[idx], bugs[0]['summary'])
            if out not in output:
                output.append(out)
    if output:
        output.append('')
    return '\n'.join(output)


def append_changelog_line(out, relative_path, text):
    line = f'\t* {relative_path}:'
    if len(line.replace('\t', ' ' * TAB_WIDTH) + ' ' + text) <= LINE_LIMIT:
        out += f'{line} {text}\n'
   

Re: [Patch, Fortran] libgfortran's ISO_Fortran_binding.c: Use GCC11 version for backward-only code [PR108056]

2022-12-13 Thread Tobias Burnus

Hi Harald,

On 13.12.22 23:27, Harald Anlauf wrote:

Am 13.12.22 um 22:41 schrieb Tobias Burnus:

Back to differences: 'diff -U0 -p -w' against the last GCC 11 branch
shows:

...
@@ -35,0 +37,2 @@ export_proto(cfi_desc_to_gfc_desc);
+/* NOTE: Since GCC 12, the FE generates code to do the conversion
+   directly without calling this function.  */
@@ -63 +66 @@ cfi_desc_to_gfc_desc (gfc_array_void *d,
-  d->dtype.version = s->version;
+  d->dtype.version = 0;


I was wondering what the significance of "version" is.
In ISO_Fortran_binding.h we seem to always have
   #define CFI_VERSION 1
and it did not change with gcc-12.


The version is 1 for CFI but it is 0 for GFC. However, as we do not
check the GFC version anywhere and it is not publicly exposed, it does
not really matter. Still, "d->dtype.version = 0;" matches what the
compiler itself produces – and for consistency, setting it to 0 is
better than setting it to 1 (via CFI's version field).

Actually 'dtype.version' is not really set anywhere; at least
gfc_get_dtype_rank_type(...) does not set it. Zero initialization is
most common, but it could also be some random value. In libgfortran,
GFC_DTYPE_CLEAR explicitly sets it to 0.

@@ -100,2 +110,2 @@ gfc_desc_to_cfi_desc (CFI_cdesc_t **d_pt
-d = malloc (sizeof (CFI_cdesc_t)
-   + (CFI_type_t)(CFI_MAX_RANK * sizeof (CFI_dim_t)));
+d = calloc (1, (sizeof (CFI_cdesc_t)
+   + (CFI_type_t)(CFI_MAX_RANK * sizeof (CFI_dim_t))));
@@ -107 +117 @@ gfc_desc_to_cfi_desc (CFI_cdesc_t **d_pt
-  d->version = s->dtype.version;
+  d->version = CFI_VERSION;


This treatment of "version" was the equivalent to the above that
confused me.  Assuming we were to change CFI_VERSION in gcc-13+,
is this the right choice here regarding backward compatibility?


I don't think we will change the CFI version any time soon, as we follow
the Fortran standard rather closely and I do not see any changes there
that would require it.

NOTE: As s->dtype.version is either 0 or some random value, setting
version in the CFI / ISO C descriptor to 1, be it as literal or as macro
constant, makes it the same as CFI_VERSION.

And: I don't think we will change CFI_VERSION or the structure of the
CFI array descriptor any time soon. There does not seem to be any need
for it, it matches the Fortran standard's one well (and no changes seem
to be planned on that side) and, finally, changing an array descriptor is
painful!

However, using '1;  /* CFI_VERSION in GCC 11 and at time of writing. */'
would also work; but I would expect that we will go through all CFI
users if we ever change the descriptor (and bump the version), possibly
adding version-number-dependent code.


So besides the "version" question ok from my side.


I hope I could answer the latter.

Tobias



Re: [Patch, Fortran] libgfortran's ISO_Fortran_binding.c: Use GCC11 version for backward-only code [PR108056]

2022-12-13 Thread Tobias Burnus

Hi Harald,

On 13.12.22 21:53, Harald Anlauf via Gcc-patches wrote:


I now did so - except for three fixes (cf. changelog). See also
PR: https://gcc.gnu.org/PR108056

There is no testcase as it needs to be compiled by GCC <= 11 and then
run with linking (dynamically) to a GCC 12 or 13 libgfortran.


I've looked at the resulting ISO_Fortran_binding.c vs. the 11-branch
version and am still trying to understand the resulting differences
in the code, in what respect they might be relevant or not.


Hmm, if I run a diff, I do not see many differences.

Note: We only talk about those two functions, the other functions are used
by both GCC <= 11 and GCC >= 12.

Fortunately, these functions matter most as they map GFC internals to CFI
internals or vice versa. Most other functions are user-callable; there,
incompatibilities are less likely to show up, and GCC 11 users could also
profit from the fixes. It looks as if CFI_section and CFI_select_part had
some larger changes, likewise CFI_setpointer.

Back to differences: 'diff -U0 -p -w' against the last GCC 11 branch shows:

...
@@ -35,0 +37,2 @@ export_proto(cfi_desc_to_gfc_desc);
+/* NOTE: Since GCC 12, the FE generates code to do the conversion
+   directly without calling this function.  */
@@ -63 +66 @@ cfi_desc_to_gfc_desc (gfc_array_void *d,
-  d->dtype.version = s->version;
+  d->dtype.version = 0;
@@ -76,0 +80 @@ cfi_desc_to_gfc_desc (gfc_array_void *d,
+  if (GFC_DESCRIPTOR_DATA (d))
@@ -79,3 +83,7 @@ cfi_desc_to_gfc_desc (gfc_array_void *d,
-  GFC_DESCRIPTOR_LBOUND(d, n) = (index_type)s->dim[n].lower_bound;
-  GFC_DESCRIPTOR_UBOUND(d, n) = (index_type)(s->dim[n].extent
-   + s->dim[n].lower_bound - 1);
+   CFI_index_t lb = 1;
+
+   if (s->attribute != CFI_attribute_other)
+ lb = s->dim[n].lower_bound;
+
+   GFC_DESCRIPTOR_LBOUND(d, n) = (index_type)lb;
+   GFC_DESCRIPTOR_UBOUND(d, n) = (index_type)(s->dim[n].extent + lb - 1);
@@ -89,0 +98,2 @@ export_proto(gfc_desc_to_cfi_desc);
+/* NOTE: Since GCC 12, the FE generates code to do the conversion
+   directly without calling this function.  */
@@ -100,2 +110,2 @@ gfc_desc_to_cfi_desc (CFI_cdesc_t **d_pt
-d = malloc (sizeof (CFI_cdesc_t)
-   + (CFI_type_t)(CFI_MAX_RANK * sizeof (CFI_dim_t)));
+d = calloc (1, (sizeof (CFI_cdesc_t)
+   + (CFI_type_t)(CFI_MAX_RANK * sizeof (CFI_dim_t))));
@@ -107 +117 @@ gfc_desc_to_cfi_desc (CFI_cdesc_t **d_pt
-  d->version = s->dtype.version;
+  d->version = CFI_VERSION;
@@ -153 +163 @@ void *CFI_address (const CFI_cdesc_t *dv
...


Given that this is a somewhat delicate situation we're in, is there
a set of tests that I could run *manually* (i.e. compile with gcc-11
and link with gcc-12/13) to verify that this best-effort fix should
be good enough for the common user?

Just a suggestion of a few "randomly" chosen tests?


Probably yes. I don't have a better suggestion. The problem is that it
usually only matters in some corner cases, like in the PR, where an
argument is not passed directly to the GFC→CFI conversion; instead, a
Fortran function is first called with TYPE(*) and only then is the
argument passed on. Such cases are usually not in the testsuite. (With
GCC 12 we have a rather complex testsuite, but obviously it also does
not cover everything.)



Note: It is strongly recommended to use GCC 12 (or 13) with array-descriptor
C interop as many issues were fixed. [...]


Well, in the real world there are larger installations with large
software stacks, and it is easier said to "compile each component
with the same compiler version" than done...


I concur – but there were really many fixes for the array descriptor /
TS29113 in GCC 12.

It is simply not possible to fix tons of bugs and be 100% compatible
with the working bits of the old version, especially if they only work
as long as one does not look closely at the result. (Like here, where
'type' is wrong, which only matters in programs that use it.)

Thanks,

Tobias



[Patch] OpenMP: Parse align clause in allocate directive in C/C++

2022-12-13 Thread Tobias Burnus

We have working parsing support for the 'allocate' directive
(it fails immediately with a 'sorry' after parsing).

To be in line with the rest of the allocat(e,or) etc. handling,
it makes sense to take care of 'align' as well, which is what this
patch does; it still fails with a 'sorry' after parsing.
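The clause loop that the C parser change introduces can be modelled on plain strings. This is a hypothetical Python sketch, not GCC code: the real parser works on tokens and trees, accepts arbitrary expressions and emits the diagnostics in the diff; here `align` must be a decimal literal and the messages are paraphrased. It shows the two checks the patch adds: each clause may appear at most once, and `align` must be a positive power of two.

```python
import re

# One clause: optional leading comma, a name, and a parenthesized argument
# without nested parentheses (a simplification for this sketch).
CLAUSE_RE = re.compile(r'\s*,?\s*([A-Za-z_]\w*)\s*\(([^()]*)\)')


def parse_allocate_clauses(text):
    """Return (align, allocator, errors) for clauses after 'allocate (list)'."""
    align = allocator = None
    errors = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = CLAUSE_RE.match(text, pos)
        if not m or m.group(1) not in ('align', 'allocator'):
            errors.append("expected 'align' or 'allocator'")
            break
        name, arg = m.group(1), m.group(2).strip()
        pos = m.end()
        if name == 'align':
            if align is not None:
                errors.append("too many 'align' clauses")
                break
            value = int(arg) if re.fullmatch(r'[+-]?\d+', arg) else None
            # A positive power of two has exactly one bit set.
            if value is None or value <= 0 or value & (value - 1):
                errors.append("'align' clause argument needs to be a positive "
                              "constant power of two integer expression")
            else:
                align = value
        else:
            if allocator is not None:
                errors.append("too many 'allocator' clauses")
                break
            allocator = arg
    return align, allocator, errors


print(parse_allocate_clauses('align(64) allocator(omp_low_lat_mem_alloc)'))
# (64, 'omp_low_lat_mem_alloc', [])
print(parse_allocate_clauses('align(64), align(128)'))
# (64, None, ["too many 'align' clauses"])
```

As in the C version, an invalid alignment is diagnosed but parsing continues, while a duplicate clause stops the loop.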

OK for mainline?

Tobias
OpenMP: Parse align clause in allocate directive in C/C++

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_allocate): Parse align
	clause and check for restrictions.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_allocate): Parse align
	clause.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/allocate-5.c: Extend for align clause.

 gcc/c/c-parser.cc| 88 
 gcc/cp/parser.cc | 58 +-
 gcc/testsuite/c-c++-common/gomp/allocate-5.c | 36 
 3 files changed, 144 insertions(+), 38 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 1bbb39f9b08..62c302748dd 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18819,32 +18819,71 @@ c_parser_oacc_wait (location_t loc, c_parser *parser, char *p_name)
   return stmt;
 }
 
-/* OpenMP 5.0:
-   # pragma omp allocate (list)  [allocator(allocator)]  */
+/* OpenMP 5.x:
+   # pragma omp allocate (list)  clauses
+
+   OpenMP 5.0 clause:
+   allocator (omp_allocator_handle_t expression)
+
+   OpenMP 5.1 additional clause:
+   align (int expression)] */
 
 static void
 c_parser_omp_allocate (location_t loc, c_parser *parser)
 {
+  tree alignment = NULL_TREE;
   tree allocator = NULL_TREE;
   tree nl = c_parser_omp_var_list_parens (parser, OMP_CLAUSE_ALLOCATE, NULL_TREE);
-  if (c_parser_next_token_is (parser, CPP_COMMA)
-  && c_parser_peek_2nd_token (parser)->type == CPP_NAME)
-c_parser_consume_token (parser);
-  if (c_parser_next_token_is (parser, CPP_NAME))
+  do
 {
+  if (c_parser_next_token_is (parser, CPP_COMMA)
+	  && c_parser_peek_2nd_token (parser)->type == CPP_NAME)
+	c_parser_consume_token (parser);
+  if (!c_parser_next_token_is (parser, CPP_NAME))
+	break;
   matching_parens parens;
   const char *p = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
   c_parser_consume_token (parser);
-  if (strcmp ("allocator", p) != 0)
-	error_at (c_parser_peek_token (parser)->location,
-		  "expected %");
-  else if (parens.require_open (parser))
+  location_t expr_loc = c_parser_peek_token (parser)->location;
+  if (strcmp ("align", p) != 0 && strcmp ("allocator", p) != 0)
 	{
-	  location_t expr_loc = c_parser_peek_token (parser)->location;
-	  c_expr expr = c_parser_expr_no_commas (parser, NULL);
-	  expr = convert_lvalue_to_rvalue (expr_loc, expr, false, true);
-	  allocator = expr.value;
-	  allocator = c_fully_fold (allocator, false, NULL);
+	  error_at (c_parser_peek_token (parser)->location,
+		"expected % or %");
+	  break;
+	}
+  if (!parens.require_open (parser))
+	break;
+
+  c_expr expr = c_parser_expr_no_commas (parser, NULL);
+  expr = convert_lvalue_to_rvalue (expr_loc, expr, false, true);
+  expr_loc = c_parser_peek_token (parser)->location;
+  if (p[2] == 'i' && alignment)
+	{
+	  error_at (expr_loc, "too many %qs clauses", "align");
+	  break;
+	}
+  else if (p[2] == 'i')
+	{
+	  alignment = c_fully_fold (expr.value, false, NULL);
+	  if (TREE_CODE (alignment) != INTEGER_CST
+	  || !INTEGRAL_TYPE_P (TREE_TYPE (alignment))
+	  || tree_int_cst_sgn (alignment) != 1
+	  || !integer_pow2p (alignment))
+	{
+	  error_at (expr_loc, "% clause argument needs to be "
+  "positive constant power of two integer "
+  "expression");
+	  alignment = NULL_TREE;
+	}
+	}
+  else if (allocator)
+	{
+	  error_at (expr_loc, "too many %qs clauses", "allocator");
+	  break;
+	}
+  else
+	{
+	  allocator = c_fully_fold (expr.value, false, NULL);
 	  tree orig_type
 	= expr.original_type ? expr.original_type : TREE_TYPE (allocator);
 	  orig_type = TYPE_MAIN_VARIANT (orig_type);
@@ -18853,20 +18892,23 @@ c_parser_omp_allocate (location_t loc, c_parser *parser)
 	  || TYPE_NAME (orig_type)
 		 != get_identifier ("omp_allocator_handle_t"))
 	{
-	  error_at (expr_loc, "% clause allocator expression "
-"has type %qT rather than "
-"%",
-TREE_TYPE (allocator));
+	  error_at (expr_loc,
+			"% clause allocator expression has type "
+			"%qT rather than %",
+			TREE_TYPE (allocator));
 	  allocator = NULL_TREE;
 	}
-	  parens.skip_until_found_close (parser);
 	}
-}
+  parens.skip_until_found_close (parser);
+} while (true);
   c_parser_skip_to_pragma_eol (parser);
 
-  if (allocator)
+  if (allocator || 

[Patch] Fortran: Extend align-clause checks of OpenMP's allocate clause

2022-12-13 Thread Tobias Burnus

I missed that 'align' needs to be a power of 2, contrary to 'aligned',
which does not have this restriction for some odd reason.
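The extended check boils down to the usual bit test: a positive integer is a power of two exactly when it has a single bit set, i.e. `n & (n - 1) == 0`. A minimal Python sketch of the two conditions (the function names are hypothetical; in the patch, the test is pow2p_hwi on the extracted constant, after the scalar/integer/constant checks):

```python
def align_modifier_ok(alignment: int) -> bool:
    """ALIGN modifier of allocate: positive *and* a power of two."""
    return alignment > 0 and (alignment & (alignment - 1)) == 0


def aligned_clause_ok(alignment: int) -> bool:
    """ALIGNED clause: a positive integer suffices."""
    return alignment > 0


for value in (64, 101, 128, 0):
    print(value, align_modifier_ok(value), aligned_clause_ok(value))
# 64 True True
# 101 False True
# 128 True True
# 0 False False
```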

OK for mainline?

Tobias
Fortran: Extend align-clause checks of OpenMP's allocate directive

gcc/fortran/ChangeLog:

	* openmp.cc (resolve_omp_clauses): Check also for
	power of two.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/allocate-3.f90: Fix ALIGN
	usage, remove unused -fdump-tree-original.
	* testsuite/libgomp.fortran/allocate-4.f90: New.

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 686f924b47a..5468cc97969 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7315,11 +7315,12 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	  || n->u.align->ts.type != BT_INTEGER
 	  || n->u.align->rank != 0
	  || gfc_extract_int (n->u.align, &alignment)
-	  || alignment <= 0)
+	  || alignment <= 0
+	  || !pow2p_hwi (alignment))
 	{
-	  gfc_error ("ALIGN modifier requires a scalar positive "
-			 "constant integer alignment expression at %L",
-			 &n->u.align->where);
+	  gfc_error ("ALIGN modifier requires at %L a scalar positive "
+			 "constant integer alignment expression that is a "
+			 "power of two", &n->u.align->where);
 	  break;
 	}
 	}

diff --git a/libgomp/testsuite/libgomp.fortran/allocate-3.f90 b/libgomp/testsuite/libgomp.fortran/allocate-3.f90
index a39819164d6..1fa0bb932c3 100644
--- a/libgomp/testsuite/libgomp.fortran/allocate-3.f90
+++ b/libgomp/testsuite/libgomp.fortran/allocate-3.f90
@@ -1,5 +1,4 @@
 ! { dg-do compile }
-! { dg-additional-options "-fdump-tree-original" }
 
 use omp_lib
 implicit none
@@ -23,6 +22,7 @@ integer :: q, x,y,z
 ! { dg-error "Object 'omp_high_bw_mem_alloc' is not a variable" "" { target *-*-* } .-1 }
 !$omp end parallel
 
-!$omp parallel allocate( align(q) : x) firstprivate(x) ! { dg-error "31:ALIGN modifier requires a scalar positive constant integer alignment expression at" }
+!$omp parallel allocate( align(128) : x) firstprivate(x) ! OK
 !$omp end parallel
+
 end
diff --git a/libgomp/testsuite/libgomp.fortran/allocate-4.f90 b/libgomp/testsuite/libgomp.fortran/allocate-4.f90
new file mode 100644
index 000..ddb507ba8e4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/allocate-4.f90
@@ -0,0 +1,42 @@
+! { dg-do compile }
+
+
+subroutine test()
+use iso_c_binding, only: c_intptr_t
+implicit none
+integer, parameter :: omp_allocator_handle_kind = 1 !! <<<
+integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_high_bw_mem_alloc = 4
+integer :: q, x,y,z
+integer, parameter :: cnst(2) = [64, 101]
+
+!$omp parallel allocate( omp_high_bw_mem_alloc : x)  firstprivate(x) ! { dg-error "Expected integer expression of the 'omp_allocator_handle_kind' kind" }
+!$omp end parallel
+
+!$omp parallel allocate( allocator (omp_high_bw_mem_alloc) : x)  firstprivate(x) ! { dg-error "Expected integer expression of the 'omp_allocator_handle_kind' kind" }
+!$omp end parallel
+
+!$omp parallel allocate( align (q) : x)  firstprivate(x) ! { dg-error "32:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+
+!$omp parallel allocate( align (32) : x)  firstprivate(x) ! OK
+!$omp end parallel
+
+!$omp parallel allocate( align(q) : x) firstprivate(x) ! { dg-error "31:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+
+!$omp parallel allocate( align(cnst(1)) : x ) firstprivate(x) ! OK
+!$omp end parallel
+
+!$omp parallel allocate( align(cnst(2)) : x) firstprivate(x)  ! { dg-error "31:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+
+!$omp parallel allocate( align( 31) :x) firstprivate(x)  ! { dg-error "32:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+
+!$omp parallel allocate( align (32.0): x) firstprivate(x)  ! { dg-error "32:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+
+!$omp parallel allocate( align(cnst ) : x ) firstprivate(x)  ! { dg-error "31:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+end


[Patch, Fortran] libgfortran's ISO_Fortran_binding.c: Use GCC11 version for backward-only code [PR108056]

2022-12-13 Thread Tobias Burnus

This is a 12/13 regression: some changes that went into GCC 12 to fix the
GFC/CFI descriptor handling fail with the (bogus) descriptor passed by a
GCC-11-compiled program.

As later GCC 12 changes moved the descriptor conversion into the front end,
those functions are only in libgfortran.so to cater for old programs. Richard
suggested in the PR that the best way is to move back to the GCC 11 version,
such that libgfortran.so won't regress.

I have now done so, except for three fixes (cf. changelog). See also
PR: https://gcc.gnu.org/PR108056

There is no testcase, as it would need to be compiled with GCC <= 11 and then
run dynamically linked against a GCC 12 or 13 libgfortran.

OK for mainline and GCC 12?

 * * *

Note: It is strongly recommended to use GCC 12 (or 13) for array-descriptor
C interop, as many issues were fixed. For instance, for the testcase in the PR,
in GCC 11 the type arriving in libgfortran is BT_ASSUME ('type(*)'). But as the
effective argument is passed as an array descriptor throughout, the 'float'
(real(4)) type information can actually be preserved (as GCC 12 does; cf. the
testcase of comment 0 and my comment in the PR for the C part of the testcase).(*)

Tobias

((*) This is not possible if using a scalar 'type(*)' or a non-array-descriptor
array in between. I think GCC 12 uses 'CFI_other' in the information-is-lost 
case.)
libgfortran's ISO_Fortran_binding.c: Use GCC11 version for backward-only code [PR108056]

Since GCC 12, the conversion between the two array descriptor formats, the
internal one (GFC) and the C-binding one (CFI), has moved into the compiler
itself, such that the cfi_desc_to_gfc_desc/gfc_desc_to_cfi_desc functions are
only used with older code (GCC 9 to 11).  The newly added checks caused asserts,
as older code did not pass the proper values (e.g. a real(4) effective argument
arrived as BT_ASSUME type, as the effective type got lost in between).

As proposed in the PR, revert to the GCC 11 version: known bugs are better
than some fixes plus new issues. Still, GCC 12 is much better in terms of
TS29113 support and should really be used.

This patch uses the current libgfortran version of the GCC 11 branch, except
that it fixes the GFC version number (which is 0), uses calloc instead of
malloc, and sets the lower bound to 1 instead of keeping it as is for
CFI_attribute_other.
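For reference, the restored GCC 11 code splits the CFI type code into a base type (the low byte, via `CFI_type_mask`) and a kind (the remaining bits shifted down by `CFI_type_kind_shift`), then swaps the character and derived numbers because the CFI and GFC enumerations order them differently. A standalone sketch of that decoding; the numeric values are assumptions mirroring GCC's `ISO_Fortran_binding.h` and `libgfortran.h` of that time, so check the installed headers before relying on them:

```c
#include <assert.h>

/* Assumed values (mirroring GCC's headers; verify before use).  */
#define SKETCH_CFI_type_mask        0xFF
#define SKETCH_CFI_type_kind_shift  8
#define SKETCH_CFI_type_Real        3
#define SKETCH_CFI_type_Character   5
#define SKETCH_CFI_type_struct      6
#define SKETCH_BT_DERIVED           5
#define SKETCH_BT_CHARACTER         6

/* Decode a CFI type code into a GFC base type and a kind.  */
static void
decode_cfi_type (int cfi_type, signed char *bt, long *kind)
{
  signed char base = (signed char) (cfi_type & SKETCH_CFI_type_mask);
  *kind = (long) ((cfi_type - (cfi_type & SKETCH_CFI_type_mask))
                  >> SKETCH_CFI_type_kind_shift);
  /* CFI_type_Character (5) coincides with BT_DERIVED and
     CFI_type_struct (6) with BT_CHARACTER, so swap the two.  */
  if (base == SKETCH_BT_CHARACTER)
    base = SKETCH_BT_DERIVED;
  else if (base == SKETCH_BT_DERIVED)
    base = SKETCH_BT_CHARACTER;
  *bt = base;
}
```

For integer, logical, real and complex the two numberings are assumed to coincide, so those pass through unchanged.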

libgfortran/ChangeLog:

	PR libfortran/108056
	* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc,
	gfc_desc_to_cfi_desc): Mostly revert to GCC 11 version for
	those backward-compatibility-only functions.

diff --git a/libgfortran/runtime/ISO_Fortran_binding.c b/libgfortran/runtime/ISO_Fortran_binding.c
index 342df4275b9..e63a717a69b 100644
--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -39,60 +39,31 @@ export_proto(cfi_desc_to_gfc_desc);
 void
 cfi_desc_to_gfc_desc (gfc_array_void *d, CFI_cdesc_t **s_ptr)
 {
-  signed char type;
-  size_t size;
   int n;
+  index_type kind;
   CFI_cdesc_t *s = *s_ptr;
 
   if (!s)
 return;
 
-  /* Verify descriptor.  */
-  switch (s->attribute)
-{
-case CFI_attribute_pointer:
-case CFI_attribute_allocatable:
-  break;
-case CFI_attribute_other:
-  if (s->base_addr)
-	break;
-  runtime_error ("Nonallocatable, nonpointer actual argument to BIND(C) "
-		 "dummy argument where the effective argument is either "
-		 "not allocated or not associated");
-  break;
-default:
-  runtime_error ("Invalid attribute type %d in CFI_cdesc_t descriptor",
-		 (int) s->attribute);
-  break;
-}
   GFC_DESCRIPTOR_DATA (d) = s->base_addr;
+  GFC_DESCRIPTOR_TYPE (d) = (signed char)(s->type & CFI_type_mask);
+  kind = (index_type)((s->type - (s->type & CFI_type_mask)) >> CFI_type_kind_shift);
 
   /* Correct the unfortunate difference in order with types.  */
-  type = (signed char)(s->type & CFI_type_mask);
-  switch (type)
-{
-case CFI_type_Character:
-  type = BT_CHARACTER;
-  break;
-case CFI_type_struct:
-  type = BT_DERIVED;
-  break;
-case CFI_type_cptr:
-  /* FIXME: PR 100915.  GFC descriptors do not distinguish between
-	 CFI_type_cptr and CFI_type_cfunptr.  */
-  type = BT_VOID;
-  break;
-default:
-  break;
-}
-
-  GFC_DESCRIPTOR_TYPE (d) = type;
-  GFC_DESCRIPTOR_SIZE (d) = s->elem_len;
+  if (GFC_DESCRIPTOR_TYPE (d) == BT_CHARACTER)
+    GFC_DESCRIPTOR_TYPE (d) = BT_DERIVED;
+  else if (GFC_DESCRIPTOR_TYPE (d) == BT_DERIVED)
+    GFC_DESCRIPTOR_TYPE (d) = BT_CHARACTER;
+
+  if (!s->rank || s->dim[0].sm == (CFI_index_t)s->elem_len)
+    GFC_DESCRIPTOR_SIZE (d) = s->elem_len;
+  else if (GFC_DESCRIPTOR_TYPE (d) != BT_DERIVED)
+    GFC_DESCRIPTOR_SIZE (d) = kind;
+  else
+    GFC_DESCRIPTOR_SIZE (d) = s->elem_len;
 
   d->dtype.version = 

[committed] fortran/openmp.cc: Remove 's' that slipped in during %<..%> replacement (was: [Patch] Fortran: Replace simple '.' quotes by %<.%>)

2022-12-11 Thread Tobias Burnus

On 09.12.22 22:12, Tobias Burnus wrote:

Found when working on the just submitted/committed patch.


Committed as r13-4590, but it required a follow-up that I somehow
missed :-/; that is now committed as well (as r13-4597).

Tobias
commit 045592f665bcb67b75dc6b86badbe2fd44aed3e6
Author: Tobias Burnus 
Date:   Sun Dec 11 11:47:55 2022 +0100

fortran/openmp.cc: Remove 's' that slipped in during %<..%> replacement

Seemingly, 's' (in VI that's the 's'ubstitute command) appeared verbatim in
a gfc_error message when doing the '...' to %<...%> replacements in commit
r13-4590-g84f6f8a2a97f88be01e223c9c9dbab801a4f501f

gcc/fortran/
* openmp.cc (gfc_match_omp_context_selector_specification):
Remove spurious 's' in an error message.
---
 gcc/fortran/openmp.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 7edc78ad0cb..686f924b47a 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -5568,7 +5568,7 @@ gfc_match_omp_context_selector_specification (gfc_omp_declare_variant *odv)
 
   if (m != MATCH_YES || i == selector_set_count)
 	{
-	  gfc_error ("expected %, %, % "
+	  gfc_error ("expected %, %, % "
 		 "or % at %C");
 	  return MATCH_ERROR;
 	}


Re: [PATCH 2/2] OpenMP: Duplicate checking for map clauses in Fortran (PR107214)

2022-12-10 Thread Tobias Burnus

Hi Julian,

On 10.12.22 13:10, Julian Brown wrote:

On Thu, 8 Dec 2022 13:04:20 +0100
Tobias Burnus  wrote:

All in all, I am fine with the patch - but I spotted some issues.

...

I believe this patch covers all the above cases (hopefully
appropriately generalised), at least for Fortran. I haven't attempted
to fix any missing cases for C, for now.

Re-tested with offloading to NVPTX (with a few supporting patches, as
before).

Does this look OK now?


Yes, LGTM.

Thanks!

Tobias



Re: [Patch] libgomp.texi: Reverse-offload updates (was: [Patch] libgomp: Handle OpenMP's reverse offloads)

2022-12-10 Thread Tobias Burnus

Now that the reverse-offload patch is (nearly) in:

On 07.12.22 09:08, Tobias Burnus wrote:


On 06.12.22 08:45, Tobias Burnus wrote:

* As a follow-up, libgomp.texi must be updated


Slight update to that uncommitted patch: I extended the nvptx entry to
state that only one reverse-offload region runs at a given time.

OK?

Tobias
libgomp.texi: Reverse-offload updates

libgomp/
	* libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'.
	(GCN): Add item about 'omp requires'.
	(nvptx): Likewise; add item about reverse offload.

 libgomp/libgomp.texi | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index b6c1ed714ce..f95e82fc8aa 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -192,8 +192,8 @@ The OpenMP 4.5 specification is fully supported.
   env variable @tab Y @tab
 @item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
 @item @code{requires} directive @tab P
-  @tab complete but no non-host devices provides @code{unified_address},
-  @code{unified_shared_memory} or @code{reverse_offload}
+  @tab complete but no non-host devices provides @code{unified_address} or
+  @code{unified_shared_memory}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab Y @tab
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@@ -228,7 +228,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{allocate} clause @tab P @tab Initial support
 @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
 @item @code{ancestor} modifier on @code{device} clause
-  @tab Y @tab See comment for @code{requires}
+  @tab Y @tab Host fallback with GCN devices
 @item Implicit declare target directive @tab Y @tab
 @item Discontiguous array section with @code{target update} construct
   @tab N @tab
@@ -288,7 +288,7 @@ The OpenMP 4.5 specification is fully supported.
   @code{append_args} @tab N @tab
 @item @code{dispatch} construct @tab N @tab
 @item device-specific ICV settings with environment variables @tab Y @tab
-@item @code{assume} directive @tab Y @tab
+@item @code{assume} and @code{assumes} directives @tab Y @tab
 @item @code{nothing} directive @tab Y @tab
 @item @code{error} directive @tab Y @tab
 @item @code{masked} construct @tab Y @tab
@@ -4456,6 +4456,9 @@ The implementation remark:
 @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
   using the C library @code{printf} functions and the Fortran
   @code{print}/@code{write} statements.
+@item OpenMP code that has a requires directive with @code{unified_address},
+  @code{unified_shared_memory} or @code{reverse_offload} will remove
+  any GCN device from the list of available devices (``host fallback'').
 @end itemize
 
 
@@ -4507,6 +4510,15 @@ The implementation remark:
 @item Compilation OpenMP code that contains @code{requires reverse_offload}
   requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}
   is not supported.
+@item For code containing reverse offload (i.e. @code{target} regions with
+  @code{device(ancestor:1)}), there is a slight performance penalty
+  for @emph{all} target regions, consisting mostly of shutdown delay.
+  Per device, reverse offload regions are processed serially such that
+  the next reverse offload region is only executed after the previous
+  one returns.
+@item OpenMP code that has a requires directive with @code{unified_address}
+  or @code{unified_shared_memory} will remove any nvptx device from the
+  list of available devices (``host fallback'').
 @end itemize
 
 


Re: [Patch] libgomp: Handle OpenMP's reverse offloads

2022-12-10 Thread Tobias Burnus

On 09.12.22 15:44, Jakub Jelinek wrote:

On Tue, Dec 06, 2022 at 08:45:07AM +0100, Tobias Burnus wrote:

[...]

I think we just shouldn't support libgomp plugins for 32-bit libgomp, only
host fallback.  If you want offloading, use 64-bit host...

(I concur.)



libgomp: Handle OpenMP's reverse offloads

+  /* Likeverse for the reverse lookup device->host for reverse offload. */

Likewise


+  reverse_splay_tree_node rev_array;

Do we need reverse_splay_tree* stuff in libgomp.h?
As splay_tree_node is just a pointer, perhaps just
struct reverse_splay_tree_node_s;
early and
   struct reverse_splay_tree_node_s *rev_array;
in libgomp.h and include the extra splay-tree.h only in target.c?
Unless one needs it anywhere else...


It is used as 'typedef struct reverse_splay_tree_node_s 
*reverse_splay_tree_node;' in

struct target_mem_desc {

  reverse_splay_tree_node rev_array;
};

but also as

struct gomp_device_descr
{
  ...
  struct reverse_splay_tree_s mem_map_rev;
};

The latter is

struct reverse_splay_tree_key_s {
  /* Address of the device object.  */
  uint64_t dev;
  splay_tree_key k;
};

which in turn needs 'splay_tree_key'.

Thus, I could either commit it as is, or also turn the latter into a
pointer and malloc it. Currently, it is accessed as
mem_map.k.root = NULL for init and later indirectly through the
splay-tree functions.

Thoughts?

Unless there are further comments, I will later commit it as is.
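The trade-off under discussion is plain C: a pointer member compiles against a forward declaration alone, while a struct embedded by value needs the complete type, which is why an embedded `mem_map_rev` drags in the splay-tree definitions. A minimal sketch of the "turn it into a pointer and malloc it" alternative; all names are hypothetical stand-ins, not libgomp's actual types:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* "Header" view: forward declarations suffice for pointer members.  */
struct rev_node_sketch;
struct rev_tree_sketch;

struct mem_desc_sketch
{
  struct rev_node_sketch *rev_array;   /* pointer to incomplete type: OK */
};

struct device_sketch
{
  struct rev_tree_sketch *mem_map_rev; /* malloc'ed instead of embedded */
};

/* "target.c" view: the complete definitions.  */
struct rev_node_sketch
{
  uint64_t dev;   /* address of the device object */
  void *k;        /* stands in for splay_tree_key */
};

struct rev_tree_sketch
{
  struct rev_node_sketch *root;
};

static int
device_sketch_init (struct device_sketch *d)
{
  d->mem_map_rev = malloc (sizeof *d->mem_map_rev);
  if (d->mem_map_rev == NULL)
    return 0;
  d->mem_map_rev->root = NULL;   /* start with an empty tree */
  return 1;
}

static void
device_sketch_fini (struct device_sketch *d)
{
  free (d->mem_map_rev);
  d->mem_map_rev = NULL;
}
```

The pointer variant costs one extra allocation per device and an extra indirection per lookup; in return, only the file doing the lookups needs the full definitions.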

Tobias



[Patch] Fortran: Replace simple '.' quotes by %<.%>

2022-12-09 Thread Tobias Burnus

Found when working on the just submitted/committed patch.

I intend to commit it to mainline as obvious tomorrow (or Sun or Mon),
unless there are comments.

Tobias
Fortran: Replace simple '.' quotes by %<.%>

Using %qs instead of '%s', or %<=%> instead of '=', looks nicer: it gives
nicer quotes and bold text if the terminal supports it;
otherwise, plain quotes are used.
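To illustrate the difference only: a hypothetical helper (not GCC's pretty-printer, whose quote characters depend on the locale and whose escape sequences may differ) that emits either typographic quotes around bold text or plain ASCII quotes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration: format S as 'S' (plain) or as
   U+2018, bold S, U+2019 (fancy).  Returns snprintf's result.  */
static int
quote_sketch (char *buf, size_t n, const char *s, bool fancy)
{
  assert (buf != NULL && s != NULL);
  if (fancy)
    /* UTF-8 left quote, SGR bold, text, SGR reset, UTF-8 right quote.  */
    return snprintf (buf, n, "\xe2\x80\x98\033[01m%s\033[m\xe2\x80\x99", s);
  return snprintf (buf, n, "'%s'", s);
}
```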

gcc/fortran/ChangeLog:

	* match.cc (gfc_match_member_sep): Use %<...%> in gfc_error.
	* openmp.cc (gfc_match_oacc_routine, gfc_match_omp_context_selector,
	gfc_match_omp_context_selector_specification,
	gfc_match_omp_declare_variant, resolve_omp_clauses): Likewise;
	use %qs instead of '%s'.
	* primary.cc (match_real_constant, gfc_match_varspec): Likewise.
	* resolve.cc (gfc_resolve_formal_arglist, resolve_operator,
	resolve_ordinary_assign): Likewise.

diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index 7ba0f349993..89fb115c0f6 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -195,3 +195,3 @@ gfc_match_member_sep(gfc_symbol *sym)
   gfc_error ("Expected structure component or operator name "
- "after '.' at %C");
+		 "after %<.%> at %C");
   goto error;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 4b4e6ac6947..7edc78ad0cb 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -4061,3 +4061,3 @@ gfc_match_oacc_routine (void)
 	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, expecting"
-		 " ')' after NAME");
+		 " %<)%> after NAME");
 	  gfc_current_locus = old_loc;
@@ -5350,4 +5350,4 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
 		{
-		  gfc_error ("selector '%s' not allowed for context selector "
-			 "set '%s' at %C",
+		  gfc_error ("selector %qs not allowed for context selector "
+			 "set %qs at %C",
 			 selector, oss->trait_set_selector_name);
@@ -5370,3 +5370,3 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
 	{
-	  gfc_error ("selector '%s' does not accept any properties at %C",
+	  gfc_error ("selector %qs does not accept any properties at %C",
 			 selector);
@@ -5379,3 +5379,3 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
 		{
-		  gfc_error ("expected '(' at %C");
+		  gfc_error ("expected %<(%> at %C");
 		  return MATCH_ERROR;
@@ -5401,3 +5401,3 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
 		{
-		  gfc_error ("expected ')' at %C");
+		  gfc_error ("expected %<)%> at %C");
 		  return MATCH_ERROR;
@@ -5514,3 +5514,3 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
 	{
-	  gfc_error ("expected ')' at %C");
+	  gfc_error ("expected %<)%> at %C");
 	  return MATCH_ERROR;
@@ -5524,3 +5524,3 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
 	{
-	  gfc_error ("expected '(' at %C");
+	  gfc_error ("expected %<(%> at %C");
 	  return MATCH_ERROR;
@@ -5570,4 +5570,4 @@ gfc_match_omp_context_selector_specification (gfc_omp_declare_variant *odv)
 	{
-	  gfc_error ("expected 'construct', 'device', 'implementation' or "
-		 "'user' at %C");
+	  gfc_error ("expected %, %, % "
+		 "or % at %C");
 	  return MATCH_ERROR;
@@ -5578,3 +5578,3 @@ gfc_match_omp_context_selector_specification (gfc_omp_declare_variant *odv)
 	{
-	  gfc_error ("expected '=' at %C");
+	  gfc_error ("expected %<=%> at %C");
 	  return MATCH_ERROR;
@@ -5585,3 +5585,3 @@ gfc_match_omp_context_selector_specification (gfc_omp_declare_variant *odv)
 	{
-	  gfc_error ("expected '{' at %C");
+	  gfc_error ("expected %<{%> at %C");
 	  return MATCH_ERROR;
@@ -5600,3 +5600,3 @@ gfc_match_omp_context_selector_specification (gfc_omp_declare_variant *odv)
 	{
-	  gfc_error ("expected '}' at %C");
+	  gfc_error ("expected %<}%> at %C");
 	  return MATCH_ERROR;
@@ -5622,3 +5622,3 @@ gfc_match_omp_declare_variant (void)
 {
-  gfc_error ("expected '(' at %C");
+  gfc_error ("expected %<(%> at %C");
   return MATCH_ERROR;
@@ -5670,3 +5670,3 @@ gfc_match_omp_declare_variant (void)
 {
-  gfc_error ("expected ')' at %C");
+  gfc_error ("expected %<)%> at %C");
   return MATCH_ERROR;
@@ -5680,3 +5680,3 @@ gfc_match_omp_declare_variant (void)
 	{
-	  gfc_error ("expected 'match' at %C");
+	  gfc_error ("expected %<match%> at %C");
 	  return MATCH_ERROR;
@@ -5689,3 +5689,3 @@ gfc_match_omp_declare_variant (void)
 	{
-	  gfc_error ("expected '(' at %C");
+	  gfc_error ("expected %<(%> at %C");
 	  return MATCH_ERROR;
@@ -5698,3 +5698,3 @@ gfc_match_omp_declare_variant (void)
 	{
-	  gfc_error ("expected ')' at %C");
+	  gfc_error ("expected %<)%> at %C");
 	  return MATCH_ERROR;
@@ -7380,3 +7380,3 @@ resolve_omp_clauses (gfc_code 

[Patch] Fortran/OpenMP: align/allocator modifiers to the allocate clause

2022-12-09 Thread Tobias Burnus

This implements the OpenMP 5.1 syntax (the 'align' and 'allocator'
modifiers) inside the 'allocate' clause. That's a fallout of working on
something else...

OK for mainline?

Tobias
Fortran/OpenMP: align/allocator modifiers to the allocate clause

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_namelist): Improve OMP_LIST_ALLOCATE
	output.
	* gfortran.h (struct gfc_omp_namelist): Add 'align' to 'u'.
	(gfc_free_omp_namelist): Add bool arg.
	* match.cc (gfc_free_omp_namelist): Likewise; free 'u.align'.
	* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_clause_reduction,
	gfc_match_omp_flush): Update call.
	(gfc_match_omp_clauses): Match 'align'/'allocator' modifiers in
	'allocate' clause.
	(resolve_omp_clauses): Resolve align.
	* st.cc (gfc_free_statement): Update call.
	* trans-openmp.cc (gfc_trans_omp_clauses): Handle 'align'.

libgomp/ChangeLog:

	* libgomp.texi (5.1 Impl. Status): Split allocate clause/directive
	item about 'align'; mark clause as 'Y' and directive as 'N'.
	* testsuite/libgomp.fortran/allocate-2.f90: New test.
	* testsuite/libgomp.fortran/allocate-3.f90: New test.

 gcc/fortran/dump-parse-tree.cc   |  23 +
 gcc/fortran/gfortran.h   |   3 +-
 gcc/fortran/match.cc |   4 +-
 gcc/fortran/openmp.cc| 106 +++
 gcc/fortran/st.cc|   2 +-
 gcc/fortran/trans-openmp.cc  |   8 ++
 libgomp/libgomp.texi |   4 +-
 libgomp/testsuite/libgomp.fortran/allocate-2.f90 |  25 ++
 libgomp/testsuite/libgomp.fortran/allocate-3.f90 |  28 ++
 9 files changed, 163 insertions(+), 40 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 2f042ab5142..5ae72dc1cac 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1357,6 +1357,29 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
 	}
 	  ns_iter = n->u2.ns;
 	}
+  if (list_type == OMP_LIST_ALLOCATE)
+	{
+	  if (n->expr)
+	{
+	  fputs ("allocator(", dumpfile);
+	  show_expr (n->expr);
+	  fputc (')', dumpfile);
+	}
+	  if (n->expr && n->u.align)
+	fputc (',', dumpfile);
+	  if (n->u.align)
+	{
+	  fputs ("align(", dumpfile);
+	  show_expr (n->u.align);
+	  fputc (')', dumpfile);
+	}
+	  if (n->expr || n->u.align)
+	fputc (':', dumpfile);
+	  fputs (n->sym->name, dumpfile);
+	  if (n->next)
+	fputs (") ALLOCATE(", dumpfile);
+	  continue;
+	}
   if (list_type == OMP_LIST_REDUCTION)
 	switch (n->u.reduction_op)
 	  {
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index b541a07e2c7..5f8a81ae4a1 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1349,6 +1349,7 @@ typedef struct gfc_omp_namelist
   gfc_omp_reduction_op reduction_op;
   gfc_omp_depend_doacross_op depend_doacross_op;
   gfc_omp_map_op map_op;
+  gfc_expr *align;
   struct
 	{
 	  ENUM_BITFIELD (gfc_omp_linear_op) op:4;
@@ -3572,7 +3573,7 @@ void gfc_free_iterator (gfc_iterator *, int);
 void gfc_free_forall_iterator (gfc_forall_iterator *);
 void gfc_free_alloc_list (gfc_alloc *);
 void gfc_free_namelist (gfc_namelist *);
-void gfc_free_omp_namelist (gfc_omp_namelist *, bool);
+void gfc_free_omp_namelist (gfc_omp_namelist *, bool, bool);
 void gfc_free_equiv (gfc_equiv *);
 void gfc_free_equiv_until (gfc_equiv *, gfc_equiv *);
 void gfc_free_data (gfc_data *);
diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index 8b8b6e79c8b..7ba0f349993 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -5524,13 +5524,15 @@ gfc_free_namelist (gfc_namelist *name)
 /* Free an OpenMP namelist structure.  */
 
 void
-gfc_free_omp_namelist (gfc_omp_namelist *name, bool free_ns)
+gfc_free_omp_namelist (gfc_omp_namelist *name, bool free_ns, bool free_align)
 {
   gfc_omp_namelist *n;
 
   for (; name; name = n)
 {
   gfc_free_expr (name->expr);
+  if (free_align)
+	gfc_free_expr (name->u.align);
   if (free_ns)
 	gfc_free_namespace (name->u2.ns);
   else if (name->u2.udr)
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 862c649b0b6..4b4e6ac6947 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -187,7 +187,8 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->vector_length_expr);
   for (i = 0; i < OMP_LIST_NUM; i++)
 gfc_free_omp_namelist (c->lists[i],
-			   i == OMP_LIST_AFFINITY || i == OMP_LIST_DEPEND);
+			   i == OMP_LIST_AFFINITY || i == OMP_LIST_DEPEND,
+			   i == OMP_LIST_ALLOCATE);
   gfc_free_expr_list (c->wait_list);
   gfc_free_expr_list (c->tile_list);
   free (CONST_CAST (char *, c->critical_name));
@@ -542,7 

Re: [PATCH 02/17] libgomp: pinned memory

2022-12-08 Thread Tobias Burnus

On 08.12.22 15:35, Andrew Stubbs wrote:

On 08/12/2022 14:02, Tobias Burnus wrote:

With available, I assume that nvptx is an 'available device' (per OpenMP
definition, finally added in TR11), i.e. there is an image for nvptx and
- after omp_requires filtering - there remains at least one nvptx
device.


If plugin-nvptx has been loaded then the function will be available.
Do we need to get fancier than that?


I think it does not really make sense to use CUDA if there is not a single device.
In terms of loading, the code does:

gomp_target_init (void)
{
  ...
  cur = OFFLOAD_PLUGINS;  /* This is a comma-separated string with the supported plugins.  */
  ...
      if (gomp_load_plugin_for_device (&current_device, plugin_name))
        {
          int omp_req = omp_requires_mask & ~GOMP_REQUIRES_TARGET_USED;
          new_num_devs = current_device.get_num_devices_func (omp_req);

Thus, CUDA is loaded at the 'gomp_load_plugin_for_device' line and at the
'new_num_devs =' line, it has been filtered for OpenMP's 'requires' demands.*

Thus, 'new_num_devs' contains the number of 'accessible devices' (OpenMP
definition), filtered for the 'requires'* (which is part of the 'supported
devices' requirements).

(* With some caveats related to late loading of offloading code from (shared)
libraries.)
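The effect of that filtering can be sketched as follows. This is a simplification under stated assumptions: the REQ_* bit values are made up for the sketch (libgomp's real GOMP_REQUIRES_* constants may differ), and each plugin is modelled as supporting one fixed capability set for all its devices, with GCN supporting none of the three features and nvptx only reverse_offload, matching the libgomp.texi update:

```c
#include <assert.h>

/* Hypothetical bit values, for this sketch only.  */
enum
{
  REQ_UNIFIED_ADDRESS = 1 << 0,
  REQ_UNIFIED_SHARED_MEMORY = 1 << 1,
  REQ_REVERSE_OFFLOAD = 1 << 2,
  REQ_TARGET_USED = 1 << 3   /* bookkeeping bit, not a device capability */
};

/* Number of a plugin's NUM_DEVS devices that remain accessible once the
   program's 'requires' mask is applied: a device counts only if it
   supports every requested feature.  */
static int
filtered_num_devices (int num_devs, unsigned supported, unsigned requires_mask)
{
  unsigned omp_req = requires_mask & ~(unsigned) REQ_TARGET_USED;
  return (omp_req & ~supported) == 0 ? num_devs : 0;
}
```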

 * * *

Admittedly, this does not yet cover the last suggested feature:

GOMP_offload_register_ver (...)
{
  gomp_load_image_to_device (devicep, version,

which is relevant for the first part of:

'supported devices' - '... supported by the implementation for execution of
target code ... requires directive are fulfilled'.

(available = (intersection of 'accessible devices' and 'supported devices')
possibly filtered + reordered via the OMP_AVAILABLE_DEVICES env var.)


I am not sure how strictly this is required and how we know when all
offload_register calls are over. I do note that OpenMP TR 11 has an
over-engineered OMP_AVAILABLE_DEVICES environment variable which permits
filtering the list of available devices, which also requires early access to
the initial 'available devices' list. But it might be sufficient to rely on
the device-is-accessible + requires filtering and ignore whether an actual
image is available.
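For comparison, the 'available devices' definition quoted above (the intersection of accessible and supported devices) amounts to the following; a hypothetical sketch with made-up names that leaves out the OMP_AVAILABLE_DEVICES filtering and reordering step:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Collect the device numbers that are both accessible (after 'requires'
   filtering) and supported (an offload image was registered for them).
   Returns the number of entries written to OUT.  */
static size_t
available_devices_sketch (const bool *accessible, const bool *has_image,
                          size_t ndev, int *out)
{
  size_t count = 0;
  for (size_t i = 0; i < ndev; i++)
    if (accessible[i] && has_image[i])
      out[count++] = (int) i;
  return count;
}
```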

Tobias



Re: [PATCH 02/17] libgomp: pinned memory

2022-12-08 Thread Tobias Burnus

On 08.12.22 13:51, Andrew Stubbs wrote:

On 08/12/2022 12:11, Jakub Jelinek wrote:

On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.

As I said before, I think the pinned memory is too precious to waste it this
way, we should handle the -> pinned case through memkind_create_fixed on
mmap + mlock area, that way we can create even quite small pinned
allocations.


This has been delayed due to other priorities, but our current plan is
to switch to using cudaHostAlloc, when available, but we can certainly
use memkind_create_fixed for the fallback case (including amdgcn).
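The mmap + mlock approach from the quoted patch description can be sketched with plain POSIX calls. This is not the patch's code and the function names are hypothetical; presumably the point of using mmap is that the locked pages belong to this allocation alone, so unlocking them on free cannot affect neighbouring heap objects:

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS with strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map SIZE bytes of private anonymous memory and try to pin it.
   Returns NULL if mapping or locking fails (e.g. RLIMIT_MEMLOCK).  */
static void *
pinned_alloc_sketch (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)
    {
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Unpin and release memory obtained from pinned_alloc_sketch.  */
static void
pinned_free_sketch (void *p, size_t size)
{
  assert (p != NULL);
  munlock (p, size);
  munmap (p, size);
}
```

Every allocation is rounded up to whole pages, which is the waste Jakub objects to; carving small allocations out of one larger pinned area, as with memkind_create_fixed, avoids that.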


With available, I assume that nvptx is an 'available device' (per OpenMP
definition, finally added in TR11), i.e. there is an image for nvptx and
- after omp_requires filtering - there remains at least one nvptx device.

* * *

For completeness, I want to note that OpenMP TR11 adds support for
creating memory spaces that are accessible from multiple devices, e.g.
host + one/all devices, and adds some convenience functions for the
latter (all devices, host and a specific device etc.) →
https://openmp.org/specifications/ TR11 (see Appendix B.2 for the
release notes, esp. for Section 6.2).

I think it makes sense to keep those additions in mind when doing the
actual implementation to avoid incompatibilities.

Side note regarding ompx_ additions proposed in
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597979.html (adds
ompx_pinned_mem_alloc),
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597983.html
(ompx_unified_shared_mem_alloc and ompx_host_mem_alloc;
ompx_unified_shared_mem_space and ompx_host_mem_space).

While TR11 does not add any predefined allocators or new memory spaces,
using e.g. omp_get_devices_all_allocator(memspace) returns a
unified-shared-memory allocator.

I note that LLVM does not seem to have any ompx_ in this regard (yet?).
(It has some ompx_ – but related to assumptions.)



Using Cuda might be trickier to implement because there's a layering
violation inherent in routing target independent allocations through
the nvptx plugin, but benchmarking shows that that's the only way to
get the faster path through the Cuda black box; being pinned is good
because it avoids page faults, but apparently if Cuda *knows* it is
pinned then you get a speed boost even when there would be *no* faults
(i.e. on a quiet machine). Additionally, Cuda somehow ignores the
OS-defining limits.


I wonder whether for a NUMA machine (and non-offloading access), using
memkind_create_fixed will have an advantage over cuHostAlloc or not.
(BTW, I find cuHostAlloc vs. cuAllocHost confusing.) And if so, whether
we should provide a means (GOMP_... env var?) to toggle the preference.

My feeling is that, on most systems, it does not matter, except (a)
possibly for large NUMA systems, where the memkind tuning will probably
make a difference, and (b) with nvptx offloading, where we know that
CUDA's cu(HostAlloc/AllocHost) is faster. (cu(HostAlloc/AllocHost) also
permits DMA from the device if unified-shared address is supported, which
is the case [cf. comment + assert in plugin-nvptx.c].)

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH 2/2] OpenMP: Duplicate checking for map clauses in Fortran (PR107214)

2022-12-08 Thread Tobias Burnus

Hi Julian:

On 07.12.22 20:13, Julian Brown wrote:

I know that this was the case before, but can you move the mark:1 etc.
after 'tlink'? In that case all bitfields are grouped together.

Thanks for doing so.

I wonder whether that also rejects the following – which seems to be
valid. The 'map' goes to 'target' and the 'firstprivate' to
'parallel', cf. OpenMP 5.2, "17.2 Clauses on Combined and Composite
Constructs", [340:3-4 & 12-14]. (BTW: While some fixes went into 5.1
regarding this section, similar wording is already in 5.0.)

(Testing showed: it gives an ICE without the patch and an error with it.)

...and this patch avoids the error for combined directives, and
reorders the gfc_symbol bitfields.


All in all, I am fine with the patch - but I spotted some issues.

First, I think that for some error cases you need to set mark = 0 to avoid
duplicated errors.
Namely:

  ! Outputs the error twice ('Symbol ‘y’ present on multiple clauses')
  !$omp target has_device_addr(y) firstprivate(y)
  block; end block
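To make the "only one diagnostic per symbol" goal concrete, here is a minimal C sketch of mark-bit based duplicate detection. It is not gfortran's resolve_omp_clauses code: struct sym, struct clause, check_duplicate_clauses and the extra 'reported' bit are hypothetical simplifications (the suggestion above is to reset mark instead). The point is only that the second and later encounters of an already-diagnosed symbol must stay silent.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for gfc_symbol and a
   clause list; only the bits needed for the sketch are modelled.  */
struct sym
{
  const char *name;
  unsigned mark : 1;      /* Symbol already seen on some clause.  */
  unsigned reported : 1;  /* "multiple clauses" error already issued.  */
};

struct clause
{
  const char *kind;       /* e.g. "has_device_addr", "firstprivate".  */
  struct sym *s;
};

/* Diagnose symbols that appear on more than one clause, issuing at most
   one diagnostic per symbol.  Returns the number of diagnostics.  */
static int
check_duplicate_clauses (struct clause *c, int n)
{
  int errors = 0;

  /* Reset the per-symbol bits before the walk.  */
  for (int i = 0; i < n; i++)
    {
      assert (c[i].s != NULL);
      c[i].s->mark = 0;
      c[i].s->reported = 0;
    }

  for (int i = 0; i < n; i++)
    {
      struct sym *s = c[i].s;
      if (!s->mark)
        s->mark = 1;            /* First occurrence: just remember it.  */
      else if (!s->reported)
        {
          fprintf (stderr, "Symbol '%s' present on multiple clauses\n",
                   s->name);
          s->reported = 1;      /* Suppress repeats for this symbol.  */
          errors++;
        }
    }
  return errors;
}
```

For has_device_addr(y) firstprivate(y), the clause list holds 'y' twice and the function reports it exactly once; a third clause naming 'y' would not add another diagnostic.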

 * * *

Additionally, I think it would be good to have, besides 'target' +
map/firstprivate (→ error), also a testcase for 'target simd' +
map/firstprivate (→ error).

I also think no-error checks should be added for all combined 'target ...'
constructs that take firstprivate, cf. your own patch, possibly by looking at
the original dump (scan-tree-dump) to see that the clause is attached
correctly. Example for 'target teams':

  !$omp target teams map(x) firstprivate(x)
  block; end block

(Works but no testcase.)

 * * *

The following is not diagnosed and gives an ICE:

!$omp target in_reduction(+: x) private(x)
  block; end block
end

The C testcase properly has:
  error: ‘x’ appears more than once in data-sharing clauses

Note: Using 'firstprivate' instead of 'private' shows the proper error also in Fortran.


The following does not ICE but does not make sense (and is rejected in C):

4 | #pragma omp target private(x) map(x)

vs.

  !$omp target map(x) private(x)
  block; end block

(The latter produces "#pragma omp target private(x.0) map(tofrom:*x.0)", oops!)

 * * *

I also note that 'simd' accepts private such that

#pragma omp target simd private(x) map(x)
 for (int i=0; i < 0; i++)
 ;

!$omp target simd map(x) private(x)
do i = 1, 0; end do

is valid. (It is accepted by gcc and gfortran, i.e. it just needs to be added
as a testcase.)

 * * *

I note that C rejects {map(x),firstprivate(x)} + {has_device_addr(x),is_device_ptr(x)},
but gfortran + your patch accepts:

  !$omp target map(x) has_device_addr(x)
  !$omp target map(x) is_device_ptr(x)

while

  !$omp target firstprivate(x) has_device_addr(x)
  !$omp target firstprivate(x) is_device_ptr(x)

is rejected – showing the error message twice.

Expected: I think it should show an error in all four cases - but only once.
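The expected behaviour can be modelled as a small conflict table over clause kinds, checked once per variable so that each offending variable is diagnosed once. This is a hypothetical sketch for a plain 'target' construct only (enum clause_kind, conflicts and count_conflicting_vars are illustrative names, not GCC internals). It covers just the pairs discussed here; has_device_addr together with is_device_ptr is simply treated as compatible because it was not discussed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical clause kinds for the sketch.  */
enum clause_kind
{
  C_MAP,
  C_FIRSTPRIVATE,
  C_HAS_DEVICE_ADDR,
  C_IS_DEVICE_PTR,
  C_NKINDS
};

/* conflicts[a][b] != 0: a variable may not appear on both a and b
   (plain 'target', not a combined construct).  */
static const unsigned char conflicts[C_NKINDS][C_NKINDS] = {
  /*                      MAP  FP  HDA  IDP */
  [C_MAP]             = { 0,   1,  1,   1 },
  [C_FIRSTPRIVATE]    = { 1,   0,  1,   1 },
  [C_HAS_DEVICE_ADDR] = { 1,   1,  0,   0 },
  [C_IS_DEVICE_PTR]   = { 1,   1,  0,   0 },
};

struct clause
{
  enum clause_kind kind;
  const char *var;
};

/* Return the number of distinct variables with conflicting clauses,
   i.e. how many diagnostics a "report once" checker would emit.  */
static int
count_conflicting_vars (const struct clause *c, int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    {
      /* Handle each variable only at its first occurrence.  */
      int seen_before = 0;
      for (int k = 0; k < i && !seen_before; k++)
        seen_before = strcmp (c[k].var, c[i].var) == 0;
      if (seen_before)
        continue;

      /* Collect all clause kinds this variable appears on.  */
      unsigned kinds = 0;
      for (int j = i; j < n; j++)
        if (strcmp (c[j].var, c[i].var) == 0)
          {
            assert (c[j].kind < C_NKINDS);
            kinds |= 1u << c[j].kind;
          }

      /* Any conflicting pair makes the variable invalid.  */
      int bad = 0;
      for (int a = 0; a < C_NKINDS; a++)
        for (int b = 0; b < C_NKINDS; b++)
          if ((kinds >> a & 1u) && (kinds >> b & 1u) && conflicts[a][b])
            bad = 1;
      count += bad;
    }
  return count;
}
```

All four combinations listed above yield exactly one diagnostic each for 'x', which is the behaviour suggested as expected.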


2022-12-06  Julian Brown  

gcc/fortran/
 PR fortran/107214
 * gfortran.h (gfc_symbol): Add data_mark, dev_mark, gen_mark and
 reduc_mark bitfields.
 * openmp.cc (resolve_omp_clauses): Use above bitfields to improve
 duplicate clause detection.

gcc/testsuite/
 PR fortran/107214
 * gfortran.dg/gomp/pr107214.f90: New test.


Thanks,

Tobias



Re: [PATCH 1/2] OpenMP/Fortran: Combined directives with map/firstprivate of same symbol

2022-12-08 Thread Tobias Burnus

On 07.12.22 20:09, Julian Brown wrote:

On Wed, 26 Oct 2022 12:39:39 +0200
Tobias Burnus  wrote:

The ICE seems to be because gcc/fortran/trans-openmp.cc's
gfc_split_omp_clauses mishandles this as the dump shows the following:

#pragma omp target firstprivate(a) map(tofrom:a)
  #pragma omp parallel firstprivate(a)

In contrast, for the C testcase:

#pragma omp target parallel for simd map(x) firstprivate(x)

the dump is as follows, which seems to be sensible:

#pragma omp target map(tofrom:x)
  #pragma omp parallel firstprivate(x)

This patch fixes a case where a combined directive (e.g. "!$omp target
parallel ...") contains both a map and a firstprivate clause for the
same variable.  When the combined directive is split into two nested
directives, the outer "target" gets the "map" clause, and the inner
"parallel" gets the "firstprivate" clause, like so:

...

This is not a recent regression, but appears to fix a long-standing ICE.
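A simplified model of the splitting rule for a combined 'target parallel' is sketched below, assuming a clause list containing only map and firstprivate. The names (split_target_parallel, is_mapped, the fixed-size struct split) are hypothetical; the rule that 'target' receives firstprivate only for unmapped variables is what the ChangeLog's gfc_add_firstprivate_if_unmapped name and the C dump above suggest, not a copy of trans-openmp.cc.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: only the two clause kinds relevant here.  */
enum clause_kind { K_MAP, K_FIRSTPRIVATE };

struct clause
{
  enum clause_kind kind;
  const char *var;
};

/* Clauses for the outer 'target' and the inner 'parallel' directive.  */
struct split
{
  struct clause target[8];
  int nt;
  struct clause parallel[8];
  int np;
};

/* Nonzero if VAR appears in a map clause of the original list.  */
static int
is_mapped (const struct clause *c, int n, const char *var)
{
  for (int i = 0; i < n; i++)
    if (c[i].kind == K_MAP && strcmp (c[i].var, var) == 0)
      return 1;
  return 0;
}

/* Split the clauses of a combined 'target parallel': map goes to
   'target', firstprivate goes to 'parallel', and 'target' additionally
   gets firstprivate only for variables that are not mapped.  */
static void
split_target_parallel (const struct clause *c, int n, struct split *out)
{
  out->nt = out->np = 0;
  for (int i = 0; i < n; i++)
    {
      /* Each iteration appends at most one clause to each list.  */
      assert (out->nt < 8 && out->np < 8);
      if (c[i].kind == K_MAP)
        out->target[out->nt++] = c[i];
      else
        {
          out->parallel[out->np++] = c[i];
          if (!is_mapped (c, n, c[i].var))
            out->target[out->nt++] = c[i];
        }
    }
}
```

For map(a) firstprivate(a) firstprivate(b), 'target' gets map(a) and firstprivate(b), while 'parallel' gets firstprivate(a) and firstprivate(b), matching the shape of the C dump above.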

...

gcc/fortran/
 * trans-openmp.cc (gfc_add_firstprivate_if_unmapped): New function.
 (gfc_split_omp_clauses): Call above.

libgomp/
 * testsuite/libgomp.fortran/combined-directive-splitting-1.f90: New
 test.


LGTM – thanks!

Tobias



Re: [PATCH v5 3/4] OpenMP: Pointers and member mappings

2022-12-07 Thread Tobias Burnus

Hi Julian,

I think this patch is OK; however, at least for gimplify.cc Jakub needs to have
a second look.

As remarked for the 2/4 patch, I believe mapping 'map(tofrom: var%f(2:3))' should work
without explicitly mapping 'map(tofrom: var%f)'
(→ [TR11 157:21-26] (approx. [5.2 154:22-27], [5.1 352:17-22], [5.0 320:22-27])).
→ https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608100.html (+ previously in the thread).

Testing the patch, this seems to work fine (i.e. unlike C/C++, cf. 2/4),
which matches the dump and, if I understood correctly, also your (Julian's) expectation.
Thus, no need to modify the code part.

Regarding the testcases:
* I would prefer if you don't modify the existing libgomp.fortran/struct-elem-map-1.f90 testcase;
  however, you could add your version as another variant ('subroutine nine()', 'four_var()' or
  whatever the next free name is), possibly with a comment saying that it is 'four()' but with an
  added explicit basepointer mapping.

* As the new version should map *less*, I wonder whether some -fdump-tree-{original,gimple,omplower}
  scan-tree-dump checks would be useful besides testing whether it works at run time.
  (Your decision regarding which tree, which testcases and whether at all.)

* Likewise, maybe a 'target enter/exit data' check? However, you might very well run into my
  'omp target data exit' issue, cf. https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604887.html
  (needs to be revised based on Jakub's comments; I think those were on IRC only – the problem is that
  not only 'alloc' is affected but also 'from' etc.)

On 18.10.22 12:39, Julian Brown wrote:

Implementing the "omp declare mapper" functionality, I noticed some
cases where handling of derived type members that are pointers doesn't
seem to be quite right. At present, a type such as this:
...
   map(to: tvar%arrptr) map(tofrom: tvar%arrptr(3:8))

and then instead we should follow (OpenMP 5.2, 5.8.3 "map Clause"):
...
   2) map(tofrom: tvar%arrptr(3:8))   -->
   GOMP_MAP_TOFROM         *tvar%arrptr%data(3)  (size 8-3+1, etc.)
   GOMP_MAP_TO_PSET        tvar%arrptr
   GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data  (bias 3, etc.)

...
Additionally, the next patch in the series adds a runtime diagnostic
for the (illegal) case where 'i' and 'j' are different.

2022-10-18  Julian Brown  

gcc/fortran/
  * dependency.cc (gfc_omp_expr_prefix_same): New function.
  * dependency.h (gfc_omp_expr_prefix_same): Add prototype.
  * gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2"
  union.
  * trans-openmp.cc (dependency.h): Include.
  (gfc_trans_omp_array_section): Use GOMP_MAP_TO_PSET unconditionally for
  mapping array descriptors.
  (gfc_symbol_rooted_namelist): New function.
  (gfc_trans_omp_clauses): Check subcomponent and subarray/element
  accesses elsewhere in the clause list for pointers to derived types or
  array descriptors, and adjust or drop mapping nodes appropriately.

gcc/
  * gimplify.cc (omp_tsort_mapping_groups): Process nodes that have
  OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P set after those that don't.
  (omp_accumulate_sibling_list): Adjust GOMP_MAP_TO_PSET handling.
  Remove GOMP_MAP_ALWAYS_POINTER handling.

libgomp/
  * testsuite/libgomp.fortran/map-subarray.f90: New test.
  * testsuite/libgomp.fortran/map-subarray-2.f90: New test.
  * testsuite/libgomp.fortran/map-subarray-3.f90: New test.
  * testsuite/libgomp.fortran/map-subarray-4.f90: New test.
  * testsuite/libgomp.fortran/map-subarray-6.f90: New test.
  * testsuite/libgomp.fortran/map-subarray-7.f90: New test.
  * testsuite/libgomp.fortran/map-subcomponents.f90: New test.
  * testsuite/libgomp.fortran/struct-elem-map-1.f90: Adjust for
  descriptor-mapping changes.  Remove XFAIL.

...

--- a/libgomp/testsuite/libgomp.fortran/struct-elem-map-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/struct-elem-map-1.f90
@@ -229,7 +229,8 @@ contains

  !   !$omp target map(tofrom: var%d(4:7), var%f(2:3), var%str2(2:3)) &
  !   !$omp&   map(tofrom: var%str4(2:2), var%uni2(2:3), var%uni4(2:2))
-!$omp target map(tofrom: var%d(4:7), var%f(2:3), var%str2(2:3), var%uni2(2:3))
+!$omp target map(to: var%f) map(tofrom: var%d(4:7), var%f(2:3), &
+!$omp&   var%str2(2:3), var%uni2(2:3))

This adds 'to: var%f'  (to the existing 'var%f(2:3)') – where 'f' is a
POINTER. As discussed at the top, I prefer to leave it as is – and
possibly just add another test-function, replicating this function and
only there adding the basepointer as additional list item.

-!$omp target map(tofrom: var%f(2:3))
+!$omp target map(to: var%f) map(tofrom: var%f(2:3))

likewise.

-!$omp target map(tofrom: var%d(5), var%f(3), var%str2(3), var%uni2(3))
+!$omp target map(to: var%f) map(tofrom: var%d(5), var%f(3), &
+!$omp&  var%str2(3), var%uni2(3))

likewise.

-!$omp target map(tofrom: 

Re: [PATCH v5 2/4] OpenMP/OpenACC: Rework clause expansion and nested struct handling

2022-12-07 Thread Tobias Burnus

Hi Julian,

On 07.12.22 16:16, Julian Brown wrote:

On Wed, 7 Dec 2022 15:54:42 +0100 Tobias Burnus  wrote:

If I understand Deepak's comment (on OpenMP.org's omp-lang list, sorry
it is a nonpublic list) correctly, the following wording implies that
a 'from: s.w[z:4]' for a pointer 's.w' also implies a mapping of
's.w' - if 's' is used inside the target region and, thus, gets
implicitly mapped.

[TR11 157:21-26] (approx. [5.2 154:22-27], [5.1 352:17-22], [5.0
320:22-27])

"If a list item with an implicit data-mapping attribute does not have
any corresponding storage in the device data environment prior to a
task encountering the construct associated with the map clause, and
one or more contiguous parts of the original storage are either list
items or base pointers to list items that are explicitly mapped on
the construct, only those parts of the original storage will have
corresponding storage in the device data environment as a result of
the map clauses on the construct."

Hmmm... IIRC that is a different conclusion than the one we have
understood previously, leading to e.g. the patch here (Chung-Lin CC'ed):

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html


This seems to be the "Target directive struct mapping question" omp-lang thread,
started on 2021-03-22.

I think we need to distinguish:

  #pragma omp target enter data map(to: s.w[:10])

from

  #pragma omp target map(tofrom: s.arr[:20])
s.arr[0] = 5;

In the latter case, 's' gets implicitly mapped and the wording above then
applies to the base pointer 's.arr' of 's.arr[:20]', while in the former case
only the pointee gets mapped, without the pointer 's.w' (and, hence, there
is also no pointer attachment).

At least that's what I get from the wording above and reading Deepak's last
email - and it does not seem to clash with the discussion in the lengthy
omp-lang thread. (Maybe there are other threads – or I completely misread them.)
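The second case ('target' plus accessing 's') can be written as a small runnable C function. The struct S type and set_first are hypothetical. Without -fopenmp, or without an offload device, the region simply executes on the host, so the result is the same either way; with a device, the expectation per the reading above is that 's' is implicitly mapped and 's.arr' is attached to the device copy of the section.

```c
#include <assert.h>

/* Hypothetical struct with a pointer member, as in the 's.arr' example.  */
struct S
{
  int *arr;
};

/* Only the pointee section s.arr[0:20] is mapped explicitly.  Because
   's' is referenced inside the region it is implicitly mapped as well,
   and 's.arr' then acts as base pointer of the mapped section.  */
static int
set_first (struct S s)
{
  assert (s.arr != 0);
  #pragma omp target map(tofrom: s.arr[:20])
  s.arr[0] = 5;
  return s.arr[0];   /* Host value after the section was copied back.  */
}
```

By contrast, a lone 'target enter data map(to: s.w[:10])' does not reference 's' in any region, so (per the same reading) nothing attaches 's.w' on the device.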

I think it makes sense to have a clarifying example in OpenMP; hence,
I filed the OpenMP.org example issue #342, starting with essentially
what I wrote above: 'target enter data' needs more work to get the pointer
handling done, 'target' + accessing 's' works as is.

I hope it makes sense.


Follow-on discussion then questioned whether the change was really the
intention of the spec, but we thought it was.  Has that changed now?


No idea – I find it difficult to track all the language changes and find
mapping complex and unclear.

However, it does seem to make sense in the way written above without
contradicting any previous discussions, minus the common confusion.
(At least as far as I gathered from browsing both omp-lang and gcc-patches.)


(I think actually changing the behaviour is a matter of flipping a
switch, but let's make sure we choose the right setting!)


That sounds great!

Tobias



Re: [PATCH v5 2/4] OpenMP/OpenACC: Rework clause expansion and nested struct handling

2022-12-07 Thread Tobias Burnus

Hi Julian,

If I understand Deepak's comment (on OpenMP.org's omp-lang list, sorry
it is a nonpublic list) correctly, the following wording implies that a
'from: s.w[z:4]' for a pointer 's.w' also implies a mapping of 's.w' -
if 's' is used inside the target region and, thus, gets implicitly mapped.

[TR11 157:21-26] (approx. [5.2 154:22-27], [5.1 352:17-22], [5.0 320:22-27])

"If a list item with an implicit data-mapping attribute does not have any 
corresponding storage in the device data environment prior to a task encountering the 
construct associated with the map clause, and one or more contiguous parts of the 
original storage are either list items or base pointers to list items that are explicitly 
mapped on the construct, only those parts of the original storage will have corresponding 
storage in the device data environment as a result of the map clauses on the 
construct."

Thus, the following change should not be required – but if I undo it, I see a 
libgomp runtime error. Hence, it looks as if you need to fix this:

On 18.10.22 12:39, Julian Brown wrote:

--- a/libgomp/testsuite/libgomp.c/target-22.c
+++ b/libgomp/testsuite/libgomp.c/target-22.c
@@ -21,7 +21,8 @@ main ()
s.v.b = a + 16;
s.w = c + 3;
int err = 0;
-  #pragma omp target map (to:s.v.b[0:z + 7], s.u[z + 1:z + 4]) \
+  #pragma omp target map (to: s.w, s.v.b, s.u, s.s) \
+  map (to:s.v.b[0:z + 7], s.u[z + 1:z + 4]) \
   map (tofrom:s.s[3:3]) \
   map (from: s.w[z:4], err) private (i)


Thanks,

Tobias



[Patch] libgomp.texi: Reverse-offload updates (was: [Patch] libgomp: Handle OpenMP's reverse offloads)

2022-12-07 Thread Tobias Burnus

On 06.12.22 08:45, Tobias Burnus wrote:

* As follow-up,  libgomp.texi must be updated


That is what the attached patch does – obviously, it depends on the
main patch.

OK (once the main patch is in)?

Tobias
libgomp.texi: Reverse-offload updates

libgomp/
	* libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'.
	(GCN): Add item about 'omp requires'.
	(nvptx): Likewise; add item about reverse offload.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index efa7d956a33..e9ab079ecf5 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -192,8 +192,8 @@ The OpenMP 4.5 specification is fully supported.
   env variable @tab Y @tab
 @item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
 @item @code{requires} directive @tab P
-  @tab complete but no non-host devices provides @code{unified_address},
-  @code{unified_shared_memory} or @code{reverse_offload}
+  @tab complete but no non-host device provides @code{unified_address} or
+  @code{unified_shared_memory}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab Y @tab
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@@ -228,7 +228,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{allocate} clause @tab P @tab Initial support
 @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
 @item @code{ancestor} modifier on @code{device} clause
-  @tab Y @tab See comment for @code{requires}
+  @tab Y @tab Host fallback with GCN devices
 @item Implicit declare target directive @tab Y @tab
 @item Discontiguous array section with @code{target update} construct
   @tab N @tab
@@ -288,7 +288,7 @@ The OpenMP 4.5 specification is fully supported.
   @code{append_args} @tab N @tab
 @item @code{dispatch} construct @tab N @tab
 @item device-specific ICV settings with environment variables @tab Y @tab
-@item @code{assume} directive @tab Y @tab
+@item @code{assume} and @code{assumes} directives @tab Y @tab
 @item @code{nothing} directive @tab Y @tab
 @item @code{error} directive @tab Y @tab
 @item @code{masked} construct @tab Y @tab
@@ -4455,6 +4455,9 @@ The implementation remark:
 @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
   using the C library @code{printf} functions and the Fortran
   @code{print}/@code{write} statements.
+@item OpenMP code that has a requires directive with @code{unified_address},
+  @code{unified_shared_memory} or @code{reverse_offload} will remove
+  any GCN device from the list of available devices (``host fallback'').
 @end itemize
 
 
@@ -4504,6 +4507,13 @@ The implementation remark:
 @item Compilation OpenMP code that contains @code{requires reverse_offload}
   requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}
   is not supported.
+@item For code containing reverse offload (i.e. @code{target} regions with
+  @code{device(ancestor:1)}), there is a slight performance penalty
+  for @emph{all} target regions, consisting mostly of shutdown delay
+  between zero and one microsecond and a tiny device querying overhead.
+@item OpenMP code that has a requires directive with @code{unified_address}
+  or @code{unified_shared_memory} will remove any nvptx device from the
+  list of available devices (``host fallback'').
 @end itemize
 
 



Re: [wwwdocs] gcc-13/changes.html + projects/gomp: OpenMP GCC 13 update

2022-12-06 Thread Tobias Burnus

On 06.12.22 10:15, Jakub Jelinek wrote:

On Tue, Dec 06, 2022 at 09:59:17AM +0100, Tobias Burnus wrote:

This patch updates the OpenMP implementation status, based on libgomp.texi.
For the release notes, it also moves 'non-rectangular loop nests' up as that's
a 5.0 not a 5.1 feature.
And in line with libgomp.texi, it adds to projects/gomp/ the items for TR11,
an OpenMP 6.0 preview. (Hence, the id="omp6.0" to have a fixed id even when
the list is updated to TR12 and later OpenMP 6.0.)

The posted patch is certainly good, but doesn't do what you wrote above.


Next try  – how about this one?

Tobias


PS: There will surely be more updates before GCC 13 is released; I hope/assume
the next change will be for nvptx reverse offload...

gcc-13/changes.html + projects/gomp: OpenMP GCC 13 update

 htdocs/gcc-13/changes.html  |  21 ++--
 htdocs/projects/gomp/index.html | 227 
 2 files changed, 223 insertions(+), 25 deletions(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 689178f9..59cb7a8d 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -46,14 +46,15 @@ a work-in-progress.
 General Improvements
 
 
-  https://gcc.gnu.org/projects/gomp/;>OpenMP
+  https://gcc.gnu.org/projects/gomp/;>OpenMP
   
 
   Reverse offload is now supported and the all clauses to the
   requires directive are now accepted; however, the
   requires_offload, unified_address
   and unified_shared_memory clauses cause that the
-  only available device is the initial device (the host).
+  only available device is the initial device (the host). Fortran now
+  supports non-rectangular loop nests, which were added for C/C++ in GCC 11.
 
 
   The following OpenMP 5.1 features have been added: the
@@ -62,9 +63,10 @@ a work-in-progress.
   clause for the taskwait directive and the
   omp_target_is_accessible, omp_target_memcpy_async,
   omp_target_memcpy_rect_async and
-  omp_get_mapped_ptr API routines. Fortran now supports
-  non-rectangular loop nests, which were added for C/C++ in GCC 11.
-
+  omp_get_mapped_ptr API routines. The assume and assumes
+  directives, the begin/end declare target syntax in C/C++
+  and device-specific ICV settings with environment variables are now
+  supported.
 
   Initial support for OpenMP 5.2 features have been added: Support for
   firstprivate and allocate clauses on the
@@ -73,7 +75,14 @@ a work-in-progress.
   omp_initial_device and omp_invalid_device; and
   optionally omitting the map-type in target enter/exit data.
   The enter clause (as alias for to) has been added
-  to the declare target directive.
+  to the declare target directive. Also added has been the
+  omp_in_explicit_task routine and the doacross
+  clause as alias for depend with source/sink
+  modifier.
+
+
+  The _ALL suffix to the device-scope environment variables,
+  added in Technical Report (TR11) is already handled.
 
 
   For user defined allocators requesting high bandwidth or large capacity
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 87903289..114bcde6 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -28,7 +28,8 @@ OpenMP and OpenACC are supported with GCC's C, C++ and Fortran compilers.
   2.5 · 3.0 ·
   3.1 · 4.0 ·
   4.5 · 5.0 ·
-  5.1 · 5.2
+  5.1 · 5.2 ·
+  TR 11
   OpenMP Releases and Status
 
 
@@ -620,6 +621,16 @@ than listed, depending on resolved corner cases and optimizations.
 GCC12
 
   
+  
+device-specific ICV settings with environment variables
+GCC13
+
+  
+  
+assume directive
+GCC13
+
+  
   
 inoutset argument to the depend clause
 GCC13
@@ -650,6 +661,11 @@ than listed, depending on resolved corner cases and optimizations.
 GCC13
 
   
+  
+Support begin/end declare target syntax in C/C++
+GCC13
+
+  
   
 target_device trait in OpenMP Context
 No
@@ -675,16 +691,6 @@ than listed, depending on resolved corner cases and optimizations.
 No
 
   
-  
-device-specific ICV settings with environment variables
-GCC13
-
-  
-  
-assume directive
-No
-
-  
   
 Loop transformation constructs
 No
@@ -727,27 +733,28 @@ than listed, depending on resolved corner cases and optimizations.
 
   
   
-ompt_sync_region_t enum additions
+For Fortran, diagnose placing declarative before/between USE,
+  IMPORT, and IMPLICIT as invalid
 No
 
   
   
-ompt_state_t enum: ompt_state_wait_barrie

[wwwdocs] gcc-13/changes.html + projects/gomp: OpenMP GCC 13 update

2022-12-06 Thread Tobias Burnus

This patch updates the OpenMP implementation status, based on libgomp.texi.
For the release notes, it also moves 'non-rectangular loop nests' up as that's
a 5.0 not a 5.1 feature.
And in line with libgomp.texi, it adds to projects/gomp/ the items for TR11,
an OpenMP 6.0 preview. (Hence, the id="omp6.0" to have a fixed id even when
the list is updated to TR12 and later OpenMP 6.0.)

Comments? Suggestions? OK?

Tobias

PS: There will surely be more updates before GCC 13 is released; I hope/assume
the next change will be for nvptx reverse offload...
commit 9f80367e539839fff1df2c85fc2640638199fc9e
Author: Tobias Burnus 
Date:   Tue Dec 6 09:49:30 2022 +0100

libgomp.texi: Fix a OpenMP 5.2 and a TR11 impl-status item

libgomp/
* libgomp.texi (OpenMP 5.2): Add missing 'the'.
(TR11): Add missing '@tab N @tab'.
---
 libgomp/libgomp.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 4caac497506..efa7d956a33 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -406,7 +406,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @item @code{allocate} and @code{firstprivate} clauses on @code{scope}
   @tab Y @tab
 @item @code{ompt_callback_work} @tab N @tab
-@item Default map-type for @code{map} clause in @code{target enter/exit data}
+@item Default map-type for the @code{map} clause in @code{target enter/exit data}
   @tab Y @tab
 @item New @code{doacross} clause as alias for @code{depend} with
   @code{source}/@code{sink} modifier @tab Y @tab
@@ -463,6 +463,7 @@ Technical Report (TR) 11 is the first preview for OpenMP 6.0.
 @item @code{access} allocator trait changes @tab N @tab
 @item Extension of @code{interop} operation of @code{append_args}, allowing all
   modifiers of the @code{init} clause
+  @tab N @tab
 @item @code{interop} clause to @code{dispatch} @tab N @tab
 @item @code{apply} code to loop-transforming constructs @tab N @tab
 @item @code{omp_curr_progress_width} identifier @tab N @tab


[committed] libgomp.texi: Fix a OpenMP 5.2 and a TR11 impl-status item

2022-12-06 Thread Tobias Burnus

Found when updating the wwwdocs files.

Committed as obvious as https://gcc.gnu.org/r13-4500

Tobias
commit 9f80367e539839fff1df2c85fc2640638199fc9e
Author: Tobias Burnus 
Date:   Tue Dec 6 09:49:30 2022 +0100

libgomp.texi: Fix a OpenMP 5.2 and a TR11 impl-status item

libgomp/
* libgomp.texi (OpenMP 5.2): Add missing 'the'.
(TR11): Add missing '@tab N @tab'.
---
 libgomp/libgomp.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 4caac497506..efa7d956a33 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -406,7 +406,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @item @code{allocate} and @code{firstprivate} clauses on @code{scope}
   @tab Y @tab
 @item @code{ompt_callback_work} @tab N @tab
-@item Default map-type for @code{map} clause in @code{target enter/exit data}
+@item Default map-type for the @code{map} clause in @code{target enter/exit data}
   @tab Y @tab
 @item New @code{doacross} clause as alias for @code{depend} with
   @code{source}/@code{sink} modifier @tab Y @tab
@@ -463,6 +463,7 @@ Technical Report (TR) 11 is the first preview for OpenMP 6.0.
 @item @code{access} allocator trait changes @tab N @tab
 @item Extension of @code{interop} operation of @code{append_args}, allowing all
   modifiers of the @code{init} clause
+  @tab N @tab
 @item @code{interop} clause to @code{dispatch} @tab N @tab
 @item @code{apply} code to loop-transforming constructs @tab N @tab
 @item @code{omp_curr_progress_width} identifier @tab N @tab


[Patch] libgomp: Handle OpenMP's reverse offloads

2022-12-05 Thread Tobias Burnus

This patch finally handles reverse offload. Due to the prep work,
it essentially only adds content to libgomp/target.c's gomp_target_rev(),
except that it additionally saves the reverse-offload-function table
in gomp_load_image_to_device.
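The reverse-offload function table maps a device function address to the host function to call. A minimal sketch, assuming a table sorted by device address and searched with bsearch as a simple stand-in for the splay tree the patch uses; struct rev_fn, lookup_host_fn, run_reverse_offload and the two example host functions are hypothetical, and the argument copying and mapping steps are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry: device address of an offloaded function
   and the host function implementing the same target region.  */
typedef void (*host_fn) (void *);

struct rev_fn
{
  uint64_t dev_addr;
  host_fn fn;
};

static int
cmp_rev_fn (const void *a, const void *b)
{
  uint64_t x = ((const struct rev_fn *) a)->dev_addr;
  uint64_t y = ((const struct rev_fn *) b)->dev_addr;
  return (x > y) - (x < y);
}

/* Find the host function for DEV_ADDR; TAB must be sorted by dev_addr.  */
static host_fn
lookup_host_fn (const struct rev_fn *tab, size_t n, uint64_t dev_addr)
{
  assert (tab != NULL || n == 0);
  struct rev_fn key = { dev_addr, NULL };
  const struct rev_fn *hit = bsearch (&key, tab, n, sizeof *tab, cmp_rev_fn);
  return hit ? hit->fn : NULL;
}

/* Example host functions standing in for outlined 'target' bodies.  */
static void add_one (void *arg) { ++*(int *) arg; }
static void add_ten (void *arg) { *(int *) arg += 10; }

/* Run the host version of a reverse-offload region on ARGS, which are
   assumed to be host copies already.  Returns -1 if DEV_ADDR is unknown.  */
static int
run_reverse_offload (const struct rev_fn *tab, size_t n, uint64_t dev_addr,
                     void *args)
{
  host_fn fn = lookup_host_fn (tab, n, dev_addr);
  if (fn == NULL)
    return -1;
  fn (args);
  return 0;
}
```

The table would be filled (and sorted with qsort using the same comparator) when the image is loaded, which corresponds to saving the table in gomp_load_image_to_device.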

In the comment to "[Patch] libgomp: Add reverse-offload splay tree",
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601368.html ,
it was suggested not to keep track of all the variable mappings and
to reconstruct the mapping from the normal splay tree, which this
patch does.
(Albeit in the very slow walk-everything way. Given that reverse-offload
target regions likely have only few map items, and that programs should use
only few reverse-offload regions and not expect them to be fast, that might
be okay.)
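The walk-everything lookup amounts to a linear scan that translates a device address back to a host address. The flat struct mapping array below is a hypothetical stand-in for libgomp's per-device splay tree which, as far as I understand it, is ordered by host address and therefore cannot be searched by device address directly; dev_to_host is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one host->device mapping.  */
struct mapping
{
  uint64_t host_start, host_end;   /* [host_start, host_end)  */
  uint64_t dev_start;              /* Device address of host_start.  */
};

/* Translate DEV_ADDR back to a host address by visiting every mapping:
   O(n) per lookup.  Returns 0 if the address is not mapped.  */
static uint64_t
dev_to_host (const struct mapping *m, size_t n, uint64_t dev_addr)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (m[i].host_end >= m[i].host_start);
      uint64_t len = m[i].host_end - m[i].host_start;
      if (dev_addr >= m[i].dev_start && dev_addr - m[i].dev_start < len)
        return m[i].host_start + (dev_addr - m[i].dev_start);
    }
  return 0;
}
```

With only a few map items per reverse-offload region, the quadratic cost of one scan per item stays small, which is the trade-off argued above.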

Specification references:
- For pointer attachment, I assume that the pointer is already fine on
  the host (if it existed on the host before) and does not need to get
  updated. I think the spec lacks wording for this; cf. OpenMP Spec Issue #3424.
- There are plans to permit 'nowait'. I think it wouldn't change anything
  except for not spin waiting for the result - and (only for shared memory),
  the argument lists (addr, kinds, sizes) need to be copied to have a sufficient
  lifetime. (To be implemented in future; cf. OpenMP Spec Pull Req. 3423
  for Issue 2038.)

 * * *

32bit vs. 64bit: libgomp itself is compiled with both -m32 and -m64; however,
nvptx and gcn require -m64 on the device side and assume that the device
pointers are representable on the host (i.e. all are 64bit). The new code
tries to be in principle compatible with uint32_t pointers and uses uint64_t
to represent them consistently. The code should be mostly fine, except that
one called function requires an array of void* and size_t. Instead of handling
that case, I added some code to permit optimizing away the function content
without offloading - and a run-time assert if it should ever happen that this
function gets called on a 32bit host from the target side.
It is a run-time failure because '#if TARGET_OFFLOAD == ""' does not work (string
comparison is not supported by the C preprocessor, unfortunately).
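The "uint64_t everywhere" approach can be sketched as a pair of conversion helpers. They are hypothetical, not libgomp functions, and the check here is on the value range rather than the patch's assert about being called on a 32bit host; on a 64bit host the assertion can never fire.

```c
#include <assert.h>
#include <stdint.h>

/* Device addresses are carried as uint64_t on every host, so the same
   code builds with -m32 and -m64.  Converting back to a host pointer is
   only valid if the value fits into uintptr_t.  */
static void *
u64_to_host_ptr (uint64_t addr)
{
  assert (addr <= UINTPTR_MAX);   /* Can only fail on a 32bit host.  */
  return (void *) (uintptr_t) addr;
}

static uint64_t
host_ptr_to_u64 (const void *p)
{
  return (uint64_t) (uintptr_t) p;
}
```

Round-tripping a host pointer through uint64_t is lossless; only device addresses above 4 GiB would trip the check on a 32bit host.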

Comments, suggestions, OK for mainline, ... ?

Tobias

PS:
* As follow-up, libgomp.texi must be updated
* For GCN, it currently does not work until stack variables are accessible
  from the host. (Prep work for this is in newlib + GCC 13.) Once done, a
  similar one-line change to plugin-gcn.c's GOMP_OFFLOAD_get_num_devices is
  required.

PPS: (Off-topic remark regarding a 32bit host)
While a 32bit host with a 32bit device will mostly work, having a 32bit host
with a 64bit device becomes interesting as the 'void *' returned by
omp_target_alloc(...) can't represent a device pointer. The solution is a
32bit pointer pointing to a 64bit variable, e.g.
  /* Box the 64bit device address behind a 32bit host pointer.  */
  uint64_t *devptr = malloc (sizeof (uint64_t));
  if (devptr == NULL)
    return NULL;
  *devptr = internal_device_alloc ();
  return devptr;
with all the fun to translate this correctly with {use,has}_device_ptr etc.

To actually support this will require some larger changes to libgomp, which I
do not see happening unless a device system with sizeof(void*) > 64 bit shows
up, or there is some compelling reason to use 32bit on the host; but not for
x86-64 or arm64 (or PowerPC). (There exist 128bit pointer systems, which use
the upper bits for extra purposes - but for unified-shared address purposes,
it seems to be unlikely that accelerator devices head in this direction.)
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libgomp: Handle OpenMP's reverse offloads

This commit enables reverse offload for nvptx such that gomp_target_rev
actually gets called.  And it fills in the latter function to do all of
the following: finding the host function corresponding to the device
function pointer and copying the arguments to the host, processing the
mapping/firstprivate, calling the host function, copying back the data
and freeing as needed.

The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
device variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that inside a target region,
the spec disallows device-affecting constructs other than 'target' with
the 'ancestor' device-modifier, and it also limits the clauses permitted
on this construct.

For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay tree is used.
Unfortunately, its data structure requires a full walk of the tree.
Additionally, the just-mapped variables are recorded in a separate
data structure, requiring an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this 

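The device-to-host function-address lookup described above can be illustrated with a small C sketch. Assumption: to keep it short, it uses a sorted array with qsort/bsearch instead of the additional splay tree that the commit message mentions; all names here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Host-side function type for this sketch (hypothetical).  */
typedef void (*host_fn_t) (void *);

/* One table entry: device function address -> host function.  */
struct rev_fn_entry
{
  uint64_t dev_addr;
  host_fn_t host_fn;
};

static int
rev_fn_entry_cmp (const void *a, const void *b)
{
  uint64_t x = ((const struct rev_fn_entry *) a)->dev_addr;
  uint64_t y = ((const struct rev_fn_entry *) b)->dev_addr;
  return (x > y) - (x < y);
}

/* Sort once after the table has been filled, e.g. at image-load time.  */
static void
rev_fn_table_sort (struct rev_fn_entry *tab, size_t n)
{
  qsort (tab, n, sizeof *tab, rev_fn_entry_cmp);
}

/* O(log n) lookup; returns NULL for an unknown device address.  */
static host_fn_t
rev_fn_lookup (const struct rev_fn_entry *tab, size_t n, uint64_t dev_addr)
{
  struct rev_fn_entry key = { dev_addr, NULL };
  const struct rev_fn_entry *e
    = bsearch (&key, tab, n, sizeof *tab, rev_fn_entry_cmp);
  assert (e == NULL || e->dev_addr == dev_addr);
  return e ? e->host_fn : NULL;
}
```

A keyed structure like this (or a splay tree) avoids the full walk that the existing variable-mapping tree needs.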
Re: [Patch] libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors (was: amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors)

2022-11-30 Thread Tobias Burnus



On 30.11.22 10:43, Andrew Stubbs wrote:

On 29/11/2022 18:26, Tobias Burnus wrote:

On 29.11.22 16:56, Paul-Antoine Arras wrote:

This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP
context selectors, [...]

PA committed that patch as
https://gcc.gnu.org/r13-4403-g1fd508744eccda9ad9c6d6fcce5b2ea9c568818d
(thanks!)

I think this should be documented somewhere. We have
https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html

The wording is a little odd.
How about "Additionally, gfx803 is supported as an alias for fiji"?


Committed with the suggested wording:
https://gcc.gnu.org/r13-4404-ge0b95c2e8b771b53876321a6a0a9497619af73cd

Thanks,

Tobias

PS: It does not help with finding a good wording if that's the last task
before calling it a day...



[Patch] libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors (was: amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors)

2022-11-29 Thread Tobias Burnus

Hi PA, hi Andrew, hi Jakub, hi all,

On 29.11.22 16:56, Paul-Antoine Arras wrote:

This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP
context selectors, [...]


I think this should be documented somewhere. We have
https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html

For GCN's isa trait, it refers to -march=, but gfx803 is only valid as a
context selector. Hence:

How about the attached patch?

Tobias
libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Context Selectors): Add 'gfx803' to gcn's isa.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 85cae742cd4..0066d41fdc5 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4378,5 +4378,6 @@ offloading devices (it's not clear if they should be):
 @item @code{amdgcn}, @code{gcn}
   @tab @code{gpu}
-  @tab See @code{-march=} in ``AMD GCN Options''
+  @tab See @code{-march=} in ``AMD GCN Options''@footnote{Additionally
+  supported is @code{gfx803} as an alias for @code{fiji}.}
 @item @code{nvptx}
   @tab @code{gpu}


[Patch] gcn: Fix __builtin_gcn_first_call_this_thread_p

2022-11-27 Thread Tobias Burnus

It turned out that cprop cleverly propagated the unspec_volatile
to the preceding (pseudo)register, permitting it to remove the
'set (s0) (pseudoregister)' at -O2.  Unfortunately, it does
matter whether the assignment is done to 's2' (previously: pseudoregister)
or to s1. Just having a hard register is not enough ...

Solution: Use USE (alias gen_rtx_USE) instead.

Additionally, I removed the s0 modification (that should lead to the unchanged
result) by adding 'gcn_operand_part (DImode, reg, 1)' and then working with
SImode. Result:

  if (__builtin_gcn_first_call_this_thread_p ())
    x = 42;

now becomes (with -O2) the following; the builtin code goes up to (and
including) '.L2', the rest is the 'if' and 'x = 42':

        s_lshr_b32      s2, s1, 16
        s_cmpk_lg_u32   s2, 12345
        s_mov_b32       s12, scc
        s_mov_b32       vcc_lo, scc
        s_mov_b32       vcc_hi, 0
        s_cbranch_vccz  .L2
        s_and_b32       s2, s1, 65535      (= 0xFFFF)
        s_or_b32        s1, s2, 809041920  (= 0x30390000 = (12345 << 16))
.L2:
        s_getpc_b64     s[2:3]
        s_add_u32       s2, s2, x@rel32@lo+4
        s_addc_u32      s3, s3, x@rel32@hi+4
        s_mov_b32       vcc_lo, s12
        s_mov_b32       vcc_hi, 0
        s_cbranch_vccz  .L3
        s_mov_b32       s12, 42
        v_writelane_b32 v0, s12, 0
        s_mov_b64       exec, 1
        global_store_dword  v1, v0, s[2:3]
.L3:

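The following C model mirrors what the builtin part of the assembly above (up to '.L2') does to s1, the upper 32 bits of s[0:1]. It is a hypothetical illustration of the logic, with the register passed as a pointer; it is not GCC or newlib code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Marker value stored in the upper 16 bits of s1, as in the patch.  */
#define FIRST_CALL_MARKER 12345u

/* Returns true on the first call for a given s1 value and marks s1 so
   that later calls return false.  */
static bool
model_first_call_this_thread_p (uint32_t *s1)
{
  uint32_t hi = *s1 >> 16;                 /* s_lshr_b32 s2, s1, 16     */
  bool first = hi != FIRST_CALL_MARKER;    /* s_cmpk_lg_u32 s2, 12345   */
  if (first)
    /* s_and_b32 / s_or_b32: keep the low 16 bits, set the marker.  */
    *s1 = (*s1 & 0xFFFFu) | (FIRST_CALL_MARKER << 16);
  return first;
}
```

Working on the 32-bit half directly is what makes the SImode version of the expansion simpler than shifting the full 64-bit s[0:1] by 48.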

OK for mainline?

Tobias
gcn: Fix __builtin_gcn_first_call_this_thread_p

Contrary to naive expectation, unspec_volatile (via prologue_use) did not
prevent the cprop pass (at -O2) from removing the access to the s[0:1]
(PRIVATE_SEGMENT_BUFFER_ARG) register, as the volatile just got put on
the preceding pseudoregister.  Solution: Use gen_rtx_USE instead.
Additionally, this patch removes (gen_)prologue_use_di as it is then no
longer used.

Finally, as we already do bit manipulation, instead of using the full
64bit value and then only keeping the value of 's0', directly use only
s1 of s[0:1] and do the bit manipulations there, generating more
readable assembly code and better matching the '#else' branch.

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_expand_builtin_1): Work on s1 instead
	of s[0:1] and use USE to prevent removal of setting that register.
	* config/gcn/gcn.md (prologue_use_di): Remove.

 gcc/config/gcn/gcn.cc | 16 
 gcc/config/gcn/gcn.md | 13 -
 2 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 6fb261318c4..c74fa007a21 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4556,8 +4556,9 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 	rtx not_first = gen_label_rtx ();
 	rtx reg = gen_rtx_REG (DImode,
 			cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
-	rtx cmp = force_reg (DImode,
- gen_rtx_LSHIFTRT (DImode, reg, GEN_INT (48)));
+	reg = gcn_operand_part (DImode, reg, 1);
+	rtx cmp = force_reg (SImode,
+ gen_rtx_LSHIFTRT (SImode, reg, GEN_INT (16)));
 	emit_insn (gen_cstoresi4 (result, gen_rtx_NE (BImode, cmp,
 			  GEN_INT(12345)),
   cmp, GEN_INT(12345)));
@@ -4565,12 +4566,11 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 			  const0_rtx),
    result));
 	emit_move_insn (reg,
-	  force_reg (DImode,
-		gen_rtx_IOR (DImode,
-			 gen_rtx_AND (DImode, reg,
-	  GEN_INT (0xL)),
-			 GEN_INT (12345L << 48;
-	emit_insn (gen_prologue_use (reg));
+	  force_reg (SImode,
+		gen_rtx_IOR (SImode,
+			 gen_rtx_AND (SImode, reg, GEN_INT (0x)),
+			 GEN_INT (12345L << 16;
+	emit_insn (gen_rtx_USE (VOIDmode, reg));
 	emit_label (not_first);
 	  }
 	return result;
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index a8b9c28d115..92e9892c4f7 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -697,19 +697,6 @@
   ""
   [(set_attr "length" "0")])
 
-(define_insn_and_split "prologue_use_di"
-  [(unspec_volatile [(match_operand:DI 0 "register_operand")] UNSPECV_PROLOGUE_USE)]
-  ""
-  "#"
-  "reload_completed"
-  [(unspec_volatile [(match_dup 0)] UNSPECV_PROLOGUE_USE)
-   (unspec_volatile [(match_dup 1)] UNSPECV_PROLOGUE_USE)]
-  {
-operands[1] = gcn_operand_part (DImode, operands[0], 1);
-operands[0] = gcn_operand_part (DImode, operands[0], 0);
-  }
-  [(set_attr "length" "0")])
-
 (define_expand "prologue"
   [(const_int 0)]
   ""


Re: [Patch] OpenMP/Fortran: Permit end-clause on directive

2022-11-27 Thread Tobias Burnus

Updated patch, taking into account the comments below and the remark
by Harald (seconded by Jakub). Namely:

I have now split the pre-existing nowait-2.f90 into nowait-2.f90 (with
only valid usage) and nowait-4.f90 (with the dg-error tests). In the
previous version of the patch, nowait-4.f90 was a variant of
nowait-2.f90 that used 'nowait' on the directive line. And Harald
suggested splitting the latter, which I now did, into nowait-{5,6}.f90.

Cf. Harald's email at
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600539.html and
two emails by Jakub ("Otherwise LGTM"), first at
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601304.html +
the next email in the thread.

I intend to commit the attached patch tomorrow, unless there are further
comments.

Thanks for the reviews (and I know that the follow up is very belated)!

Tobias


On 08.09.22 17:21, Jakub Jelinek via Fortran wrote:

On Fri, Aug 26, 2022 at 08:21:26PM +0200, Tobias Burnus wrote:

I did run into some issues related to this; those turned out to be
unrelated, but I ended up implementing this feature.

Side remark: 'omp parallel workshare' seems to actually permit 'nowait'
now, but I guess that's an unintended change due to the
syntax-representation change. Hence, it is now tracked as Spec Issue
3338 and I do not permit it.

OK for mainline?

Tobias
OpenMP/Fortran: Permit end-clause on directive

gcc/fortran/ChangeLog:

 * openmp.cc (OMP_DO_CLAUSES, OMP_SCOPE_CLAUSES,
 OMP_SECTIONS_CLAUSES, OMP_SINGLE_CLAUSES): Add 'nowait'.

This doesn't describe what the patch actually does: "Add 'nowait'."
is only true for the first 3.  For OMP_SINGLE_CLAUSES, IMHO you
want a separate
  (OMP_SINGLE_CLAUSES): Add 'nowait' and 'copyprivate'.
entry.


@@ -3855,7 +3857,7 @@ cleanup:
 | OMP_CLAUSE_ORDER | OMP_CLAUSE_ALLOCATE)
  #define OMP_SINGLE_CLAUSES \
(omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE \
-   | OMP_CLAUSE_ALLOCATE)
+   | OMP_CLAUSE_ALLOCATE | OMP_CLAUSE_NOWAIT | OMP_CLAUSE_COPYPRIVATE)
  #define OMP_ORDERED_CLAUSES \
(omp_mask (OMP_CLAUSE_THREADS) | OMP_CLAUSE_SIMD)
  #define OMP_DECLARE_TARGET_CLAUSES \
@@ -5909,13 +5915,11 @@ gfc_match_omp_teams_distribute_simd (void)
  match
  gfc_match_omp_workshare (void)
  {
-  if (gfc_match_omp_eos () != MATCH_YES)
-{
-  gfc_error ("Unexpected junk after $OMP WORKSHARE statement at %C");
-  return MATCH_ERROR;
-}
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (, omp_mask (OMP_CLAUSE_NOWAIT)) != MATCH_YES)
+return MATCH_ERROR;
new_st.op = EXEC_OMP_WORKSHARE;
-  new_st.ext.omp_clauses = gfc_get_omp_clauses ();
+  new_st.ext.omp_clauses = c;
return MATCH_YES;
  }

I think it would be better to introduce OMP_WORKSHARE_CLAUSES and use
it in both gfc_match_omp_workshare and just use
   return match_omp (EXEC_OMP_WORKSHARE, OMP_WORKSHARE_CLAUSES);
?


@@ -6954,6 +6952,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
   }
 break;
   case OMP_LIST_COPYPRIVATE:
+if (omp_clauses->nowait)
+  gfc_error ("NOWAIT clause must not be be used with COPYPRIVATE "

s/be be/be/

+ "clause at %L", >where);
 for (; n != NULL; n = n->next)
   {
 if (n->sym->as && n->sym->as->type == AS_ASSUMED_SIZE)
@@ -5284,7 +5285,13 @@ parse_omp_do (gfc_statement omp_st)
if (st == omp_end_st)
  {
if (new_st.op == EXEC_OMP_END_NOWAIT)
-cp->ext.omp_clauses->nowait |= new_st.ext.omp_bool;
+{
+  if (cp->ext.omp_clauses->nowait && new_st.ext.omp_bool)
+gfc_error_now ("Duplicated NOWAIT clause on %s and %s at %C",
+   gfc_ascii_statement (omp_st),
+   gfc_ascii_statement (omp_end_st));
+  cp->ext.omp_clauses->nowait |= new_st.ext.omp_bool;
+}
else
 gcc_assert (new_st.op == EXEC_NOP);
gfc_clear_new_st ();

Not sure if the standard is clear enough that unique clauses can't be
repeated on both directive and corresponding end directive.  But let's
assume that is the case.
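The checks discussed in this review (allowed clauses per directive, NOWAIT on both the directive and the end directive, NOWAIT together with COPYPRIVATE) can be modelled with a clause bitmask. This C sketch is only loosely modelled on the omp_mask idea in gcc/fortran/openmp.cc; the clause names, bit values and result codes are hypothetical, and it treats both lines with one shared mask:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical clause bits (illustrative only).  */
enum
{
  CL_PRIVATE = 1u << 0,
  CL_FIRSTPRIVATE = 1u << 1,
  CL_ALLOCATE = 1u << 2,
  CL_NOWAIT = 1u << 3,
  CL_COPYPRIVATE = 1u << 4
};

/* Clauses accepted on SINGLE / END SINGLE in this model.  */
#define SINGLE_CLAUSES \
  (CL_PRIVATE | CL_FIRSTPRIVATE | CL_ALLOCATE | CL_NOWAIT | CL_COPYPRIVATE)

enum check_result
{
  CHECK_OK,
  CHECK_NOT_ALLOWED,
  CHECK_DUPLICATE_NOWAIT,
  CHECK_NOWAIT_COPYPRIVATE
};

/* DIR and END are the clauses found on the directive and on the
   matching end directive.  */
static enum check_result
check_single_clauses (unsigned dir, unsigned end)
{
  unsigned all = dir | end;
  if (all & ~(unsigned) SINGLE_CLAUSES)
    return CHECK_NOT_ALLOWED;
  if ((dir & CL_NOWAIT) && (end & CL_NOWAIT))
    return CHECK_DUPLICATE_NOWAIT;          /* unique clause repeated */
  if ((all & CL_NOWAIT) && (all & CL_COPYPRIVATE))
    return CHECK_NOWAIT_COPYPRIVATE;
  return CHECK_OK;
}
```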


--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/copyprivate-2.f90
@@ -0,0 +1,69 @@
+  FUNCTION t()
+INTEGER :: a, b, t
+a = 0
+t = b
+b = 0
+!$OMP PARALLEL REDUCTION(+:b)
+  !$OMP SINGLE COPYPRIVATE (b) NOWAIT  ! { dg-error "NOWAIT clause must not be be used with COPYPRIVATE clause" }

Here too (several times).


+!$OMP ATOMIC WRITE
+b = 6
+  !$OMP END SINGLE
+!$OMP END PARALLEL
+t = t + b
+  END FUNCTION
+
+  FUNCTION t2()
+INTEGER :: a, b, t2
+a 

Re: [Patch] libgomp.texi: OpenMP Impl Status 5.1 additions + TR11

2022-11-25 Thread Tobias Burnus

On 25.11.22 11:38, Jakub Jelinek wrote:

On Fri, Nov 25, 2022 at 11:34:35AM +0100, Tobias Burnus wrote:

It also adds TR11. I don't think we will work any time soon
on TR11 – possibly except for clarifications.

OK for mainline?

Ok (but I hope that once 6.0 is out, we just keep OpenMP 6.0 entries
and don't mention any TRs).


Yes, that was the idea: update it for TR12 next year and for 6.0
in two years. That also matches the spec itself, which gets replaced by
newer TRs and then by the final spec, keeping the old TRs only in some
more hidden links on the spec page.

Pushed as
https://gcc.gnu.org/r13-4301-gc16e85d726a7793c05209af031dac0bebf035ab9

Tobias



[Patch] libgomp.texi: OpenMP Impl Status 5.1 additions + TR11

2022-11-25 Thread Tobias Burnus

Update libgomp.texi's OpenMP implementation status.
The 5.1 changes are taken from Jakub's comment at
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602639.html
(sorry for taking that long to incorporate those).

It also adds TR11. I don't think we will work any time soon
on TR11 – possibly except for clarifications.

OK for mainline?

Tobias

PS: Admittedly, there is sometimes a fine line between a clarification and
a larger new feature. For instance,
* implicitly declared reduction identifiers for arbitrary C++ classes - or
* how to handle implicit 'declare target' with declare variant and (no)host
  selectors.
(The TR11 wording implies that the former is an old feature, while the latter
is implied by the OpenMP 5.2 examples document, although an issue to clarify
this in TR12 exists. For the latter: https://gcc.gnu.org/PR106316 + OpenMP Spec
Issue 3416.)
libgomp.texi: OpenMP Impl Status 5.1 additions + TR11

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Implementation Status): Add three 5.1 items
	and status for Technical Report (TR) 11.

 libgomp/libgomp.texi | 68 
 1 file changed, 68 insertions(+)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 10fefa97922..584af45bd67 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -162,6 +162,7 @@ See also @ref{OpenMP Implementation Status}.
 * OpenMP 5.0:: Feature completion status to 5.0 specification
 * OpenMP 5.1:: Feature completion status to 5.1 specification
 * OpenMP 5.2:: Feature completion status to 5.2 specification
+* OpenMP Technical Report 11:: Feature completion status to first 6.0 preview
 @end menu
 
 The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
@@ -350,6 +351,9 @@ The OpenMP 4.5 specification is fully supported.
 to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @item For Fortran, diagnose placing declarative before/between @code{USE},
   @code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
+@item Optional comma between directive and clause in the @code{#pragma} form @tab Y @tab
+@item @code{indirect} clause in @code{declare target} @tab N @tab
+@item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
 @end multitable
 
 
@@ -425,6 +429,70 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @end multitable
 
 
+@node OpenMP Technical Report 11
+@section OpenMP Technical Report 11
+
+Technical Report (TR) 11 is the first preview for OpenMP 6.0.
+
+@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
+@multitable @columnfractions .60 .10 .25
+@item Features deprecated in versions 5.2, 5.1 and 5.0 were removed
+  @tab N/A @tab Backward compatibility
+@item The @code{decl} attribute was added to the C++ attribute syntax
+  @tab N @tab
+@item @code{_ALL} suffix to the device-scope environment variables
+  @tab P @tab Host device number wrongly accepted
+@item For Fortran, @emph{locator list} can be also function reference with
+  data pointer result @tab N @tab
+@item Ref-count change for @code{use_device_ptr}/@code{use_device_addr}
+  @tab N @tab
+@item Implicit reduction identifiers of C++ classes
+  @tab N @tab
+@item Change of the @emph{map-type} property from @emph{ultimate} to
+  @emph{default} @tab N @tab
+@item Concept of @emph{assumed-size arrays} in C and C++
+  @tab N @tab
+@item Mapping of @emph{assumed-size arrays} in C, C++ and Fortran
+  @tab N @tab
+@item @code{groupprivate} directive @tab N @tab
+@item @code{local} clause to declare target directive @tab N @tab
+@item @code{part_size} allocator trait @tab N @tab
+@item @code{pin_device}, @code{preferred_device} and @code{target_access}
+  allocator traits
+  @tab N @tab
+@item @code{access} allocator trait changes @tab N @tab
+@item Extension of @code{interop} operation of @code{append_args}, allowing all
+  modifiers of the @code{init} clause @tab N @tab
+@item @code{interop} clause to @code{dispatch} @tab N @tab
+@item @code{apply} clause to loop-transforming constructs @tab N @tab
+@item @code{omp_curr_progress_width} identifier @tab N @tab
+@item @code{safesync} clause to the @code{parallel} construct @tab N @tab
+@item @code{omp_get_max_progress_width} runtime routine @tab N @tab
+@item @code{strict} modifier keyword to @code{num_threads}, @code{num_tasks}
+  and @code{grainsize} @tab N @tab
+@item @code{memscope} clause to @code{atomic} and @code{flush} @tab N @tab
+@item Routines for obtaining memory spaces/allocators for shared/device memory
+  @tab N @tab
+@item @code{omp_get_memspace_num_resources} routine @tab N @tab
+@item @code{omp_get_submemspace} 

[Patch] libgomp: Add no-target-region rev offload test + fix plugin-nvptx

2022-11-24 Thread Tobias Burnus

The nvptx reverse-offload code mishandled the case that there was a reverse
offload function that isn't called inside a target region. In that case,
the linker did not include GOMP_target_ext and the global variable it uses.
But the plugin-nvptx.c code expected that the latter is present.

Found via sollve_vv's tests/5.0/requires/test_requires_reverse_offload.c, which
is similar to the new testcase. (Although the 'if' and comments imply that the
sollve_vv author did not intend this.)
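A minimal C sketch of the situation that triggers the bug: a 'target device(ancestor:1)' construct encountered on the host, outside any target region. Assumptions: a compiler with reverse-offload support (e.g. GCC 13 with -fopenmp); without -fopenmp the pragmas are simply ignored and the code runs on the host. The function name is hypothetical; this is not the new reverse-offload-2.c testcase:

```c
#include <assert.h>

/* Must appear before the first target construct in the unit.  */
#pragma omp requires reverse_offload

/* Outside a target region, the ancestor device is the current device,
   i.e. the host, so the region simply runs there.  */
static int
ancestor_outside_target (void)
{
  int x = 0;
#pragma omp target device(ancestor : 1) map(tofrom : x)
  x = 42;
  return x;
}
```

As no target region calls GOMP_target_ext on the device, such a program is exactly the case where nvptx's libgomp.a does not link in the reverse-offload variable.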

Solution: Gracefully handle the case that the global variable does not exist,
and do this check first; only when it succeeds, allocate dev->rev_data. If not,
free rev_fn_table to disable reverse offload handling.
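The "check first, allocate second" order can be sketched in plain C. Everything here is hypothetical and simplified: struct fake_dev stands in for the plugin's device structure, the lookup callback stands in for cuModuleGetGlobal on GOMP_REV_OFFLOAD_VAR, and malloc stands in for cuMemHostAlloc:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified device state.  */
struct fake_dev
{
  void *rev_data;
};

/* Stand-in for looking up the on-device variable; false = not linked in. */
typedef bool (*lookup_fn) (void);

static bool symbol_present (void) { return true; }
static bool symbol_absent (void) { return false; }

/* Returns true if reverse offload remains enabled.  */
static bool
setup_rev_offload (struct fake_dev *dev, void **rev_fn_table,
                   lookup_fn lookup_rev_offload_var)
{
  assert (dev != NULL);
  if (rev_fn_table && *rev_fn_table && dev->rev_data == NULL)
    {
      /* Check first: the variable is absent when GOMP_target_ext was not
         linked in, which is valid and simply means "no reverse offload".  */
      if (!lookup_rev_offload_var ())
        {
          free (*rev_fn_table);
          *rev_fn_table = NULL;
        }
      else
        /* Allocate only on success.  */
        dev->rev_data = malloc (64);
    }
  return rev_fn_table != NULL && *rev_fn_table != NULL;
}
```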

OK for mainline?

Tobias

PS: Admittedly, the nvptx code is not yet exercised as I still have to submit
the libgomp/target.c code handling the reverse offload (+ enabling requires
reverse_offload in plugin-nvptx.c). As is obvious from this patch, the
target.c patch is nearly but not yet completely ready. That patch passes the
three sollve_vv testcases and also the existing libgomp testcases, but some
corner cases and more testcases are missing.
libgomp: Add no-target-region rev offload test + fix plugin-nvptx

OpenMP permits that a 'target device(ancestor:1)' is called without being
enclosed in a target region - using the current device (i.e. the host) in
that case.  This commit adds a testcase for this.

For nvptx, the missing on-device 'GOMP_target_ext' call means that neither
it nor the associated on-device GOMP_REV_OFFLOAD_VAR variable is linked in
from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal
gracefully by disabling reverse offload and treating the failure as benign.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int
	for 'i' to match 'fn_entries'; regard an absent GOMP_REV_OFFLOAD_VAR
	as valid, i.e. as the code having no reverse offload.
	* testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test.

 libgomp/plugin/plugin-nvptx.c  | 36 ++--
 .../libgomp.c-c++-common/reverse-offload-2.c   | 49 ++
 2 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 0768fca350b..e803f083591 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1390,7 +1390,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   else if (rev_fn_table)
 {
   CUdeviceptr var;
-  size_t bytes, i;
+  size_t bytes;
+  unsigned int i;
   r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, , , module,
 			 "$offload_func_table");
   if (r != CUDA_SUCCESS)
@@ -1413,12 +1414,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   if (rev_fn_table && *rev_fn_table && dev->rev_data == NULL)
 {
-  /* cuMemHostAlloc memory is accessible on the device, if unified-shared
-	 address is supported; this is assumed - see comment in
-	 nvptx_open_device for CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.   */
-  CUDA_CALL_ASSERT (cuMemHostAlloc, (void **) >rev_data,
-			sizeof (*dev->rev_data), CU_MEMHOSTALLOC_DEVICEMAP);
-  CUdeviceptr dp = (CUdeviceptr) dev->rev_data;
+  /* Get the on-device GOMP_REV_OFFLOAD_VAR variable.  It should be
+	 available but it might not be.  One reason could be: if the user code
+	 has 'omp target device(ancestor:1)' in pure host code, GOMP_target_ext
+	 is not called on the device and, hence, it and GOMP_REV_OFFLOAD_VAR
+	 are not linked in.  */
   CUdeviceptr device_rev_offload_var;
   size_t device_rev_offload_size;
   CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal,
@@ -1426,11 +1426,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   _rev_offload_size, module,
   XSTRING (GOMP_REV_OFFLOAD_VAR));
   if (r != CUDA_SUCCESS)
-	GOMP_PLUGIN_fatal ("cuModuleGetGlobal error - GOMP_REV_OFFLOAD_VAR: %s", cuda_error (r));
-  r = CUDA_CALL_NOCHECK (cuMemcpyHtoD, device_rev_offload_var, ,
-			 sizeof (dp));
-  if (r != CUDA_SUCCESS)
-	GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
+	{
+	  free (*rev_fn_table);
+	  *rev_fn_table = NULL;
+	}
+  else
+	{
+	  /* cuMemHostAlloc memory is accessible on the device, if
+	 unified-shared address is supported; this is assumed - see comment
+	 in nvptx_open_device for CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. */
+	  CUDA_CALL_ASSERT (cuMemHostAlloc, (void **) >rev_data,
+			sizeof (*dev->rev_data), CU_MEMHOSTALLOC_DEVICEMAP);
+	  CUdeviceptr dp = (CUdeviceptr) dev->rev_data;
+	  r = 

OpenMP Patch Ping

2022-11-24 Thread Tobias Burnus

Updated list as follow-up to the last ping at
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601162.html


Recent patches:

Sandra's (Tue Nov 15 04:46:15 GMT 2022)
[PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target"
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606218.html


Julian's patches - I hope I got it right as I lost track a bit:

(Tue Nov 8 14:36:17 GMT 2022)
[PATCH v2 06/11] OpenMP: lvalue parsing for map clauses (C++)
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605367.html

(Fri Sep 30 13:30:22 GMT 2022)
[PATCH v3 06/11] OpenMP: Pointers and member mappings
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602609.html

(Tue Oct 18 10:39:01 GMT 2022)
[PATCH v5 0/4] OpenMP/OpenACC: Fortran array descriptor mappings
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/thread.html#603790
(I think this is partially my task to review those.)


Approved but waiting for the Fortran patches (v5) to get approved.
[PATCH v3 08/11] OpenMP/OpenACC: Rework clause expansion and nested struct handling
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602010.html


Possibly requiring a second look/review despite my initial comment
(which might require revisions on the patch side as well):
OpenMP: Duplicate checking for map clauses in Fortran (PR107214)
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604033.html


Older patches:

* [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
  * Unified-Shared Memory & Pinned Memory

Depending on those:

* [PATCH] OpenMP, libgomp: Handle unified shared memory in omp_target_is_accessible.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594187.html

* [PATCH, OpenMP, Fortran] requires unified_shared_memory 1/2: adjust libgfortran memory allocators
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599703.html
  (Fortran part, required for ...)
* Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran
  https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601059.html

And finally:

* [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599332.html
(Side remark: some other debugging support, like printing the mappings being
done to stderr or ..., would be nice as well; this might depend on a
libgomp-debug.so and/or -f...(sanitize=openmp or ...); the other open-source
compiler has something similar.)


 * * *


Pending libgomp/nvptx patches:

(Wed Sep 21 07:45:36 GMT 2022)
[PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601922.html

(Wed Sep 21 07:45:54 GMT 2022)
[PATCH, nvptx, 2/2] Reimplement libgomp barriers for nvptx: bar.red instruction support in GCC
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601925.html

Those were pinged 4 times :-(


Hopefully, I have not missed any patch.

Tobias


PS: The following list covers pending patches which have been reviewed but
need to be updated before being ready; hopefully, this list is also up to date:

* (No pending patch, but wwwdocs' changes-13.html + projects/gomp/ need an
update before GCC 13)

* [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices
Should be re-submitted any time soon (today, next few days)

* [Patch] OpenMP/Fortran: Permit end-clause on directive
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600433.html
  Trivial patch modifications required - mostly LGTM already.

* [PATCH] libgomp: fix hang on fatal error
  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603616.html
(Patch rejected but alternative solutions were suggested.)

* Re: [Patch] OpenMP/Fortran: Use firstprivat not alloc for ptr attach for arrays
(Committed but failing occasionally:)
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605854.html

* "[PATCH 3/3] vect: inbranch SIMD clones"
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599490.html
  Review comments to be addressed.

* [PATCH 0/5] [gfortran] Support for allocate directive (OpenMP 5.0)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588367.html

* [PATCH] openmp: fix max_vf setting for amdgcn offloading
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598265.html
→ To be updated for review comments.
(Side note: we should at some point find a way to improve target-specific
handling; similar to the are-exceptions-supported issue of PR101544 but
there are more.)

* [PATCH, OpenMP, v4] Implement uses_allocators clause for target regions
  https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596587.html
  * Needs to be revised according to review comments

* Fortran allocatable components handling (needs to be split into separate
  pieces and submitted separately)

*PING* - [wwwdocs] projects/gomp: TR11 + GCC13 update

2022-11-23 Thread Tobias Burnus

On 11.11.22 16:13, Tobias Burnus wrote:

This patch adds TR11 to the history of OpenMP releases – and it does
an update of the implementation status.

OK?

Tobias

PS: The implementation-status changes were lying around in that file
for a while. I think both the GCC 13 release notes and this file need
some updates for more recent changes. Nonetheless, while incomplete,
the changes themselves should be fine.



Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-21 Thread Tobias Burnus

On 19.11.22 11:46, Tobias Burnus wrote:

+   stacklimit = stackbase + seg_size*64;

(this should be '*seg_size' not 'seg_size' and the name should be
s/seg_size/seg_size_ptr/.)

I have updated the comment and ...

(Reading it, I think it should be '..._MEM(SImode,' and
'..._MULT(SImode' instead of DImode.)

Additionally, there was a problem of bytes vs. bits in:

My understanding is that
dispatch_ptr->private_segment_size == *((char*)dispatch_ptr + 192)


which is wrong: it's 192 bits, i.e. only 24 bytes!
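The byte offset can be checked with a small C sketch. Assumption: the struct below reproduces only the leading fields of hsa_kernel_dispatch_packet_t (HSA runtime specification) up to private_segment_size, under a hypothetical name; the limit formula follows the patch comment, including the follow-up fix that the pointer must be dereferenced:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Leading fields of the HSA kernel dispatch packet (rest omitted).  */
struct dispatch_packet_prefix
{
  uint16_t header;
  uint16_t setup;
  uint16_t workgroup_size_x, workgroup_size_y, workgroup_size_z;
  uint16_t reserved0;
  uint32_t grid_size_x, grid_size_y, grid_size_z;
  uint32_t private_segment_size;
};

/* 6 * 2 + 3 * 4 = 24 bytes (= 192 bits), not 192 bytes.  */
#define PRIVATE_SEGMENT_SIZE_OFFSET \
  (6 * sizeof (uint16_t) + 3 * sizeof (uint32_t))

_Static_assert (offsetof (struct dispatch_packet_prefix, private_segment_size)
                == PRIVATE_SEGMENT_SIZE_OFFSET, "offset must be 24 bytes");

/* stacklimit = stackbase + *seg_size_ptr * 64, reading the size from a
   raw packet at the byte offset computed above.  */
static uint64_t
stack_limit_from_packet (const unsigned char *packet, uint64_t stackbase)
{
  uint32_t seg_size;
  memcpy (&seg_size, packet + PRIVATE_SEGMENT_SIZE_OFFSET, sizeof seg_size);
  return stackbase + (uint64_t) seg_size * 64;
}
```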

Finally, in the first_call_this_thread_p() call, I mixed up EQ vs. NE at one
place.

BTW: It seems as if there is no problem with zero extension, if I look at the
assembler result.

Updated version. It consists of: the GCC patch adding the builtins,
the newlib patch using those (unchanged; used for testing + to be submitted),
and a 'test.c' using the builtins plus its dump produced with amdgcn's
'cc1 -O2' to show the resulting assembly.

Tested with libgomp on gfx908 offloading and getting only the known fails:
(libgomp.c-c++-common/teams-2.c, libgomp.fortran/async_io_*.f90,
libgomp.oacc-c-c++-common/{deep-copy-10.c,static-variable-1.c,vprop.c})

OK for mainline?

Tobias
gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

The new builtins have been added for newlib to reduce dependency on
compiler-internal implementation choices of GCC in newlib's getreent.c.

gcc/ChangeLog:

	* config/gcn/gcn-builtins.def (FIRST_CALL_THIS_THREAD_P,
	GET_STACK_LIMIT): Add new builtins.
	* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand them.
	* config/gcn/gcn.md (prologue_use): Add "register_operand" as
	arg to match_operand.
	(prologue_use_di): New; DI insn_and_split variant of the former.

Co-Authored-By: Andrew Stubbs 

 gcc/config/gcn/gcn-builtins.def |  4 +++
 gcc/config/gcn/gcn.cc   | 70 -
 gcc/config/gcn/gcn.md   | 15 -
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index eeeaebf9013..f1cf30bbc94 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -160,8 +160,12 @@ DEF_BUILTIN (ACC_BARRIER, -1, "acc_barrier", B_INSN, _A1 (GCN_BTI_VOID),
 
 /* Kernel inputs.  */
 
+DEF_BUILTIN (FIRST_CALL_THIS_THREAD_P, -1, "first_call_this_thread_p", B_INSN,
+	 _A1 (GCN_BTI_BOOL), gcn_expand_builtin_1)
 DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
 	 gcn_expand_builtin_1)
+DEF_BUILTIN (GET_STACK_LIMIT, -1, "get_stack_limit", B_INSN,
+	 _A1 (GCN_BTI_VOIDPTR), gcn_expand_builtin_1)
 
 #undef _A1
 #undef _A2
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index b3814c2e7c6..ea9631e8823 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4493,6 +4493,45 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
   emit_insn (gen_gcn_wavefront_barrier ());
   return target;
 
+case GCN_BUILTIN_GET_STACK_LIMIT:
+  {
+	/* stackbase = (stack_segment_decr & 0x)
+			+ stack_wave_offset);
+	   seg_size = dispatch_ptr->private_segment_size;
+	   stacklimit = stackbase + seg_size*64;
+	   with segsize = *(uint32_t *) ((char *) dispatch_ptr
+   + 6*sizeof(int16_t) + 3*sizeof(int32_t));
+	   cf. struct hsa_kernel_dispatch_packet_s in the HSA doc.  */
+	rtx ptr;
+	if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+	&& cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+	  {
+	rtx size_rtx = gen_rtx_REG (DImode,
+			 cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+	size_rtx = gen_rtx_MEM (SImode,
+gen_rtx_PLUS (DImode, size_rtx,
+		  GEN_INT (6*2 + 3*4)));
+	size_rtx = gen_rtx_MULT (SImode, size_rtx, GEN_INT (64));
+
+	ptr = gen_rtx_REG (DImode,
+		cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
+	ptr = gen_rtx_AND (DImode, ptr, GEN_INT (0x));
+	ptr = gen_rtx_PLUS (DImode, ptr, size_rtx);
+	if (cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG] >= 0)
+	  {
+		rtx off;
+		off = gen_rtx_REG (SImode,
+		  cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG]);
+		ptr = gen_rtx_PLUS (DImode, ptr, off);
+	  }
+	  }
+	else
+	  {
+	ptr = gen_reg_rtx (DImode);
+	emit_move_insn (ptr, const0_rtx);
+	  }
+	return ptr;
+  }
 case GCN_BUILTIN_KERNARG_PTR:
   {
 	rtx ptr;
@@ -4506,7 +4545,36 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 	  }
 	return ptr;
   }
-
+case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P:
+  {
+	/* Stas

[Patch] libgomp/gcn: fix/improve struct output (was: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling)

2022-11-21 Thread Tobias Burnus

Working on the builtins, I realized that I (again) mixed up bits and bytes.
While 'uint64_t var[2]' has a size of 128 bits, 'char var[128]' has a size of
128 bytes. Thus, there is sufficient space for 16 pointer-size/uint64_t values,
but I only need 6.

This patch now makes use of the available space, avoiding one device-to-host
memory copy; additionally, it avoids a 32-bit vs. 64-bit alignment issue which
I somehow missed :-(
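The size argument can be checked at compile time. This sketch re-declares struct printf_data with the fields shown in the libgomp-gcn.h hunk below, taken out of its enclosing struct output, and adds static assertions. pack_reverse_offload is a hypothetical helper. It only mirrors the slot order used by the patched GOMP_target_ext. It leaves out the atomic store to 'written' that actually publishes the entry.

```c
#include <assert.h>
#include <stdint.h>

/* printf_data as changed by the patch (outside of struct output).  */
struct printf_data
{
  int written;
  char msg[128];
  int type;
  union
  {
    int64_t ivalue;
    double dvalue;
    char text[128];
    uint64_t value_u64[6];
  };
};

/* char[128] is 128 bytes, i.e. room for 16 uint64_t values, so six
   values do not make the union larger than text[128].  */
static_assert (sizeof (char[128]) / sizeof (uint64_t) == 16,
               "text[128] holds 16 uint64_t values");
static_assert (sizeof (uint64_t[6]) <= sizeof (char[128]),
               "value_u64[6] fits within text[128]");

/* Hypothetical helper: store the reverse-offload arguments in the slot
   order used by the patch and mark the entry as type 4.  */
static void
pack_reverse_offload (struct printf_data *slot, uint64_t fn, uint64_t mapnum,
                      uint64_t hostaddrs, uint64_t sizes, uint64_t kinds,
                      uint64_t dev_num)
{
  slot->value_u64[0] = fn;
  slot->value_u64[1] = mapnum;
  slot->value_u64[2] = hostaddrs;
  slot->value_u64[3] = sizes;
  slot->value_u64[4] = kinds;
  slot->value_u64[5] = dev_num;
  slot->type = 4; /* Reverse offload.  */
}
```

On the host side, the same six values are read back in the same order and passed to process_reverse_offload.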

Tested with libgomp on gfx908 offloading and getting only the known fails:
(libgomp.c-c++-common/teams-2.c, libgomp.fortran/async_io_*.f90,
libgomp.oacc-c-c++-common/{deep-copy-10.c,static-variable-1.c,vprop.c})

OK for mainline?

Tobias
libgomp/gcn: fix/improve struct output

output.printf_data.(value union) contains text[128], which has the size
of 128 bytes, sufficient for 16 uint64_t variables; hence value_u64[2]
could be extended to value_u64[6] - sufficient for all required arguments
to gomp_target_rev.  Additionally, next_output.printf_data.(msg union)
contained msg_u64 which then is no longer needed and also caused 32bit
vs 64bit alignment issues.

libgomp/
	* config/gcn/libgomp-gcn.h (struct output):
	Remove 'msg_u64' from the union, change
	value_u64[2] to value_u64[6].
	* config/gcn/target.c (GOMP_target_ext): Update accordingly.
	* plugin/plugin-gcn.c (process_reverse_offload, console_output):
	Likewise.

 libgomp/config/gcn/libgomp-gcn.h |  7 ++-
 libgomp/config/gcn/target.c  | 12 ++--
 libgomp/plugin/plugin-gcn.c  | 17 +++--
 3 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h
index 91560be787f..3933e846a86 100644
--- a/libgomp/config/gcn/libgomp-gcn.h
+++ b/libgomp/config/gcn/libgomp-gcn.h
@@ -37,16 +37,13 @@ struct output
   unsigned int next_output;
   struct printf_data {
 int written;
-union {
-  char msg[128];
-  uint64_t msg_u64[2];
-};
+char msg[128];
 int type;
 union {
   int64_t ivalue;
   double dvalue;
   char text[128];
-  uint64_t value_u64[2];
+  uint64_t value_u64[6];
 };
   } queue[1024];
   unsigned int consumed;
diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index 27854565d40..11ae6ec9833 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -102,12 +102,12 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   asm ("s_sleep 64");
 
   unsigned int slot = index % 1024;
-  uint64_t addrs_sizes_kind[3] = {(uint64_t) hostaddrs, (uint64_t) sizes,
-  (uint64_t) kinds};
-  data->queue[slot].msg_u64[0] = (uint64_t) fn;
-  data->queue[slot].msg_u64[1] = (uint64_t) mapnum;
-  data->queue[slot].value_u64[0] = (uint64_t) &addrs_sizes_kind[0];
-  data->queue[slot].value_u64[1] = (uint64_t) GOMP_ADDITIONAL_ICVS.device_num;
+  data->queue[slot].value_u64[0] = (uint64_t) fn;
+  data->queue[slot].value_u64[1] = (uint64_t) mapnum;
+  data->queue[slot].value_u64[2] = (uint64_t) hostaddrs;
+  data->queue[slot].value_u64[3] = (uint64_t) sizes;
+  data->queue[slot].value_u64[4] = (uint64_t) kinds;
+  data->queue[slot].value_u64[5] = (uint64_t) GOMP_ADDITIONAL_ICVS.device_num;
 
   data->queue[slot].type = 4; /* Reverse offload.  */
   __atomic_store_n (&data->queue[slot].written, 1, __ATOMIC_RELEASE);
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index ffe5cf5af2c..388e87b7765 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1919,16 +1919,12 @@ create_kernel_dispatch (struct kernel_info *kernel, int num_teams)
 }
 
 static void
-process_reverse_offload (uint64_t fn, uint64_t mapnum, uint64_t rev_data,
-			 uint64_t dev_num64)
+process_reverse_offload (uint64_t fn, uint64_t mapnum, uint64_t hostaddrs,
+			 uint64_t sizes, uint64_t kinds, uint64_t dev_num64)
 {
   int dev_num = dev_num64;
-  uint64_t addrs_sizes_kinds[3];
-  GOMP_OFFLOAD_host2dev (dev_num, &addrs_sizes_kinds, (void *) rev_data,
-			 sizeof (addrs_sizes_kinds));
-  GOMP_PLUGIN_target_rev (fn, mapnum, addrs_sizes_kinds[0],
-			  addrs_sizes_kinds[1], addrs_sizes_kinds[2],
-			  dev_num, NULL, NULL, NULL);
+  GOMP_PLUGIN_target_rev (fn, mapnum, hostaddrs, sizes, kinds, dev_num,
+			  NULL, NULL, NULL);
 }
 
 /* Output any data written to console output from the kernel.  It is expected
@@ -1976,8 +1972,9 @@ console_output (struct kernel_info *kernel, struct kernargs *kernargs,
 	case 2: printf ("%.128s%.128s\n", data->msg, data->text); break;
 	case 3: printf ("%.128s%.128s", data->msg, data->text); break;
 	case 4:
-	  process_reverse_offload (data->msg_u64[0], data->msg_u64[1],
-   data->value_u64[0],data->value_u64[1]);
+	  process_reverse_offload (data->value_u64[0], 

Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-19 Thread Tobias Burnus

On 18.11.22 18:49, Andrew Stubbs wrote:

On 18/11/2022 17:20, Tobias Burnus wrote:

This looks wrong:


+/* stackbase = (stack_segment_decr & 0x)
++ stack_wave_offset);
+   seg_size = dispatch_ptr->private_segment_size;
+   stacklimit = stackbase + seg_size*64;

(this should be '*seg_size' not 'seg_size' and the name should be
s/seg_size/seg_size_ptr/.)

+   with segsize = dispatch_ptr + 6*sizeof(int16_t) + 3*sizeof(int32_t);
+   cf. struct hsa_kernel_dispatch_packet_s in the HSA doc. */
+rtx ptr;
+if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+&& cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+  {
+rtx size_rtx = gen_rtx_REG (DImode,
+ cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+size_rtx = gen_rtx_MEM (DImode,
+gen_rtx_PLUS (DImode, size_rtx,
+  GEN_INT (6*16 + 3*32)));
+size_rtx = gen_rtx_MULT (DImode, size_rtx, GEN_INT (64));
+

(Reading it, I think it should be '..._MEM(SImode,' and
'..._MULT(SImode' instead of DImode.)

seg_size is calculated from the private_segment_size loaded from the
dispatch_ptr, not calculated from the dispatch_ptr itself.


Isn't this what the code tries to do? Namely:


My understanding is that

dispatch_ptr->private_segment_size == *((char*)dispatch_ptr + 192)

And the latter is what I attempt to do. I have very limited knowledge
of insn/rtx/RTL and of GCN assembly; thus, I have likely done something
stupid. Having said this, here is what I get:

(Where asm("s4") == dispatch_ptr)

s_add_u32   s2, s4, 192
s_addc_u32  s3, s5, 0
v_writelane_b32 v4, s2, 0
v_writelane_b32 v5, s3, 0
s_mov_b64   exec, 1
flat_load_dwordx2   v[4:5], v[4:5]
s_waitcnt   0
v_lshlrev_b64   v[4:5], 6, v[4:5]
v_readlane_b32  s2, v4, 0
v_readlane_b32  s3, v5, 0

Not that I really understand every line, but at a glance it
looks okay.

The 192 is because of (quoting newlib/libc/machine/amdgcn/getreent.c):

typedef struct hsa_kernel_dispatch_packet_s {
  uint16_t header ;
  uint16_t setup;
  uint16_t workgroup_size_x ;
  uint16_t workgroup_size_y ;
  uint16_t workgroup_size_z;
  uint16_t reserved0;
  uint32_t grid_size_x ;
  uint32_t grid_size_y ;
  uint32_t grid_size_z;
  uint32_t private_segment_size;

i.e. 6*16 + 3*32 = 192 – and we want to read a 32bit unsigned int.

 * * *

Admittedly, something is probably not quite right, as I see with gfx908:

  # of expected passes            27476
  # of unexpected failures        317

where the 317 FAILs come from 88 testcase files.

That's not a very high number, but it is more than the usual failures, which
shows that something is not quite right.

 * * *

I am pretty sure that I missed something - but the question is what.
I hope you can help me pinpoint the place where it goes wrong.

Thanks,

Tobias



Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-11-18 Thread Tobias Burnus

Attached is the updated/rediffed version, which now uses the builtin
instead of the 'asm("s8")'.

In principle, the code works; that is, it works if no private stack
variables are copied.

In other words: reverse-offload target regions that don't use
firstprivate or mapping work; the rest would crash. For now, that is
avoided by not accepting reverse offload in GOMP_OFFLOAD_get_num_devices.

To get it working, the manual stack allocation patch and the trivial
update to that get_num_devices function are needed, but no change to the
attached patch.

In order to reduce local patches, I would love to have it on mainline –
otherwise, I have at least the current version in gcc-patches@.

Tobias

PS: The previous patch email is quoted below. Note: there were two follow-up
emails, one by Andrew and one by me; cf. your own mail archive (of this
thread) or
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603383.html and the
next two messages in the thread.

On 12.10.22 16:29, Tobias Burnus wrote:

On 29.09.22 18:24, Andrew Stubbs wrote:

On 27/09/2022 14:16, Tobias Burnus wrote:

Andrew suggested a while back to piggyback on the console_output
handling, avoiding another atomic access. If this is still wanted, I would
like some guidance on how to actually implement it.

[...]
The point is that you can use the "msg" and "text" fields for
whatever data you want, as long as you invent a new value for "type".
[...]
You can make "case 4" do whatever you want. There are enough bytes
for 4 pointers, and you could use multiple packets (although it's not
safe to assume they're contiguous or already arrived; maybe "case 4"
for part 1, "case 5" for part 2). It's possible to change this
structure, of course, but the target implementation is in newlib so
versioning becomes a problem.


Looking also at the newlib write.c implementation, I think that the data is
contiguous: there is an atomic add, where instead of adding '1' for a single
slot, I could also add '2' for two slots.
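Here is a minimal sketch of that idea. It assumes a simplified ring buffer with a next_output counter and 1024 slots, in the style of struct output. The names are hypothetical, and this is not the newlib write.c code. A single __atomic_fetch_add of 2 gives the caller two consecutive indices that no other writer can obtain. The payload itself would still need to be published separately, e.g. with a release store to a per-slot flag.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUE_SLOTS 1024

/* Hypothetical, simplified stand-in for struct output.  */
struct ring
{
  unsigned int next_output;
  uint64_t payload[QUEUE_SLOTS];
};

/* Reserve COUNT consecutive slots with one atomic add and return the
   index of the first one; the counter only has to hand out unique
   indices, so relaxed ordering suffices here.  */
static unsigned int
reserve_slots (struct ring *r, unsigned int count)
{
  assert (count > 0 && count <= QUEUE_SLOTS);
  return __atomic_fetch_add (&r->next_output, count, __ATOMIC_RELAXED);
}

/* Write two values into two consecutive slots (modulo the ring size).  */
static void
write_two (struct ring *r, uint64_t a, uint64_t b)
{
  unsigned int index = reserve_slots (r, 2);
  r->payload[index % QUEUE_SLOTS] = a;
  r->payload[(index + 1) % QUEUE_SLOTS] = b;
}
```

The reader side must still treat the second slot as possibly not yet written, as Andrew notes above, until a per-slot flag says otherwise.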

Attached is one variant – for the decl of the GOMP_OFFLOAD_target_rev,
it needs the generic parts of the sister nvptx patch.*

2*128 bytes were not enough; I need 3*128 bytes (or rather 5*64 +
32). As target_ext is blocking, I decided to use a stack-local
variable for the remaining arguments and pass it along. Alternatively,
I could also use 2 slots and process them together. This would avoid
one device->host memory copy but would make console_output less clear.

OK for mainline?

Tobias

* https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603354.html

PS: Currently, device stack variables are private and cannot be
accessed from the host; this will change in a separate patch. This not
only affects the "rest" part as used in this patch but also the actual
arrays behind addr, kinds, and sizes, and quite likely many of the
map/firstprivate variables passed to addr.

As num_devices() will return 0 or -1, this is for now a non-issue.

libgomp/gcn: Prepare for reverse-offload callback handling

libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h: New file; contains
	struct output, declared previously in plugin-gcn.c.
	* config/gcn/target.c: Include it.
	(GOMP_ADDITIONAL_ICVS): Declare as extern var.
	(GOMP_target_ext): Handle reverse offload.
	* plugin/plugin-gcn.c: Include libgomp-gcn.h.
	(struct kernargs): Replace struct def by the one
	from libgomp-gcn.h for output_data.
	(process_reverse_offload): New.
	(console_output): Call it.

 libgomp/config/gcn/libgomp-gcn.h | 61 
 libgomp/config/gcn/target.c  | 44 -
 libgomp/plugin/plugin-gcn.c  | 34 --
 3 files changed, 117 insertions(+), 22 deletions(-)

diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h
new file mode 100644
index 000..91560be787f
--- /dev/null
+++ b/libgomp/config/gcn/libgomp-gcn.h
@@ -0,0 +1,61 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Tobias Burnus .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additio

[Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-18 Thread Tobias Burnus

This patch adds two builtins: one returning the end-of-stack pointer and
one returning a Boolean telling whether this is the first call to the builtin on this thread.
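The gcn.cc hunk below stashes a first-call marker in the otherwise unused upper 16 bits of s[0:1]. The following host-side sketch emulates that check on a plain 64-bit value as an illustration. The marker 12345 << 48 appears in the diff. That the marker is set after the first call is an assumption, because that part of the hunk is cut off in this archive. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MARKER_SHIFT 48
#define MARKER ((uint64_t) 12345 << MARKER_SHIFT)
#define MARKER_MASK ((uint64_t) 0xffff << MARKER_SHIFT)

/* Hypothetical emulation: *reg stands in for the s[0:1] register pair,
   whose upper 16 bits are assumed not to be used by the address.
   Returns true on the first call and sets the marker, so that later
   calls return false; the lower 48 bits are left untouched.  */
static bool
first_call_this_thread_p (uint64_t *reg)
{
  if ((*reg & MARKER_MASK) == MARKER)
    return false;
  *reg = (*reg & ~MARKER_MASK) | MARKER;
  return true;
}
```

In this emulation, newlib's __getreent would call the check once per thread to decide whether to initialize its per-thread reent data.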

The idea is to replace some hard-coded values in newlib, permitting a later
move to a manually allocated stack on the compiler side without the need to
modify newlib again. The GCC patch matches what newlib did in reent; I could
imagine that we change this later on.

Lightly tested (especially by visual inspection).
Currently doing a final regtest, OK when it passes?

Any comments on this patch, or on the attached newlib patch?*

Tobias

(*) I also included a patch to newlib to show where we are heading
and to actually use the builtins for regtesting ...
gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

The new builtins have been added for newlib to reduce dependency on
compiler-internal implementation choices of GCC in newlib's getreent.c.

gcc/ChangeLog:

	* config/gcn/gcn-builtins.def (FIRST_CALL_THIS_THREAD_P,
GET_STACK_LIMIT): Add new builtins.
	* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand them.
	* config/gcn/gcn.md (prologue_use): Add "register_operand" as
	arg to match_operand.
	(prologue_use_di): New; DI insn_and_split variant of the former.

Co-Authored-By: Andrew Stubbs 

 gcc/config/gcn/gcn-builtins.def |  4 +++
 gcc/config/gcn/gcn.cc   | 70 -
 gcc/config/gcn/gcn.md   | 15 -
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index eeeaebf9013..f1cf30bbc94 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -160,8 +160,12 @@ DEF_BUILTIN (ACC_BARRIER, -1, "acc_barrier", B_INSN, _A1 (GCN_BTI_VOID),
 
 /* Kernel inputs.  */
 
+DEF_BUILTIN (FIRST_CALL_THIS_THREAD_P, -1, "first_call_this_thread_p", B_INSN,
+	 _A1 (GCN_BTI_BOOL), gcn_expand_builtin_1)
 DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
 	 gcn_expand_builtin_1)
+DEF_BUILTIN (GET_STACK_LIMIT, -1, "get_stack_limit", B_INSN,
+	 _A1 (GCN_BTI_VOIDPTR), gcn_expand_builtin_1)
 
 #undef _A1
 #undef _A2
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index b3814c2e7c6..051eadee783 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4493,6 +4493,44 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
   emit_insn (gen_gcn_wavefront_barrier ());
   return target;
 
+case GCN_BUILTIN_GET_STACK_LIMIT:
+  {
+	/* stackbase = (stack_segment_decr & 0x)
+			+ stack_wave_offset);
+	   seg_size = dispatch_ptr->private_segment_size;
+	   stacklimit = stackbase + seg_size*64;
+	   with segsize = dispatch_ptr + 6*sizeof(int16_t) + 3*sizeof(int32_t);
+	   cf. struct hsa_kernel_dispatch_packet_s in the HSA doc.  */
+	rtx ptr;
+	if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+	&& cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+	  {
+	rtx size_rtx = gen_rtx_REG (DImode,
+	cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+	size_rtx = gen_rtx_MEM (DImode,
+gen_rtx_PLUS (DImode, size_rtx,
+		  GEN_INT (6*16 + 3*32)));
+	size_rtx = gen_rtx_MULT (DImode, size_rtx, GEN_INT (64));
+
+	ptr = gen_rtx_REG (DImode,
+		cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
+	ptr = gen_rtx_AND (DImode, ptr, GEN_INT (0x));
+	ptr = gen_rtx_PLUS (DImode, ptr, size_rtx);
+	if (cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG] >= 0)
+	  {
+		rtx off;
+		off = gen_rtx_REG (SImode,
+		  cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG]);
+		ptr = gen_rtx_PLUS (DImode, ptr, off);
+	  }
+	  }
+	else
+	  {
+	ptr = gen_reg_rtx (DImode);
+	emit_move_insn (ptr, const0_rtx);
+	  }
+	return ptr;
+  }
 case GCN_BUILTIN_KERNARG_PTR:
   {
 	rtx ptr;
@@ -4506,7 +4544,37 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 	  }
 	return ptr;
   }
-
+case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P:
+  {
+	/* Stash a marker in the unused upper 16 bits of s[0:1] to indicate
+	   whether it was the first call.  */
+	rtx result = gen_reg_rtx (BImode);
+	emit_move_insn (result, const0_rtx);
+	if (cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+	  {
+	rtx not_first = gen_label_rtx ();
+	rtx reg = gen_rtx_REG (DImode,
+			cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
+	rtx cmp = force_reg (DImode,
+ gen_rtx_AND (DImode, reg,
+	  GEN_INT (0xL)));
+	emit_insn (gen_cstoresi4 (result, gen_rtx_EQ (BImode, cmp,
+			  GEN_INT(12345L << 48)),
+  cmp, GEN_INT(12345L << 48)));
+	

[patch] gcn: Add __builtin_gcn_kernarg_ptr

2022-11-16 Thread Tobias Burnus

This is part of a patch by Andrew (hi!), namely the part that only adds
__builtin_gcn_kernarg_ptr. More is planned; see below.

The short-term benefit of this patch is to permit replacing hard-coded numbers
by a builtin, as in libgomp (see patch) or in newlib (not submitted):

--- a/newlib/libc/sys/amdgcn/write.c
+++ b/newlib/libc/sys/amdgcn/write.c
@@ -59,1 +59,5 @@ _READ_WRITE_RETURN_TYPE write (int fd, const void *buf, size_t count)
+#if defined(__has_builtin) && __has_builtin(__builtin_gcn_kernarg_ptr)
+  register void **kernargs = __builtin_gcn_kernarg_ptr ();
+#else
   register void **kernargs asm("s8");
+#endif

It would also replace the 'asm("s8")' in the reverse-offload (GCN) patch, i.e.
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602339.html


However, this patch is only the very first step. The next one is to add
several more builtins, namely those that are required for newlib:
newlib/libc/machine/amdgcn/mlock.c (sbrk) and
newlib/libc/machine/amdgcn/getreent.c (__getreent) use additional
hard-coded values for heap and stack memory.

At some point, but only after newlib has been updated, we can think of
making stack variables non-private. That's a general goal and, in any case,
required for reverse offload to be able to transfer data between the host
and on-device stack variables.

* * *

Regarding the patch: Besides the obvious change (addition of the builtin),
the change to the DEFAULT memory space is required to avoid a memory-space
conversion ICE when using the new builtin. The gcn_oacc_dim_size change is
mainly just taken from Andrew's patch, as it seems reasonable. In terms of the
libgomp testsuite, I did not spot anything, except that the -O2 run no longer
fails with "libgomp: target function wasn't mapped" for
libgomp.oacc-fortran/kernels-map-1.f90; but I am not sure whether that is
related.

In any case, the libgomp testsuite shows no failures (besides the usual ones)
with the attached patch.

OK for mainline?

Tobias

PS: The plan is to have at least all builtins in GCC and to use them in newlib
by the end of this year (i.e. in newlib's end-of-year snapshot, aka the annual
release).

PPS: I wonder whether
  [Patch] libgomp/gcn: Prepare for reverse-offload callback handling
  https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602339.html
would be okay after this patch (with the asm("s8") replaced by the builtin)
or not.
The code itself would be fine, but it is unreachable until
GOMP_OFFLOAD_get_num_devices accepts reverse offload, and the latter depends
on support for non-private stack variables.
gcn: Add __builtin_gcn_kernarg_ptr

Add __builtin_gcn_kernarg_ptr to avoid using hard-coded register values
and permit future ABI changes while keeping the API.

gcc/ChangeLog:

* config/gcn/gcn-builtins.def (KERNARG_PTR): Add.
* config/gcn/gcn.cc (gcn_init_builtin_types): Change siptr_type_node,
	sfptr_type_node and voidptr_type_node from FLAT to ADDR_SPACE_DEFAULT.
(gcn_expand_builtin_1): Handle GCN_BUILTIN_KERNARG_PTR.
(gcn_oacc_dim_size): Return in ADDR_SPACE_FLAT.

libgomp/ChangeLog:

* config/gcn/team.c (gomp_gcn_enter_kernel): Use
	__builtin_gcn_kernarg_ptr instead of asm ("s8").

Co-Authored-By: Andrew Stubbs 

 gcc/config/gcn/gcn-builtins.def |  4 
 gcc/config/gcn/gcn.cc   | 24 
 libgomp/config/gcn/team.c   |  2 +-
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index c50777bd..eeeaebf 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -158,6 +158,10 @@ DEF_BUILTIN (ACC_SINGLE_COPY_END, -1, "single_copy_end", B_INSN,
 DEF_BUILTIN (ACC_BARRIER, -1, "acc_barrier", B_INSN, _A1 (GCN_BTI_VOID),
 	 gcn_expand_builtin_1)
 
+/* Kernel inputs.  */
+
+DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
+	 gcn_expand_builtin_1)
 
 #undef _A1
 #undef _A2
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 5e6f3b8..b3814c2 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4058,15 +4058,15 @@ gcn_init_builtin_types (void)
 	  (integer_type_node) */
 	, 64);
   tree tmp = build_distinct_type_copy (intSI_type_node);
-  TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_FLAT;
+  TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_DEFAULT;
   siptr_type_node = build_pointer_type (tmp);
 
   tmp = build_distinct_type_copy (float_type_node);
-  TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_FLAT;
+  TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_DEFAULT;
   sfptr_type_node = build_pointer_type (tmp);
 
   tmp = build_distinct_type_copy (void_type_node);
-  TYPE_ADDR_SPACE (tmp) = 

[Patch] nvptx/mkoffload.cc: Fix "$nohost" check

2022-11-15 Thread Tobias Burnus

Found while working on real reverse offload: the reverse-offload stub
function was added to the reverse-offload table. The reason, as mentioned in
the commit log, is lhd_set_decl_assembler_name.

I intend to commit it tomorrow as obvious, unless there are further
comments.

Tobias
nvptx/mkoffload.cc: Fix "$nohost" check

If lhd_set_decl_assembler_name is invoked - in particular if
!TREE_PUBLIC (decl) && !DECL_FILE_SCOPE_P (decl) - the '.nohost' suffix
might change to '.nohost.2'. This happens for the existing reverse offload
testcases via cgraph_node::analyze and is a side effect of
r13-3455-g178ac530fe67e4f2fc439cc4ce89bc19d571ca31 for some reason.

The solution is to not only check for a trailing '$nohost' but also for
'$nohost$' in nvptx/mkoffload.cc.
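The revised condition can be sketched in isolation. mkoffload.cc uses GCC's own endswith helper. To stay self-contained, this sketch defines a local ends_with instead. The example names used below are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for GCC's endswith.  */
static bool
ends_with (const char *str, const char *suffix)
{
  size_t len = strlen (str), slen = strlen (suffix);
  return len >= slen && strcmp (str + len - slen, suffix) == 0;
}

/* True if PTX_NAME denotes a reverse-offload ('nohost') function:
   either it ends in "$nohost", or the suffix was extended by
   lhd_set_decl_assembler_name, e.g. to "$nohost$2".  */
static bool
is_nohost_name (const char *ptx_name)
{
  return ends_with (ptx_name, "$nohost")
         || strstr (ptx_name, "$nohost$") != NULL;
}
```

Checking for the "$nohost$" substring, rather than a specific numeric suffix, keeps the test independent of which counter value the renaming appends.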

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (process): Recognize '$nohost$...'
	besides trailing '$nohost' as being for reverse offload.

 gcc/config/nvptx/mkoffload.cc | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 854cd72f3c7..5d89ba8a788 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -364,7 +364,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	 Alternatively, besides searching for 'BEGIN FUNCTION DECL',
 	 checking for '.visible .entry ' + id->ptx_name would be
 	 required.  */
-	  if (!endswith (id->ptx_name, "$nohost"))
+	  if (!endswith (id->ptx_name, "$nohost")
+	  && !strstr (id->ptx_name, "$nohost$"))
 	continue;
 	  fprintf (out, "\t\".extern ");
 	  const char *p = input + file_idx[fidx];
@@ -402,7 +403,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 		"$offload_func_table[] = {");
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
 	fprintf (out, "%s\"\n\t\t\"%s", comma,
-		 endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+		 (endswith (id->ptx_name, "$nohost")
+		  || strstr (id->ptx_name, "$nohost$")) ? id->ptx_name : "0");
   fprintf (out, "};\\n\";\n\n");
 }
 


[wwwdocs] projects/gomp: TR11 + GCC13 update

2022-11-11 Thread Tobias Burnus via Gcc-patches
This patch adds TR11 to the history of OpenMP releases and updates the
implementation status.


OK?

Tobias

PS: The implementation-status changes have been lying around in that file for
a while. I think both the GCC 13 release notes and this file need some
updates for more recent changes. Nonetheless, while incomplete, the
changes themselves should be fine.
projects/gomp: TR11 + GCC13 update

 htdocs/projects/gomp/index.html | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 713a4e16..46f393c8 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -677,7 +677,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 device-specific ICV settings with environment variables
-No
+GCC13
 
   
   
@@ -771,10 +771,10 @@ than listed, depending on resolved corner cases and optimizations.
 No
 
   
-  
+  
 omp/ompx/omx sentinels and omp_/ompx_ namespaces
 N/A
-
+warning for ompx/omx sentinels (1)
   
   
 Clauses on end directive can be on directive
@@ -888,7 +888,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 New doacross clause as alias for depend with source/sink modifier
-No
+GCC13
 
   
   
@@ -898,7 +898,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 omp_cur_iteration keyword
-No
+GCC13
 
   
   
@@ -924,9 +924,22 @@ than listed, depending on resolved corner cases and optimizations.
 
 
 
+(1) The
+ompx sentinel as C/C++ pragma and C++ attributes are warned for
+with -Wunknown-pragmas (implied by -Wall) and
+-Wattributes (enabled by default), respectively; for Fortran
+free-source code, there is a warning enabled by default and, for fixed-source
+code, the omx sentinel is warned for with -Wsurprising
+(enabled by -Wall). Unknown clauses are always rejected with an
+error.
 
 OpenMP Releases and Status
 
+November 9, 2022
+https://www.openmp.org/wp-content/uploads/openmp-TR11.pdf;>OpenMP
+Technical Report 11 (first preview for the OpenMP API Version 6.0) has been
+released.
+
 November 9, 2021
 https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf;>OpenMP
 Version 5.2 has been released.


Re: old install to a different folder

2022-11-11 Thread Tobias Burnus

Hi Richard,

On 11.11.22 11:18, Richard Biener wrote:


Note I think we can "remove" the install/ and onlinedocs/ _landing_ pages
(index.html) but we should keep the actual content pages so old links keep
working.  We can also replace the landing pages with a pointer to the new
documentation (or plain re-direct to that!).


For install, I think we should consider redirecting. Before the move to Sphinx,
we had only:

binaries.html
build.html
configure.html
download.html
finalinstall.html
gfdl.html
index.html
prerequisites.html
specific.html
test.html

Redirecting them to the new pages will work. There is a one-to-one
correspondence for all but build and test, which are now split into 7* and 5
files, respectively. Still, linking to the outermost page should be OK, as I do
not think that there will be many links using '#...'.

(*The subdivision is also a bit pointless for Ada and D, as it consists only of
the texts "GNAT prerequisites." and "GDC prerequisites.", respectively (in the
old doc). In the Sphinx docs, it is even shortened to: "GNAT." and "GDC.".)

The only exception where links to page anchors are likely used is
"Host/target specific installation notes for GCC".
For those, some anchors like '#avr' still work while others don't (like
'nvptx-*-none', as '#nvptx-x-none' changed to '#nvptx-none'). But the page is
short enough, it is clear from the context what the user wants, and there is
also a table of contents on the right to click on. (IMHO that's sufficient.)

* * *

For /onlinedocs/, I concur that we want to keep the old doc there, as there are
many deep links. Still, we should consider adding a disclaimer box to all former
mainline documentation stating that it is no longer updated and pointing to the
new overview page. We could also redirect accesses that go directly to '//'
(rather than to a (sub)html page) to the new site, as you proposed.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: old install to a different folder

2022-11-11 Thread Tobias Burnus

On 11.11.22 09:50, Martin Liška wrote:

I do support Richi's idea about using a new URL for the new Sphinx documentation
while keeping the older Texinfo documentation under /onlinedocs and /install


If we do so and those become then static files: Can we put some
disclaimer at the top of all HTML files under /install/ and under
/onlinedocs// that those are legacy files and the new
documentation can be found under  (not a deep link but directly to
the install pages or the new overview page about the Sphinx docs).

I think we really need such a hint – otherwise it is more confusing than
helpful! Additionally, we should add a "news" entry to the mainpage
pointing out that it changed and linking to the new Sphinx doc.

Tobias



Re: old install to a different folder

2022-11-11 Thread Tobias Burnus

Hi Gerald,

On 10.11.22 20:24, Gerald Pfeifer wrote:

On Thu, 10 Nov 2022, Martin Liška wrote:

We noticed we'll need the old /install to be available for redirect.

Gerald, can you please put it somewhere under /install-prev, or
something similar?

I'm afraid I am confused now. Based on your original request I had removed
the original /install directory.


I think we just need to handle more. Namely:

* Links directly to https://gcc.gnu.org/install/
  this works and shows the new page.

* Sublinks: those currently fail as the names have changed:
  https://gcc.gnu.org/install/configure.html (now: https://gcc.gnu.org/install/configuration.html )
  https://gcc.gnu.org/install/build.html (now: https://gcc.gnu.org/install/building.html )
  https://gcc.gnu.org/install/specific.html#avr → https://gcc.gnu.org/install/host-target-specific-installation-notes-for-gcc.html#avr

My impression is that it is sufficient to handle those renamings and that we do
not need the old pages.
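The renaming handling described above amounts to a small lookup table. A minimal C sketch, assuming only the three renamings listed here are needed (the helper `new_install_page` and its table are hypothetical; on the real web server this would more likely be expressed as redirect rules):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical redirect table: old install page -> new Sphinx page.
   Only the three renamings listed in this mail are included.  */
struct page_rename
{
  const char *old_name;
  const char *new_name;
};

static const struct page_rename install_renames[] = {
  { "configure.html", "configuration.html" },
  { "build.html", "building.html" },
  { "specific.html", "host-target-specific-installation-notes-for-gcc.html" },
};

/* Return the new file name for OLD_NAME, or NULL if no renaming is known.
   An anchor such as "#avr" would be appended unchanged by the caller.  */
const char *
new_install_page (const char *old_name)
{
  assert (old_name != NULL);
  size_t n = sizeof install_renames / sizeof install_renames[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (install_renames[i].old_name, old_name) == 0)
      return install_renames[i].new_name;
  return NULL;
}
```

Returning NULL for unknown names keeps the sketch honest: it only encodes the renamings actually confirmed in the thread.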

However, others might have different ideas. Note that this was discussed in the
thread "Links to web pages are broken."

Tobias



Re: [DOCS] sphinx: use new Sphinx links

2022-11-10 Thread Tobias Burnus

Hi,

On 10.11.22 11:03, Gerald Pfeifer wrote:

On Thu, 10 Nov 2022, Martin Liška wrote:

https://gcc.gnu.org/install/ is back with a new face.

But it's not working properly due to some Content Security Policy:

Hmm, it worked in my testing before and I just tried again:
Firefox 106.0.1 (64-bit)


Did you open the console (F12)? If I do, I see the errors:

Content Security Policy: The page’s settings blocked the loading of a
resource at inline (“default-src”). That's for line 18, which is

Re: [Patch] Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]

2022-11-06 Thread Tobias Burnus

Hello,

On 06.11.22 21:32, Mikael Morin wrote:

On 05/11/2022 at 23:28, Tobias Burnus wrote:

OK for mainline?

The trans-array.c part looks good.
A couple of nits for the trans-expr.cc part:


-  /* Use the rhs string length and the lhs element size.  */
-  size = string_length;
-  tmp = TREE_TYPE (gfc_typenode_for_spec (>ts));
-  tmp = TYPE_SIZE_UNIT (tmp);
+  /* Use the rhs string length and the lhs element size. Note that 'size' is
+ used below for the string-length comparison, only.  */
+  size = string_length,

s/,/;/ ?

+  tmp = TYPE_SIZE_UNIT (gfc_get_char_type (expr2->ts.kind));

Here you are using the rhs element size, which contradicts the
comment, so there is certainly something to fix here (either the
comment or the code).


I did remove it in between for testing, but obviously completely messed up
when re-adding it :-/

However, testing indicates that expr1 vs. expr2 does not make a difference for
the kind calculation:
  character(len=:,kind=1), allocatable :: c1l
  character(len=:,kind=4), allocatable :: c4l
  c1l = c4l
  c4l = c1l
as the code path is different; in either case, the result is:
c1l = (character(kind=1)[1:.c1l] *) __builtin_realloc ((void *) c1l, MAX_EXPR <(sizetype) .c4l, 1>);
c4l = (character(kind=4)[1:.c4l] *) __builtin_realloc ((void *) c4l, MAX_EXPR <(sizetype) .c1l * 4, 1>);

Still, matching the comment makes sense.
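Both dumps follow the same arithmetic: the requested byte size is the right-hand side's length (in characters) times the left-hand side's character size (1 byte for kind=1, 4 bytes for kind=4), clamped to at least 1. A minimal C sketch of that computation; `realloc_bytes` is a hypothetical name, not a gfortran function:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes to (re)allocate for a deferred-length scalar string on assignment,
   mirroring MAX_EXPR <rhs_len * lhs_unit_size, 1> from the dumps above.
   rhs_len is in characters; lhs_unit_size is 1 for kind=1, 4 for kind=4.  */
size_t
realloc_bytes (size_t rhs_len, size_t lhs_unit_size)
{
  assert (lhs_unit_size == 1 || lhs_unit_size == 4);
  size_t bytes = rhs_len * lhs_unit_size;
  /* Never request 0 bytes; presumably so that a zero-length string
     still gets a valid, non-null allocation.  */
  return bytes > 1 ? bytes : 1;
}
```

For 'c4l = c1l' with a 5-character c1l, this gives 5 * 4 = 20 bytes; for 'c1l = c4l', it gives 5 bytes.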


As for the testcase, do you keep the code commented on purpose?

I think it happened when I did 'git add' after adding the PR to the
testcase, missing the commented lines I added for the explaining dumps :-/

Can some of it be removed or uncommented?


It should be all uncommented, except for the 'print' line.

Updated patch attached; it passed quick testing and I will now fully regtest it.
I will commit it unless more comments come up.

Tobias

PS: Writing patches while being tired works, but writing clean patches
obviously does not.
Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]

The check whether reallocation on assignment was required did not handle
kind=4 characters correctly, such that there was always a reallocation,
implying issues with pointer addresses and lower bounds.  Additionally,
with all deferred strings, the old memory was not freed on reallocation.
And, finally, inside the block which was only executed if string lengths,
bounds or dynamic types changed, there was a subcheck of the same condition,
which was effectively a no-op but still confusing and, at least with -O0,
added extra instructions to the binary.

	PR fortran/107508

gcc/fortran/ChangeLog:

	* trans-array.cc (gfc_alloc_allocatable_for_assignment): Fix
	string-length check, plug memory leak, and avoid generation of
	effectively no-op code.
	* trans-expr.cc (alloc_scalar_allocatable_for_assignment): Extend
	comment; minor cleanup.

gcc/testsuite/ChangeLog:

	* gfortran.dg/widechar_11.f90: New test.

 gcc/fortran/trans-array.cc| 57 ---
 gcc/fortran/trans-expr.cc |  6 ++--
 gcc/testsuite/gfortran.dg/widechar_11.f90 | 51 +++
 3 files changed, 60 insertions(+), 54 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 514cb057afb..b7d4c41b5fe 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -10527,7 +10527,6 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   tree offset;
   tree jump_label1;
   tree jump_label2;
-  tree neq_size;
   tree lbd;
   tree class_expr2 = NULL_TREE;
   int n;
@@ -10607,6 +10606,11 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
 	elemsize1 = expr1->ts.u.cl->backend_decl;
   else
 	elemsize1 = lss->info->string_length;
+  tree unit_size = TYPE_SIZE_UNIT (gfc_get_char_type (expr1->ts.kind));
+  elemsize1 = fold_build2_loc (input_location, MULT_EXPR,
+   TREE_TYPE (elemsize1), elemsize1,
+   fold_convert (TREE_TYPE (elemsize1), unit_size));
+
 }
   else if (expr1->ts.type == BT_CLASS)
 {
@@ -10699,19 +10703,7 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   /* Allocate if data is NULL.  */
   cond_null = fold_build2_loc (input_location, EQ_EXPR, logical_type_node,
 			 array1, build_int_cst (TREE_TYPE (array1), 0));
-
-  if (expr1->ts.type == BT_CHARACTER && expr1->ts.deferred)
-{
-  tmp = fold_build2_loc (input_location, NE_EXPR,
-			 logical_type_node,
-			 lss->info->string_length,
-			 rss->info->string_length);
-  cond_null = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-   logical_type_node, tmp, cond_null);
-  cond_null= gfc_evaluate_now 

[Patch] Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]

2022-11-05 Thread Tobias Burnus

Without the attached patch, there is a problem with realloc on assignment
for kind=4 characters: the string length was compared with the byte size,
so the check for a changed size was always true.
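To make the mismatch concrete: for a kind=4 string of length 7, the right-hand side's element size is 7 * 4 = 28 bytes, while the old left-hand side value was the plain character count 7, so the two never compared equal. A minimal C sketch of the old and the fixed comparison (the function names are hypothetical; the fix mirrors the patch's multiplication of elemsize1 by the character type's unit size):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* rhs_elem_bytes is already in bytes, e.g. 7 characters * 4 = 28 for kind=4.  */

/* Before the fix: lhs length in characters vs. rhs size in bytes.  */
bool
elem_size_changed_old (size_t lhs_len_chars, size_t rhs_elem_bytes)
{
  return lhs_len_chars != rhs_elem_bytes;
}

/* After the fix: convert the lhs length to bytes first.  */
bool
elem_size_changed_new (size_t lhs_len_chars, size_t unit_size,
                       size_t rhs_elem_bytes)
{
  assert (unit_size > 0);
  return lhs_len_chars * unit_size != rhs_elem_bytes;
}
```

For kind=1 both versions agree, since the unit size is 1; only kind=4 exposes the difference.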

I initially thought, looking at the code, that scalars have the same issue,
but they don't; hence, I ended up with just a comment and a cleanup.

For arrays: The issue shows up in the testcase (→ PR) because there was
unnecessary reallocation on assignment, which changed the lower bound to 1.

The rest, I found looking at the dump:

(a) cond_null was:
D.4298 = .a4str != 7 || (character(kind=4)[0:][1:.a4str] *) a4str.data == 0B;
...
  if (D.4298)
  a4str.data = __builtin_malloc (168);
  else
  a4str.data = __builtin_realloc (a4str.data, 168);
which is the wrong condition. It should be just:
  D.4298 = (character(kind=4)[0:][1:.a4str] *) a4str.data == 0B;
to avoid a memory leak.
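Why the extra length check in cond_null leaks: if the buffer already exists but its length changed, the old condition takes the malloc branch and the previous pointer is simply overwritten. A minimal C sketch of the intended selection, assuming a plain data pointer (`allocate_for_assignment` is a hypothetical helper, not the generated code):

```c
#include <assert.h>
#include <stdlib.h>

/* Only a NULL data pointer may take the malloc branch.  With the old
   condition (length_changed || data == NULL), a non-NULL buffer whose
   length changed got fresh malloc'ed memory and the old block leaked.  */
void *
allocate_for_assignment (void *data, size_t bytes)
{
  assert (bytes > 0);
  if (data == NULL)
    return malloc (bytes);
  /* realloc preserves the existing contents (up to the smaller size)
     and releases or reuses the old block.  */
  return realloc (data, bytes);
}
```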

(b) The rest was removing bogus code; I think it did not do any harm, but it made
the code and the dump rather convoluted.

The dump (with and without the patch) starts with:

  D.4295 = .a4str * 4;
  .a4str = 7;
  D.4298 = (character(kind=4)[0:][1:.a4str] *) a4str.data == 0B;
  if (D.4298) goto L.6;
  if (a4str.dim[0].lbound + 5 != a4str.dim[0].ubound) goto L.6;
  if (D.4295 != 28) goto L.6;
  goto L.7;
  L.6:;
  a4str.dim[0].lbound = 1;

  if (D.4298)
  a4str.data = __builtin_malloc (168);
  else
  a4str.data = __builtin_realloc (a4str.data, 168);
  L.7:;

Thus, any code which reaches L.6 should be reallocated and any code
which does not, shouldn't.

The deleted code did add directly after L.6 the following additional code:
if (D.4298)
D.4282 = 0;
else
D.4282 = MAX_EXPR  + 1;
D.4283 = D.4282 != 6;
and it changed the 'else' into an 'else if' in
  if (D.4298)
  a4str.data = __builtin_malloc (168);
  else if (D.4283)
  a4str.data = __builtin_realloc (a4str.data, 168);

Looking closely at the added condition and at the source code, it
performs essentially the same check as the code which guards the L.6 to L.7
block. Thus, the condition should always evaluate to true.

Codewise, the 'D.4282 != 6' is the 'size1 != size2' array-size comparison.

I think the now-removed code was there first; later, someone
realized the array-bounds problem and the new code was added without
actually removing the old one. The handling of deferred strings, both in
the bogus condition for cond_null and by setting 'D.4283' to always true,
is not only wrong but also suggests some early hack.
However, I have not checked the history to confirm my suspicion.

OK for mainline?

Tobias

PS: I have the feeling that there might be an issue with finalization/derived-type
handling in case of 'realloc', as I did not spot finalization code between the size
check and the malloc/realloc. The malloc case should be fine, but if realloc shrinks
the memory, elements beyond the new last element in storage order would access
invalid memory. However, I have not checked whether there is indeed a problem, as
I concentrated on fixing this issue.

PPS: I lost track of pending patches. Are there any which I should review?
Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]

The check whether reallocation on assignment was required did not handle
kind=4 characters correctly, such that there was always a reallocation,
implying issues with pointer addresses and lower bounds.  Additionally,
with all deferred strings, the old memory was not freed on reallocation.
And, finally, inside the block which was only executed if string lengths,
bounds or dynamic types changed, there was a subcheck of the same condition,
which was effectively a no-op but still confusing and, at least with -O0,
added extra instructions to the binary.

	PR fortran/107508

gcc/fortran/ChangeLog:

	* trans-array.cc (gfc_alloc_allocatable_for_assignment): Fix
	string-length check, plug memory leak, and avoid generation of
	effectively no-op code.
	* trans-expr.cc (alloc_scalar_allocatable_for_assignment): Extend
	comment; minor cleanup.

gcc/testsuite/ChangeLog:

	* gfortran.dg/widechar_11.f90: New test.

 gcc/fortran/trans-array.cc| 57 ---
 gcc/fortran/trans-expr.cc |  8 ++---
 gcc/testsuite/gfortran.dg/widechar_11.f90 | 52 
 3 files changed, 62 insertions(+), 55 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 514cb057afb..b7d4c41b5fe 100644
--- a/gcc/fortran/trans-array.cc
+++ 

[Patch] OpenMP/Fortran: 'target update' with DT components (was: [Patch] OpenMP/Fortran: 'target update' with strides + DT components)

2022-11-03 Thread Tobias Burnus

On 03.11.22 13:44, Jakub Jelinek wrote:

[...]
Otherwise LGTM, assuming it actually works correctly.

I don't remember support for non-contiguous copying to/from devices
being actually added, [...] And I think it is not ok to copy bytes
that aren't requested to be copied.


I have now removed the stride support and only kept the bug fix and the
DT-component parts of the patch.

The only code changes are removing the disabling of the stride check in
openmp.cc and removing the stride part from one testcase.
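To illustrate Jakub's point about not copying unrequested bytes: a strided section such as 'a(1:5:2)' touches only elements 1, 3 and 5, so a correct transfer copies element by element (or in contiguous runs) rather than the whole span from the first to the last element. A minimal host-only C sketch, with `copy_strided` as a hypothetical helper (not libgomp's implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy COUNT elements of ELEM_SIZE bytes, starting at element FIRST and
   advancing by STRIDE elements, from SRC to DST.  Only the requested
   elements are touched; the gaps in between are left alone.  A single
   memcpy of the span from the first to the last element would also
   overwrite the gaps.  */
void
copy_strided (void *dst, const void *src, size_t elem_size,
              size_t first, size_t stride, size_t count)
{
  assert (stride > 0);
  unsigned char *d = dst;
  const unsigned char *s = src;
  for (size_t i = 0; i < count; i++)
    {
      size_t off = (first + i * stride) * elem_size;
      memcpy (d + off, s + off, elem_size);
    }
}
```

With 0-based indexing, 'a(1:5:2)' corresponds to first = 0, stride = 2, count = 3.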

I will commit it as attached, unless there are further comments (or the
regtesting I just started shows that something does not work).

Tobias

PS: For strides, I now filed: PR middle-end/107517 "[OpenMP][5.0]
'target update' with strides — for C/C++ and Fortran"
https://gcc.gnu.org/PR107517
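The "fix size for scalars" part of this patch replaces TYPE_SIZE_UNIT (TREE_TYPE (ptr)) with TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (ptr))): 'ptr' is the address of the list item, so its type is a pointer type, and the size must come from the pointed-to type. The C analogue is sizeof (ptr) versus sizeof (*ptr). A minimal illustration, where `struct s_type` is a hypothetical stand-in for the test's derived type with component 's%s' (expected as "[len: 4]" in the dump):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a derived type with a default-kind
   integer component, like 's%s' in target-11.f90.  */
struct s_type { int s; };

/* Wrong: size of the pointer itself (e.g. 8 bytes on LP64 targets).  */
size_t
update_len_wrong (const int *ptr)
{
  return sizeof ptr;
}

/* Right: size of the object the pointer refers to.  */
size_t
update_len_right (const int *ptr)
{
  assert (ptr != NULL);
  return sizeof *ptr;
}
```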
OpenMP/Fortran: 'target update' with DT components

OpenMP 5.0 permits using arrays with derived-type components as list
items in the 'from'/'to' clauses of the 'target update' directive.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_clauses): Permit derived types for
	the 'to' and 'from' clauses of 'target update'.
	* trans-openmp.cc (gfc_trans_omp_clauses): Fixes for
	derived-type changes; fix size for scalars.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-11.f90: New test.
	* testsuite/libgomp.fortran/target-13.f90: New test.

 gcc/fortran/openmp.cc   |  10 +-
 gcc/fortran/trans-openmp.cc |   9 +-
 libgomp/testsuite/libgomp.fortran/target-11.f90 |  75 +++
 libgomp/testsuite/libgomp.fortran/target-13.f90 | 159 
 4 files changed, 246 insertions(+), 7 deletions(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 653c43f79ff..e0e3b52ad57 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -2499,9 +2499,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  true) == MATCH_YES)
 	continue;
 	  if ((mask & OMP_CLAUSE_FROM)
-	  && gfc_match_omp_variable_list ("from (",
+	  && (gfc_match_omp_variable_list ("from (",
 	  >lists[OMP_LIST_FROM], false,
-	  NULL, , true) == MATCH_YES)
+	  NULL, , true, true)
+		  == MATCH_YES))
 	continue;
 	  break;
 	case 'g':
@@ -3436,9 +3437,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		continue;
 	}
 	  else if ((mask & OMP_CLAUSE_TO)
-	  && gfc_match_omp_variable_list ("to (",
+	  && (gfc_match_omp_variable_list ("to (",
 	  >lists[OMP_LIST_TO], false,
-	  NULL, , true) == MATCH_YES)
+	  NULL, , true, true)
+		  == MATCH_YES))
 	continue;
 	  break;
 	case 'u':
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 9bd4e6c7e1b..4bfdf85cd9b 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -3626,7 +3626,10 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		  gcc_unreachable ();
 		}
 	  tree node = build_omp_clause (input_location, clause_code);
-	  if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL)
+	  if (n->expr == NULL
+		  || (n->expr->ref->type == REF_ARRAY
+		  && n->expr->ref->u.ar.type == AR_FULL
+		  && n->expr->ref->next == NULL))
 		{
 		  tree decl = gfc_trans_omp_variable (n->sym, false);
 		  if (gfc_omp_privatize_by_reference (decl))
@@ -3666,13 +3669,13 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		{
 		  tree ptr;
 		  gfc_init_se (, NULL);
-		  if (n->expr->ref->u.ar.type == AR_ELEMENT)
+		  if (n->expr->rank == 0)
 		{
 		  gfc_conv_expr_reference (, n->expr);
 		  ptr = se.expr;
 		  gfc_add_block_to_block (block, );
 		  OMP_CLAUSE_SIZE (node)
-			= TYPE_SIZE_UNIT (TREE_TYPE (ptr));
+			= TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (ptr)));
 		}
 		  else
 		{
diff --git a/libgomp/testsuite/libgomp.fortran/target-11.f90 b/libgomp/testsuite/libgomp.fortran/target-11.f90
new file mode 100644
index 000..b0faa2e620d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target-11.f90
@@ -0,0 +1,75 @@
+! Based on libgomp.c/target-23.c
+
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-final { scan-tree-dump "omp target update to\\(xxs\\\[3\\\] \\\[len: 2\\\]\\)" "original" } }
+! { dg-final { scan-tree-dump "omp target update to\\(s\\.s \\\[len: 4\\\]\\)" "original" } }
+! { dg-final { scan-tree-dump "omp target update from\\(s\\.s \\\[len: 4\\\]\\)" "original" } }
+
+module m
+  implicit none
+  type S_type
+integer s
+integer, pointer :: u(:) => null()
+integer :: v(0:4)
+  end type S_type
+  integer, volatile :: z
+end module m
+
+program main
+  use m
+  implicit none
+  

[Patch] Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr

2022-11-02 Thread Tobias Burnus

This fixes an issue with 'alloc:' found when working on the patch
'[Patch] OpenMP/Fortran: 'target update' with strides + DT components'
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604687.html
(BTW: This one is still pending review.)

OK for mainline?

 * * *

I think the patch is a great improvement.

However, again, by writing a testcase, more issues have been found:
* one generic Fortran one, worked around by adding '(:)',
  Cf. https://gcc.gnu.org/PR107508 "Invalid bounds due to bogus reallocation
  on assignment with KIND=4 characters".
* Some other string issues, some might be generic Fortran issues
* Some issue with pointers, where 'target exit data' gives an error as
  the map kinds 0x00 and 0x01 are not known to it.
  Those also showed up with the 'target update' patch mentioned above.

For the last two, I used '#if 0' followed by a comment with the current
error message. I do intend to look into those, or at least file a PR.
Likewise for the remaining issues mentioned in the 'target update' patch.

Tobias
Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr

When using 'map(alloc: var, dt%comp)', the array descriptor needs a 'to'
mapping, as otherwise the bounds are not available in the target region.
Likewise for character strings.

This patch implements this; however, some additional issues are exposed
by the testcase; those are '#if 0'ed and will be handled later.
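The condition change in the diff below can be modelled as a small decision function. A minimal C sketch with hypothetical, simplified map kinds (not libgomp's GOMP_MAP_* values); it only models the branch that selects 'to' for the descriptor, and the other kinds are handled by further branches not shown here:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical map kinds (not libgomp's GOMP_MAP_* values).  */
enum map_kind { MAP_ALLOC, MAP_TO, MAP_FROM, MAP_TOFROM, MAP_RELEASE, MAP_DELETE };

/* Rough analogue of GOMP_MAP_COPY_TO_P: is data copied host -> device?  */
static bool
copies_to_device (enum map_kind k)
{
  return k == MAP_TO || k == MAP_TOFROM;
}

/* True if the patched condition maps the array descriptor with 'to':
   either the data itself is copied to the device, or it is only
   allocated there, in which case the bounds are still needed on the
   device.  */
bool
descriptor_map_to (enum map_kind data_kind)
{
  assert (data_kind <= MAP_DELETE);
  return copies_to_device (data_kind) || data_kind == MAP_ALLOC;
}
```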

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses): Ensure DT struct-comp with
	array descriptor and 'alloc:' have the descriptor mapped with 'to:'.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-enter-data-3.f90: New test.

 gcc/fortran/trans-openmp.cc   |3 
 libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90 |  567 ++
 2 files changed, 569 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 4bfdf85cd9b..4eb9d4c9edc 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -3507,7 +3507,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 			= gfc_full_array_size (block, inner, rank);
 			  tree elemsz
 			= TYPE_SIZE_UNIT (gfc_get_element_type (type));
-			  if (GOMP_MAP_COPY_TO_P (OMP_CLAUSE_MAP_KIND (node)))
+			  if (GOMP_MAP_COPY_TO_P (OMP_CLAUSE_MAP_KIND (node))
+			  || OMP_CLAUSE_MAP_KIND (node) == GOMP_MAP_ALLOC)
 			map_kind = GOMP_MAP_TO;
 			  else if (n->u.map_op == OMP_MAP_RELEASE
    || n->u.map_op == OMP_MAP_DELETE)
diff --git a/libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90 b/libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90
new file mode 100644
index 000..1fe3f03c7b8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90
@@ -0,0 +1,567 @@
+! { dg-additional-options "-cpp" }
+
+! FIXME: Some tests do not work yet. Those are for now in '#if 0'
+
+! Check that 'map(alloc:' properly works with
+! - deferred-length character strings
+! - arrays with array descriptors
+! For those, the array descriptor / string length must be mapped with 'to:'
+
+program main
+implicit none
+
+type t
+  integer :: ic(2:5), ic2
+  character(len=11) :: ccstr(3:4), ccstr2
+  character(len=11,kind=4) :: cc4str(3:7), cc4str2
+  integer, pointer :: pc(:), pc2
+  character(len=:), pointer :: pcstr(:), pcstr2
+  character(len=:,kind=4), pointer :: pc4str(:), pc4str2
+end type t
+
+type(t) :: dt
+
+integer :: ii(5), ii2
+character(len=11) :: clstr(-1:1), clstr2
+character(len=11,kind=4) :: cl4str(0:3), cl4str2
+integer, pointer :: ip(:), ip2
+integer, allocatable :: ia(:), ia2
+character(len=:), pointer :: pstr(:), pstr2
+character(len=:), allocatable :: astr(:), astr2
+character(len=:,kind=4), pointer :: p4str(:), p4str2
+character(len=:,kind=4), allocatable :: a4str(:), a4str2
+
+
+allocate(dt%pc(5), dt%pc2)
+allocate(character(len=2) :: dt%pcstr(2))
+allocate(character(len=4) :: dt%pcstr2)
+
+allocate(character(len=3,kind=4) :: dt%pc4str(2:3))
+allocate(character(len=5,kind=4) :: dt%pc4str2)
+
+allocate(ip(5), ip2, ia(8), ia2)
+allocate(character(len=2) :: pstr(-2:0))
+allocate(character(len=4) :: pstr2)
+allocate(character(len=6) :: astr(3:5))
+allocate(character(len=8) :: astr2)
+
+allocate(character(len=3,kind=4) :: p4str(2:4))
+allocate(character(len=5,kind=4) :: p4str2)
+allocate(character(len=7,kind=4) :: a4str(-2:3))
+allocate(character(len=9,kind=4) :: a4str2)
+
+
+! integer :: ic(2:5), ic2
+
+!$omp target enter data map(alloc: dt%ic)
+!$omp target map(alloc: dt%ic)
+  if (size(dt%ic) /= 4) error stop
+  if (lbound(dt%ic, 1) /= 2) error stop
+  if (ubound(dt%ic, 1) /= 5) error stop
+  dt%ic = 

[Patch] OpenMP/Fortran: 'target update' with strides + DT components

2022-10-31 Thread Tobias Burnus

I recently saw that gfortran does not support derived type components
with 'target update', an OpenMP 5.0 feature.

When adding it, I also found out that strides were not handled. There
is probably some room for improvement regarding what to copy and what not,
but copying too much should be fine.

Built and regtested on x86_64-gnu-linux without offloading configured,
and libgomp tested on x86_64-gnu-linux with nvptx offloading.
OK for mainline?

 * * *

PS: Follow-up work items:
* Strides: OpenMP seemingly also permits 'a%b([1,6,19,12])' as
  long as the first index has the lowest address. – And also
  'a%b(:)%c' is permitted – both not handled in this patch
  (and rejected with a compile-time error)
* There seem to be some problems with 'alloc' with pointers
  and allocatables in components, but I have not rechecked.
* For allocatables, 'target update' needs to do a deep mapping;
  I need to check whether that's the case.
Note for the last two: allocatable components only work on OG11/OG12,
and I urgently need to clean up and (re)submit that patch to mainline.
(It came too late for GCC 12.)

* There might also be some issue with mapping/refcounting, which I have not
  investigated, affecting the 'target exit data' of target-11.f90.

PPS: I intend to file one or more PRs about those issues, unless
I can fix them quickly.
OpenMP/Fortran: 'target update' with strides + DT components

OpenMP 5.0 permits using arrays with strides and derived-type
components as list items in the 'from'/'to' clauses
of the 'target update' directive.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_clauses): Permit derived types.
	(resolve_omp_clauses): Accept noncontiguous arrays.
	* trans-openmp.cc (gfc_trans_omp_clauses): Fixes for
	derived-type changes; fix size for scalars.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-11.f90: New test.
	* testsuite/libgomp.fortran/target-13.f90: New test.

 gcc/fortran/openmp.cc   |  19 ++-
 gcc/fortran/trans-openmp.cc |   9 +-
 libgomp/testsuite/libgomp.fortran/target-11.f90 |  75 +++
 libgomp/testsuite/libgomp.fortran/target-13.f90 | 162 
 4 files changed, 256 insertions(+), 9 deletions(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 653c43f79ff..2daed74be72 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -2499,9 +2499,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  true) == MATCH_YES)
 	continue;
 	  if ((mask & OMP_CLAUSE_FROM)
-	  && gfc_match_omp_variable_list ("from (",
+	  && (gfc_match_omp_variable_list ("from (",
 	  >lists[OMP_LIST_FROM], false,
-	  NULL, , true) == MATCH_YES)
+	  NULL, , true, true)
+		  == MATCH_YES))
 	continue;
 	  break;
 	case 'g':
@@ -3436,9 +3437,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		continue;
 	}
 	  else if ((mask & OMP_CLAUSE_TO)
-	  && gfc_match_omp_variable_list ("to (",
+	  && (gfc_match_omp_variable_list ("to (",
 	  >lists[OMP_LIST_TO], false,
-	  NULL, , true) == MATCH_YES)
+	  NULL, , true, true)
+		  == MATCH_YES))
 	continue;
 	  break;
 	case 'u':
@@ -7585,8 +7587,11 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 			   Only raise an error here if we're really sure the
 			   array isn't contiguous.  An expression such as
 			   arr(-n:n,-n:n) could be contiguous even if it looks
-			   like it may not be.  */
+			   like it may not be.
+			   And OpenMP's 'target update' permits strides for
+			   the to/from clause. */
 			if (code->op != EXEC_OACC_UPDATE
+			&& code->op != EXEC_OMP_TARGET_UPDATE
 			&& list != OMP_LIST_CACHE
 			&& list != OMP_LIST_DEPEND
 			&& !gfc_is_simply_contiguous (n->expr, false, true)
@@ -7630,7 +7635,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 			int i;
 			gfc_array_ref *ar = >u.ar;
 			for (i = 0; i < ar->dimen; i++)
-			  if (ar->stride[i] && code->op != EXEC_OACC_UPDATE)
+			  if (ar->stride[i]
+			  && code->op != EXEC_OACC_UPDATE
+			  && code->op != EXEC_OMP_TARGET_UPDATE)
 			{
 			  gfc_error ("Stride should not be specified for "
 	 "array section in %s clause at %L",
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 9bd4e6c7e1b..4bfdf85cd9b 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -3626,7 +3626,10 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		  gcc_unreachable ();
 		}
 	  tree node = build_omp_clause (input_location, clause_code);
-	  if (n->expr == NULL || n->expr->ref->u.ar.type 

Re: [PATCH] OpenMP: Duplicate checking for map clauses in Fortran (PR107214)

2022-10-26 Thread Tobias Burnus

Hi Julian,

I had a first quick look at this patch; I should have a closer look
later. However, I stumbled over the following:

On 20.10.22 18:14, Julian Brown wrote:

typedef struct gfc_symbol
{
...
   struct gfc_symbol *old_symbol;

   unsigned mark:1, comp_mark:1, data_mark:1, dev_mark:1, gen_mark:1;
   unsigned reduc_mark:1, gfc_new:1;

   struct gfc_symbol *tlink;

   unsigned equiv_built:1;
   ...

I know that this was the case before, but can you move the mark:1 etc.
after 'tlink'? In that case, all bitfields are grouped together. If I
have not miscounted, we currently have 7 bits before and 9 bits after
'tlink', and grouping them together reduces pointless padding.
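A simplified illustration of the padding argument, using stand-in structs rather than the real gfc_symbol (the flag names b1 to b9 are placeholders for the nine bits after 'tlink'; exact sizes are ABI-dependent):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, not the real gfc_symbol: seven 1-bit flags,
   the 'tlink' pointer, then nine more 1-bit flags.  */
struct split_flags
{
  unsigned mark:1, comp_mark:1, data_mark:1, dev_mark:1, gen_mark:1;
  unsigned reduc_mark:1, gfc_new:1;
  struct split_flags *tlink;
  unsigned b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1, b8:1, b9:1;
};

/* Same flags, all grouped after the pointer: 16 bits share one unit.  */
struct grouped_flags
{
  struct grouped_flags *tlink;
  unsigned mark:1, comp_mark:1, data_mark:1, dev_mark:1, gen_mark:1;
  unsigned reduc_mark:1, gfc_new:1;
  unsigned b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1, b8:1, b9:1;
};

/* Bytes saved by grouping all flag bits after the pointer.  */
size_t
padding_saved (void)
{
  assert (sizeof (struct grouped_flags) <= sizeof (struct split_flags));
  return sizeof (struct split_flags) - sizeof (struct grouped_flags);
}
```

On x86_64 GNU/Linux, the split layout takes 24 bytes and the grouped one 16, so padding_saved returns 8.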

* * *

+  else if (n->sym->mark)
+ gfc_error ("Symbol %qs present on both data and map clauses "
+"at %L", n->sym->name, >where);


I wonder whether that also rejects the following, which seems to be
valid. The 'map' goes to 'target' and the 'firstprivate' to 'parallel',
cf. OpenMP 5.2, "17.2 Clauses on Combined and Composite Constructs",
[340:3-4 & 12-14]. (BTW: While some fixes went into 5.1 regarding this section,
similar wording is already in 5.0.)

(Testing showed: it gives an ICE without the patch and an error with it.)

module m
  integer :: a = 1
end module m

module m2
contains
subroutine bar()
  use m
  !$omp declare target
  a = a + 5
end subroutine bar
end

program p
  use m
  use m2
  !$omp target parallel do map(a) firstprivate(a)
  do i = 1, 1
    a = 7
    call bar()
    if (a /= 7) error stop 1
    a = a + 8
  end do
  if (a /= 6) error stop
end

 * * *

The ICE seems to be because gcc/fortran/trans-openmp.cc's gfc_split_omp_clauses
mishandles this, as the dump shows the following:

  #pragma omp target firstprivate(a) map(tofrom:a)
#pragma omp parallel firstprivate(a)

 * * *

In contrast, for the C testcase:

void foo(int x) {
#pragma omp target parallel for simd map(x) firstprivate(x)
for (int k = 0; k < 1; ++k)
  x = 1;
}

the dump is as follows, which seems to be sensible:

  #pragma omp target map(tofrom:x)
#pragma omp parallel firstprivate(x)
  #pragma omp for nowait
#pragma omp simd

Tobias



Re: [Patch][v5] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-10-24 Thread Tobias Burnus

Hi Tobias!

On 24.10.22 21:11, Thomas Schwinge wrote:

On 2022-10-24T21:05:46+0200, I wrote:

On 2022-10-24T16:07:25+0200, Jakub Jelinek via Gcc-patches wrote:

On Wed, Oct 12, 2022 at 10:55:26AM +0200, Tobias Burnus wrote:

libgomp/nvptx: Prepare for reverse-offload callback handling

Well.
 +  struct rev_offload *rev_data;
... but as far as I can tell, this is never initialized in
'nvptx_open_device', which does 'ptx_dev = GOMP_PLUGIN_malloc ([...]);'.
Would the following be the correct fix (currently testing)?

 --- libgomp/plugin/plugin-nvptx.c
 +++ libgomp/plugin/plugin-nvptx.c
 @@ -546,6 +546,8 @@ nvptx_open_device (int n)
ptx_dev->omp_stacks.size = 0;
pthread_mutex_init (_dev->omp_stacks.lock, NULL);

 +  ptx_dev->rev_data = NULL;
 +
return ptx_dev;
  }


LGTM, and I think it is obvious, albeit I am not sure why it did not
fail when testing it here.
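One plausible reason the missing initialization went unnoticed: memory from a fresh malloc is often, but not reliably, zero-filled (for instance when it comes straight from the operating system), so a never-initialized pointer can happen to read as NULL. A minimal C sketch of the pattern, using a hypothetical stand-in struct rather than the plugin's real device structure:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the plugin's per-device struct.  */
struct device
{
  int ord;
  void *rev_data;   /* must start out NULL */
};

/* malloc does not zero memory, so every field that is read later must be
   set explicitly (as in the proposed fix), or calloc must be used.  */
struct device *
open_device (int ord)
{
  assert (ord >= 0);
  struct device *dev = malloc (sizeof *dev);
  if (dev == NULL)
    return NULL;
  dev->ord = ord;
  dev->rev_data = NULL;
  return dev;
}
```

Tools such as Valgrind's memcheck or MemorySanitizer report reads of such uninitialized fields even when the garbage happens to be zero.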

Thanks,

Tobias



*ping* Re: [Patch] OpenMP: Fix reverse offload GOMP_TARGET_REV IFN corner cases [PR107236]

2022-10-24 Thread Tobias Burnus

Ping this patch – and also "Re: [Patch][v5] libgomp/nvptx: Prepare for
reverse-offload callback handling".

For the latter cf. Alexander's code approval
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603908.html – and
his concerns regarding the generic feature in
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601959.html (I
think 'target nowait' permits what he thinks is the better way for GPUs.)

Tobias

On 18.10.22 21:27, Tobias Burnus wrote:

Found when playing around with reverse offload once I used 'omp target
parallel'.
The other issue showed up when running the testsuite (which is done
with -O2).

In all cases, the ICE is in expand_GOMP_TARGET_REV of this IFN, which
should be unreachable.

Note: ENABLE_OFFLOADING inside the compiler must evaluate to true for
this to show up as an ICE; otherwise, the IFN is not even generated.

I did not see a good reason for DECL_CONTEXT = NULL; thus, I now set
it to the same value as was set for child_fn - for no good reason.

Tested on x86-64 with ENABLE_OFFLOADING albeit without true offloading.
OK for mainline?

Tobias



[OG12] omp-oacc-kernels-decompose.cc: fix -fcompare-debug with GIMPLE_DEBUG

2022-10-20 Thread Tobias Burnus

Given that omp-oacc-kernels-decompose.cc only exists on OG12, the fix
only applies to OG12.

The failure has shown up since "Kernels loops annotation: C and C++.", as
that commit adds GIMPLE_DEBUG statements, which are not handled in
omp-oacc-kernels-decompose.cc at all. (Actually, it even fails with a
sorry when compiling with -g2; however, -fcompare-debug is supported and
was failing.) For details, see the patch.

Tobias
commit 807b755357c4eb03260d229f4a851009fe058e51
Author: Tobias Burnus 
Date:   Thu Oct 20 19:20:36 2022 +0200

omp-oacc-kernels-decompose.cc: fix -fcompare-debug with GIMPLE_DEBUG

GIMPLE_DEBUG statements were put in a parallel region of their own, which
is not only pointless but also breaks -fcompare-debug. With this commit,
they are handled like simple assignments: they are placed into the same
body as the loop such that only one parallel region remains, as without
debugging. This fixes the existing testcase
libgomp.oacc-c-c++-common/kernels-loop-g.c.

Note: GIMPLE_DEBUG statements are only accepted with -fcompare-debug; if
they appear otherwise, decompose_kernels_region_body rejects them with
a sorry (unchanged).

gcc/
* omp-oacc-kernels-decompose.cc (top_level_omp_for_in_stmt,
decompose_kernels_region_body): Handle GIMPLE_DEBUG like
simple assignment.
---
 gcc/omp-oacc-kernels-decompose.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index 4e940c1ee0f..a7e3d764d52 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -120,7 +120,8 @@ top_level_omp_for_in_stmt (gimple *stmt)
 	  for (gsi = gsi_start (body); !gsi_end_p (gsi); gsi_next (&gsi))
 	{
 	  gimple *body_stmt = gsi_stmt (gsi);
-	  if (gimple_code (body_stmt) == GIMPLE_ASSIGN)
+	  if (gimple_code (body_stmt) == GIMPLE_ASSIGN
+		  || gimple_code (body_stmt) == GIMPLE_DEBUG)
 		continue;
 	  else if (gimple_code (body_stmt) == GIMPLE_OMP_FOR
 		   && gsi_one_before_end_p (gsi))
@@ -1398,7 +1399,7 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
 	= (gimple_code (stmt) == GIMPLE_ASSIGN
 	   && TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL
 	   && DECL_ARTIFICIAL (gimple_assign_lhs (stmt)));
-	  if (!is_simple_assignment)
+	  if (!is_simple_assignment && gimple_code (stmt) != GIMPLE_DEBUG)
 	only_simple_assignments = false;
 	}
 }


[OG12] libgomp.c-c++-common/requires-4.c: dg-xfail-run-if for USM with -foffload-memory=

2022-10-20 Thread Tobias Burnus

Follow up to the mainline commit (https://gcc.gnu.org/r13-3407 + backported to 
OG12):
"libgomp: Add offload_device_gcn check, add requires-4a.c test"

This xfails requires-4.c on pseudo-USM systems.

As mentioned in the email for that patch, OG12's unified-shared-memory
implementation is for pseudo-USM systems where only specially allocated
memory (managed, pinned) is device accessible. Thus, requires-4.c
failed as it uses static memory. (requires-4a.c works as it uses
heap-allocated memory.)

Tobias

PS: For USM in mainline, see patch submission at 
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
commit 0c47ae1c9283a812f832e80e451bfa82519c21e8
Author: Tobias Burnus 
Date:   Thu Oct 20 13:25:25 2022 +0200

libgomp.c-c++-common/requires-4.c: dg-xfail-run-if for USM with -foffload-memory=

The USM implementation uses -foffload-memory=..., which allocates variables
in a special memory. This does not support static variables. Hence, XFAIL
this test on nvptx/gcn. The requires-4a.c testcase tests the same but uses
heap memory instead.

libgomp/
* testsuite/libgomp.c-c++-common/requires-4.c: dg-xfail-run-if on
nvptx and gcn.
---
 libgomp/testsuite/libgomp.c-c++-common/requires-4.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
index 5883eff0d93..c6b28d5442f 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
@@ -2,6 +2,8 @@
 /* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */
 /* { dg-additional-sources requires-4-aux.c } */
 
+/* { dg-xfail-run-if "USM via -foffload-memory=... does not support static variables" { offload_device_nvptx || offload_device_gcn } } */
+
 /* Check no diagnostic by device-compiler's or host compiler's lto1.
Other file uses: 'requires reverse_offload', but that's inactive as
there are no declare target directives, device constructs nor device routines  */


[OG12][committed] Fix omp-expand.cc's expand_omp_target for OpenACC

2022-10-19 Thread Tobias Burnus

Fallout of my Fortran deep-mapping patch, which I somehow missed,
probably because I was inundated by the OG11 OpenACC failures back then.

Tobias
commit 0d6fc5032c7ba8a95301d0ccbc418875e73955ac
Author: Tobias Burnus 
Date:   Wed Oct 19 17:31:14 2022 +0200

Fix omp-expand.cc's expand_omp_target for OpenACC

In OG12 commit a6c1eccffb161130351d891dc87f5afe54f8075c,
"Fortran/OpenMP: Support mapping of DT with allocatable components",
the size of the addr/sizes/kind arrays was passed as the 4th argument.
However, OpenACC uses more than 3 arguments for its own purposes, e.g. to
handle noncontiguous arrays by passing an array descriptor there.

This patch restores the previous behaviour for OpenACC, fixing
testcases like libgomp.oacc-c-c++-common/noncontig_array-1.c.

gcc/
* omp-expand.cc (expand_omp_target): Fix OpenACC in case there
are more than 3 arguments to the builtin function.
---
 gcc/ChangeLog.omp | 5 +
 gcc/omp-expand.cc | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 527a9850dba..32a8c7b485f 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,8 @@
+2022-10-19  Tobias Burnus  
+
+	* omp-expand.cc (expand_omp_target): Fix OpenACC in case there
+	are more than 3 arguments to the builtin function.
+
 2022-10-17  Thomas Schwinge  
 
 	Backported from master:
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 92996685d41..6529f63362b 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -10456,7 +10456,7 @@ expand_omp_target (struct omp_region *region)
   t3 = t2;
   t4 = t2;
 }
-  else if (TREE_VEC_LENGTH (t) == 3)
+  else if (TREE_VEC_LENGTH (t) == 3 || is_gimple_omp_oacc (entry_stmt))
 {
   t1 = TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (TREE_VEC_ELT (t, 1))));
   t1 = size_binop (PLUS_EXPR, t1, size_int (1));

commit 92b14810a2743594df945dc6479413a3d9d943aa
Author: Tobias Burnus 
Date:   Wed Oct 19 17:26:34 2022 +0200

ChangeLog for "Fortran: Fix delinearization regression"

I forgot to update gcc/fortran/ChangeLog.omp and to include the
following in the previous commit, i.e.
commit 76b773a4a2d1daf0b83e50cd999bc38f8dd047be.

gcc/fortran/ChangeLog:

* trans-array.cc (non_negative_strides_array_p): Fix handling
of GFC_DECL_SAVED_DESCRIPTOR.
(gfc_conv_array_ref): Use ARRAY_REF again when possible.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/affinity-clause-1.f90: Revert to upstream version,
update one scan-tree item.
* gfortran.dg/gomp/depend-4.f90: Revert to upstream version.
* gfortran.dg/gomp/depend-5.f90: Likewise.
* gfortran.dg/gomp/depend-6.f90: Likewise.
---
 gcc/fortran/ChangeLog.omp   | 6 ++
 gcc/testsuite/ChangeLog.omp | 8 
 2 files changed, 14 insertions(+)

diff --git a/gcc/fortran/ChangeLog.omp b/gcc/fortran/ChangeLog.omp
index 685fe68667a..189431df4eb 100644
--- a/gcc/fortran/ChangeLog.omp
+++ b/gcc/fortran/ChangeLog.omp
@@ -1,3 +1,9 @@
+2022-10-19  Tobias Burnus  
+
+	* trans-array.cc (non_negative_strides_array_p): Fix handling
+	of GFC_DECL_SAVED_DESCRIPTOR.
+	(gfc_conv_array_ref): Use ARRAY_REF again when possible.
+
 2022-10-17  Tobias Burnus  
 
 	Backport from mainline:
diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index b2b4381e3ce..6928d520c0f 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,11 @@
+2022-10-19  Tobias Burnus  
+
+	* gfortran.dg/gomp/affinity-clause-1.f90: Revert to upstream version,
+	update one scan-tree item.
+	* gfortran.dg/gomp/depend-4.f90: Revert to upstream version.
+	* gfortran.dg/gomp/depend-5.f90: Likewise.
+	* gfortran.dg/gomp/depend-6.f90: Likewise.
+
 2022-10-17  Tobias Burnus  
 
 	Backport from mainline:


[OG12][committed] Fortran: Fix delinearization regression

2022-10-19 Thread Tobias Burnus

As mentioned in the patch submission for "Fortran: Fix
non_negative_strides_array_p", there were some issues on OG12, which uses
Sandra's delinearization patch (forward-ported from OG11).

This patch fixes one issue, caused by a GCC 12 change.

At some point, we could think of using the delinearization patch in
mainline; on OG12 it is used together with some Graphite work to
parallelize loops in OpenACC's kernels construct. But at least in
principle, it could also offer better optimization options in general.

Tobias

PS: OG12 alias devel/omp/gcc-12 is a branch based on GCC 12 that
contains OpenMP, OpenACC and offloading commits not yet on GCC 12.
Several of those commits went first to mainline/GCC 13, but some are so
far only on OG12 – like this delinearization patch. The goal is to
eventually have all features in mainline.
commit 76b773a4a2d1daf0b83e50cd999bc38f8dd047be
Author: Tobias Burnus 
Date:   Wed Oct 19 15:53:25 2022 +0200

Fortran: Fix delinearization regression

The delinearization patch "Fortran: delinearize multi-dimensional array
accesses", OG12 commit 39a8c371fda6136cf77c74895a00b136409e0ba3, uses
gfc_build_array_ref for the non-delinearization path. The generated
code depends on whether there can be negative strides or not; this
was added to that function in r12-8230-g7964ab6c364 as a new Boolean
argument.

The follow-up OG12 commit "Fix Fortran array-access regressions",
9fb0076b11eb2774b620bcf2171d55c7d1fb899f, also added this argument
to the call in gfc_conv_array_ref, but with a value that always
evaluates to false.

This commit changes it to a call to non_negative_strides_array_p
(Note: for 'se->expr' not 'base'; the former could be 'arraydesc'
while the latter is then 'arraydesc.data', whose TREE_TYPE does not
contain information about the array type.)

However, doing so revealed a bug in non_negative_strides_array_p,
fixed in this commit but also submitted as "Fortran: Fix
non_negative_strides_array_p" to mainline,
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603883.html

As a side effect of this commit, several testcases now pass and the
OG12-only changes to depend-{4,5,6}.f90 and affinity-clause-1.f90
could be undone, except that the latter now uses the delinearized
array syntax in one case, which is an improvement (as reflected in
the scan-tree-dump). Hence, this commit (partially) reverts the
commits:

21c806f73fc gfortran.dg/gomp/{depend-5,scope-6}.f90: Update scan-tree-dump
014fc7cd451 Fix dg- pattern for gomp/{affinity-clause-1.f90,uses_allocators-3.f90}
2d8aa5cc5d3 gfortran.dg/gomp/depend-6.f90: minor fix + dump update
d77133b29fc gfortran.dg/gomp/depend-4.f90: minor fix + dump update

The main testcase for non_negative_strides_array_p is
gfortran.dg/array_reference_3.f90, which now passes as well.

Additionally, this change prevents some unintended implicit
mapping, which caused libgomp.fortran/map-alloc-comp-{4,6}.f90 to fail
before; they now pass again.
---
 gcc/fortran/trans-array.cc | 18 --
 .../gfortran.dg/gomp/affinity-clause-1.f90 |  6 +-
 gcc/testsuite/gfortran.dg/gomp/depend-4.f90| 74 +++---
 gcc/testsuite/gfortran.dg/gomp/depend-5.f90| 13 ++--
 gcc/testsuite/gfortran.dg/gomp/depend-6.f90| 72 ++---
 5 files changed, 91 insertions(+), 92 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index bc2477e4aea..13d92c9fb1f 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3703,11 +3703,16 @@ non_negative_strides_array_p (tree expr)
 
   /* If the array was originally a dummy with a descriptor, strides can be
  negative.  */
-  if (DECL_P (expr)
-      && DECL_LANG_SPECIFIC (expr)
-      && GFC_DECL_SAVED_DESCRIPTOR (expr)
-      && GFC_DECL_SAVED_DESCRIPTOR (expr) != expr)
-    return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (expr));
+  tree decl = expr;
+  STRIP_NOPS (decl);
+  if (TREE_CODE (decl) == INDIRECT_REF)
+    decl = TREE_OPERAND (decl, 0);
+
+  if (DECL_P (decl)
+      && DECL_LANG_SPECIFIC (decl)
+      && GFC_DECL_SAVED_DESCRIPTOR (decl)
+      && GFC_DECL_SAVED_DESCRIPTOR (decl) != expr)
+    return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (decl));
 
   return true;
 }
@@ -4200,12 +4205,13 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
 {
   /* Build a linearized array reference using the offset from all
 	 dimensions.  */
+  bo

[Patch] Fortran: Fix non_negative_strides_array_p

2022-10-19 Thread Tobias Burnus

First, I am woefully aware that there are several patches pending. I hope
to do a couple of reviews later today or in the next days.

Otherwise, I ran into another issue in existing code, which was exposed by
the delinearization patch on the OG12 branch but could potentially lead to
wrong code on mainline as well, depending on how the return value is used.
However, I failed to create a testcase for it.

OK for mainline?

Tobias
Fortran: Fix non_negative_strides_array_p

The non_negative_strides_array_p function might wrongly return 'true', e.g.
for assumed-shape arrays, if the argument is '*a.0 ...' instead of 'a.0 ...',
as then the saved array descriptor for the PARAM_DECL 'a' is not found.

This potentially leads to wrong code - but I could not find a testcase
leading to wrong code on mainline. Asserts show that this happens with
CLASS; however, for those no ARRAY_REF seems to get used.

The issue shows up when applying the delinearization patch as posted
at https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562230.html,
which has been applied to the OG12 alias devel/omp/gcc-12 vendor branch as
commit 39a8c371fda6136cf77c74895a00b136409e0ba3. This patch calls
gfc_build_array_ref inside gfc_conv_array_ref. The issue mentioned
above shows up with this patch in gfortran.dg/array_reference_3.f90,
a testcase added together with non_negative_strides_array_p in commit
r12-8230-g7964ab6c364 for PR 102043. Here, non_negative_strides_array_p
returns true for assumed_shape_x, but assumed-shape arrays may have
negative strides.

gcc/fortran/ChangeLog:

	* trans-array.cc (non_negative_strides_array_p): Fix handling
	of GFC_DECL_SAVED_DESCRIPTOR.

 gcc/fortran/trans-array.cc | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 795ce14af08..ca3503b7cae 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3695,11 +3695,16 @@ non_negative_strides_array_p (tree expr)
 
   /* If the array was originally a dummy with a descriptor, strides can be
  negative.  */
-  if (DECL_P (expr)
-      && DECL_LANG_SPECIFIC (expr)
-      && GFC_DECL_SAVED_DESCRIPTOR (expr)
-      && GFC_DECL_SAVED_DESCRIPTOR (expr) != expr)
-    return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (expr));
+  tree decl = expr;
+  STRIP_NOPS (decl);
+  if (TREE_CODE (decl) == INDIRECT_REF)
+    decl = TREE_OPERAND (decl, 0);
+
+  if (DECL_P (decl)
+      && DECL_LANG_SPECIFIC (decl)
+      && GFC_DECL_SAVED_DESCRIPTOR (decl)
+      && GFC_DECL_SAVED_DESCRIPTOR (decl) != expr)
+    return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (decl));
 
   return true;
 }


[Patch] OpenMP: Fix reverse offload GOMP_TARGET_REV IFN corner cases [PR107236]

2022-10-18 Thread Tobias Burnus

Found when playing around with reverse offload once I used 'omp target
parallel'.
The other issue showed up when running the testsuite (which is done with -O2).

In all cases, the ICE is in expand_GOMP_TARGET_REV of this IFN, which should
be unreachable.

Note: ENABLE_OFFLOADING inside the compiler must evaluate to true for this
to show up as an ICE; otherwise, the IFN is not even generated.

I did not see a good reason for DECL_CONTEXT = NULL; thus, I now set it to
the same value as was set for child_fn - for no good reason.

Tested on x86-64 with ENABLE_OFFLOADING albeit without true offloading.
OK for mainline?

Tobias
OpenMP: Fix reverse offload GOMP_TARGET_REV IFN corner cases [PR107236]

For 'target parallel' and similarly nested directives, cgraph_node's
calls_declare_variant_alt was not set in the parent region node but in
cfun->decl. Hence, pass_omp_device_lower did not process the
internal function GOMP_TARGET_REV. The solution is to set it on the
DECL_CONTEXT, which is set in adjust_context_and_scope.

The cgraph_node::create_clone issue is exposed with -O2 for the existing
libgomp.fortran/reverse-offload-1.f90.

omp-offload.cc

	PR middle-end/107236

gcc/ChangeLog:
	* omp-expand.cc (expand_omp_target): Set calls_declare_variant_alt
	in DECL_CONTEXT and not in cfun->decl.
	* cgraphclones.cc (cgraph_node::create_clone): Copy also the
	node's calls_declare_variant_alt value.

gcc/testsuite/ChangeLog:
	* gfortran.dg/gomp/target-device-ancestor-6.f90: New test.

 gcc/cgraphclones.cc |  1 +
 gcc/omp-expand.cc   | 13 ++---
 .../gfortran.dg/gomp/target-device-ancestor-6.f90   | 17 +
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index eb0fa87b554..bb4b3c5407d 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -375,6 +375,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   if (!new_inlined_to)
 prof_count = count.combine_with_ipa_count (prof_count);
   new_node->count = prof_count;
+  new_node->calls_declare_variant_alt = this->calls_declare_variant_alt;
 
   /* Update IPA profile.  Local profiles need no updating in original.  */
   if (update_original)
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 5dc0bf16e17..c636a174e36 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -10054,13 +10054,8 @@ expand_omp_target (struct omp_region *region)
 
   /* Handle the case that an inner ancestor:1 target is called by an outer
 	 target region. */
-  if (!is_ancestor)
-	cgraph_node::get (child_fn)->calls_declare_variant_alt
-	  |= cgraph_node::get (cfun->decl)->calls_declare_variant_alt;
-  else  /* Duplicate function to create empty nonhost variant. */
+  if (is_ancestor)
 	{
-	  /* Enable pass_omp_device_lower pass.  */
-	  cgraph_node::get (cfun->decl)->calls_declare_variant_alt = 1;
 	  cgraph_node *fn2_node;
 	  child_fn2 = build_decl (DECL_SOURCE_LOCATION (child_fn),
   FUNCTION_DECL,
@@ -10074,7 +10069,7 @@ expand_omp_target (struct omp_region *region)
 	  TREE_PUBLIC (child_fn2) = 0;
 	  DECL_UNINLINABLE (child_fn2) = 1;
 	  DECL_EXTERNAL (child_fn2) = 0;
-	  DECL_CONTEXT (child_fn2) = NULL_TREE;
+	  DECL_CONTEXT (child_fn2) = DECL_CONTEXT (child_fn);
 	  DECL_INITIAL (child_fn2) = make_node (BLOCK);
 	  BLOCK_SUPERCONTEXT (DECL_INITIAL (child_fn2)) = child_fn2;
 	  DECL_ATTRIBUTES (child_fn)
@@ -10098,6 +10093,10 @@ expand_omp_target (struct omp_region *region)
 	  fn2_node->force_output = 1;
 	  node->offloadable = 0;
 
+	  /* Enable pass_omp_device_lower pass.  */
+	  fn2_node = cgraph_node::get (DECL_CONTEXT (child_fn));
+	  fn2_node->calls_declare_variant_alt = 1;
+
 	  t = build_decl (DECL_SOURCE_LOCATION (child_fn),
 			  RESULT_DECL, NULL_TREE, void_type_node);
 	  DECL_ARTIFICIAL (t) = 1;
diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-6.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-6.f90
new file mode 100644
index 000..821e7852e85
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-6.f90
@@ -0,0 +1,17 @@
+! PR middle-end/107236
+
+! Did ICE before because IFN .GOMP_TARGET_REV was not
+! processed in omp-offload.cc.
+! Note: Test requires ENABLE_OFFLOADING to be true inside GCC.
+
+implicit none
+!$omp requires reverse_offload
+!$omp target parallel num_threads(4)
+  !$omp target device(ancestor:1)
+call foo()
+  !$omp end target 
+!$omp end target parallel
+contains
+  subroutine foo
+  end
+end


*ping* / Re: [Patch] libgomp: Add offload_device_gcn check, add requires-4a.c test

2022-10-17 Thread Tobias Burnus



On 12.10.22 16:05, Tobias Burnus wrote:

This came up because the USM implementation with
-foffload-memory={unified,pinned}
as posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
does not handle USM with static variables.

This shows up for the OG12 alias devel/omp/gcc-12 branch as FAIL for
requires-4.c.

The attached patch prepares for skipping requires-4.c for the gcn/nvptx
devices and adds an adjacent requires-4a.c testcase, using heap memory,
that can still run on gcn/nvptx.

Additionally, I commented out a no longer used #define, following the
precedent of GOMP_DEVICE_HOST_NONSHM.

Thus, this patch adds another testcase and one effective-target check,
comments out an unused #define - and that's it.
(Otherwise, it is just a prep patch.)

OK for mainline?

Tobias

PS: Currently, neither the preexisting offload_device_nvptx nor the new
offload_device_gcn target selector is used, either in old code or by
this patch.



*ping* / Re: [Patch][v5] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-10-17 Thread Tobias Burnus



On 12.10.22 10:55, Tobias Burnus wrote:

On 11.10.22 13:12, Alexander Monakov wrote:

My understanding is such trickery should not be necessary with
the barrier-based approach, i.e. the sequence of PTX instructions

   st   % plain store
   membar.sys
   st.volatile

should be enough to guarantee that the former store is visible on the
host before the latter, and work all the way back to sm_20.


If I understand it correctly, you mean:

  GOMP_REV_OFFLOAD_VAR->dev_num = GOMP_ADDITIONAL_ICVS.device_num;

  __sync_synchronize ();  /* membar.sys */
  asm volatile ("st.volatile.global.u64 [%0], %1;"
: : "r"(addr_struct_fn), "r" (fn) : "memory");


And then directly followed by the busy wait:

  while (__atomic_load_n (&GOMP_REV_OFFLOAD_VAR->fn, __ATOMIC_ACQUIRE)
!= 0)
;  /* spin  */

which GCC expands to:

  /* ld.global.u64 %r64,[__gomp_rev_offload_var];
 ld.u64 %r36,[%r64];
 membar.sys;  */

The updated patch is attached.

(This and removing the mkoffload.cc part are the only larger changes.
Otherwise, it only handles the minor comments by Jakub.
The now removed CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT was used
until commit r10-304-g1f4c5b9bb2eb81880e2bc725435d596fcd2bdfef, i.e.
it is a really old leftover!)

Otherwise, tested* to work with sm_30 (error by mkoffload, unchanged),
sm_35 and sm_70.

Tobias

*With some added code; until GOMP_OFFLOAD_get_num_devices accepts
GOMP_REQUIRES_UNIFIED_SHARED_MEMORY and GOMP_OFFLOAD_load_image
gets passed a non-NULL rev_fn_table, the current patch is a no-op.

Planned next are the related GCN patch and the actual change
in libgomp/target.c (+ accepting USM in GOMP_OFFLOAD_get_num_devices).



[Patch] Fortran: Fixes for kind=4 characters strings [PR107266]

2022-10-14 Thread Tobias Burnus

Long introduction - but the patch is rather simple: Don't use kind=1
as type where kind=4 should be used.

Long introduction + background, feel free to skip.



This popped up for libgomp/testsuite/libgomp.fortran/struct-elem-map-1.f90,
which uses kind=4 characters, if Sandra's "Fortran: delinearize
multi-dimensional array accesses" patch is applied.

Patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562230.html
Used for OG11: 
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584716.html
On the OG12 alias devel/omp/gcc-12 vendor branch, it is used:
https://gcc.gnu.org/g:39a8c371fda6136cf77c74895a00b136409e0ba3

* * *

For mainline, I did not observe a wrong-code issue at runtime, still:

void frobc (character(kind=4)[1:*_a] * & restrict a, ...
...
static void frobc (character(kind=1) * & restrict, ...

feels odd, i.e. having the definition as kind=4 and the declaration as kind=1.
With the patch, it becomes:

static void frobc (character(kind=4) * & restrict, character(kind=4) * &, ...

 * * *

For the following, questionable code (→ PR107266), it is even worse:

character(kind=4) function f(x) bind(C)
  character(kind=4), value :: x
end

this gives the following, which has the wrong ABI:

character(kind=1) f (character(kind=1) x)
{
  (void) 0;
}

With the patch, it becomes:
  character(kind=4) f (character(kind=4) x)

 * * *

I think all of that only exercises the trans-types.cc patch;
the trans-expr.cc code gets called, as an assert shows,
but I failed to get a dump where this goes wrong.

However, for struct-elem-map-1.f90 with mainline or with
OG12 and the patch:
  #pragma omp target map(tofrom:var.uni2[40 / 20] [len: 20])

while on OG12 without the attached patch:
  #pragma omp target map(tofrom:var.uni2[40 / 5] [len: 5])

where the problem is that TYPE_SIZE_UNIT is wrong. Whether
this only affects OG12 due to the delinearizer patch or
some code on mainline as well, I don't know.

Still, I think it should be fixed ...



OK for mainline?

Tobias
Fortran: Fixes for kind=4 characters strings [PR107266]

	PR fortran/107266

gcc/fortran/
	* trans-expr.cc (gfc_conv_string_parameter): Use passed
	type to honor character kind.
	* trans-types.cc (gfc_sym_type): Honor character kind.
	* trans-decl.cc (gfc_conv_cfi_to_gfc): Fix handling kind=4
	character strings.

gcc/testsuite/
	* gfortran.dg/char4_decl.f90: New test.
	* gfortran.dg/char4_decl-2.f90: New test.

 gcc/fortran/trans-decl.cc  | 10 ++---
 gcc/fortran/trans-expr.cc  | 12 +++---
 gcc/fortran/trans-types.cc |  2 +-
 gcc/testsuite/gfortran.dg/char4_decl-2.f90 | 59 ++
 gcc/testsuite/gfortran.dg/char4_decl.f90   | 52 ++
 5 files changed, 123 insertions(+), 12 deletions(-)

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 5d16d640322..4b570c3551a 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -7378,13 +7378,13 @@ done:
   /* Set string length for len=:, only.  */
   if (sym->ts.type == BT_CHARACTER && !sym->ts.u.cl->length)
 {
-  tmp = sym->ts.u.cl->backend_decl;
+  tmp2 = gfc_get_cfi_desc_elem_len (cfi);
+  tmp = fold_convert (TREE_TYPE (tmp2), sym->ts.u.cl->backend_decl);
   if (sym->ts.kind != 1)
 	tmp = fold_build2_loc (input_location, MULT_EXPR,
-			   gfc_array_index_type,
-			   sym->ts.u.cl->backend_decl, tmp);
-  tmp2 = gfc_get_cfi_desc_elem_len (cfi);
-  gfc_add_modify (, tmp2, fold_convert (TREE_TYPE (tmp2), tmp));
+			   TREE_TYPE (tmp2), tmp,
+			   build_int_cst (TREE_TYPE (tmp2), sym->ts.kind));
+  gfc_add_modify (, tmp2, tmp);
 }
 
   if (!sym->attr.dimension)
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 1551a2e4df4..e7b9211f17e 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -10374,15 +10374,15 @@ gfc_conv_string_parameter (gfc_se * se)
|| TREE_CODE (TREE_TYPE (se->expr)) == INTEGER_TYPE)
   && TYPE_STRING_FLAG (TREE_TYPE (se->expr)))
 {
+  type = TREE_TYPE (se->expr);
   if (TREE_CODE (se->expr) != INDIRECT_REF)
-	{
-	  type = TREE_TYPE (se->expr);
-  se->expr = gfc_build_addr_expr (build_pointer_type (type), se->expr);
-	}
+	se->expr = gfc_build_addr_expr (build_pointer_type (type), se->expr);
   else
 	{
-	  type = gfc_get_character_type_len (gfc_default_character_kind,
-	 se->string_length);
+	  if (TREE_CODE (type) == ARRAY_TYPE)
+	type = TREE_TYPE (type);
+	  type = gfc_get_character_type_len_for_eltype (type,
+			se->string_length);
 	  type = build_pointer_type (type);
 	  se->expr = 

[committed] gfortran.dg/c-interop/deferred-character-2.f90: Fix dg-do

2022-10-14 Thread Tobias Burnus

Just spotted this. It was only compiled instead of also being run, and it
was the only occurrence of 'dg-.*execute' I could find.

Committed as https://gcc.gnu.org/r13-3306

Tobias
commit 3760dd553eed21ac5614cf0d0841ca984b4361e2
Author: Tobias Burnus 
Date:   Fri Oct 14 18:34:49 2022 +0200

gfortran.dg/c-interop/deferred-character-2.f90: Fix dg-do

gcc/testsuite/
* gfortran.dg/c-interop/deferred-character-2.f90: Use 'dg-do run'.

diff --git a/gcc/testsuite/gfortran.dg/c-interop/deferred-character-2.f90 b/gcc/testsuite/gfortran.dg/c-interop/deferred-character-2.f90
index 356097af241..4dab32662c6 100644
--- a/gcc/testsuite/gfortran.dg/c-interop/deferred-character-2.f90
+++ b/gcc/testsuite/gfortran.dg/c-interop/deferred-character-2.f90
@@ -1,5 +1,5 @@
 ! PR 92482
-! { dg-do execute}
+! { dg-do run }
 !
 ! TS 29113
 ! 8.7 Interoperability of procedures and procedure interfaces


[Patch] libgomp: Add Fortran testcases for omp_in_explicit_task

2022-10-13 Thread Tobias Burnus

Rather obvious patch as it is a straight conversion from C.

OK for mainline?

Tobias
libgomp: Add Fortran testcases for omp_in_explicit_task

Fortranized testcases of commits r13-3257-ga58a965eb73
and r13-3258-g0ec4e93fb9f.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/task-7.f90: New test.
	* testsuite/libgomp.fortran/task-8.f90: New test.
	* testsuite/libgomp.fortran/task-in-explicit-1.f90: New test.
	* testsuite/libgomp.fortran/task-in-explicit-2.f90: New test.
	* testsuite/libgomp.fortran/task-in-explicit-3.f90: New test.
	* testsuite/libgomp.fortran/task-reduction-17.f90: New test.
	* testsuite/libgomp.fortran/task-reduction-18.f90: New test.

 libgomp/testsuite/libgomp.fortran/task-7.f90   |  22 
 libgomp/testsuite/libgomp.fortran/task-8.f90   |  13 +++
 .../libgomp.fortran/task-in-explicit-1.f90 | 113 +
 .../libgomp.fortran/task-in-explicit-2.f90 |  21 
 .../libgomp.fortran/task-in-explicit-3.f90 |  31 ++
 .../libgomp.fortran/task-reduction-17.f90  |  32 ++
 .../libgomp.fortran/task-reduction-18.f90  |  15 +++
 7 files changed, 247 insertions(+)

diff --git a/libgomp/testsuite/libgomp.fortran/task-7.f90 b/libgomp/testsuite/libgomp.fortran/task-7.f90
new file mode 100644
index 000..e806bd79663
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-7.f90
@@ -0,0 +1,22 @@
+! { dg-do run }
+
+program main
+  use omp_lib
+  implicit none
+
+  !$omp task final (.true.)
+if (.not. omp_in_final ()) &
+  error stop
+!$omp task
+  if (.not. omp_in_final ()) &
+error stop
+  !$omp target nowait
+  if (omp_in_final ()) &
+error stop
+  !$omp end target
+  if (.not. omp_in_final ()) &
+error stop
+  !$omp taskwait
+!$omp end task
+  !$omp end task
+end
diff --git a/libgomp/testsuite/libgomp.fortran/task-8.f90 b/libgomp/testsuite/libgomp.fortran/task-8.f90
new file mode 100644
index 000..037c63b8fa3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-8.f90
@@ -0,0 +1,13 @@
+! { dg-do run }
+
+program main
+  implicit none
+  integer :: i
+  i = 0
+  !$omp task
+!$omp target nowait private (i)
+  i = 1
+!$omp end target
+!$omp taskwait
+  !$omp end task
+end
diff --git a/libgomp/testsuite/libgomp.fortran/task-in-explicit-1.f90 b/libgomp/testsuite/libgomp.fortran/task-in-explicit-1.f90
new file mode 100644
index 000..b6fa21b2c22
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-in-explicit-1.f90
@@ -0,0 +1,113 @@
+! { dg-do run }
+
+program main
+  use omp_lib
+  implicit none
+  integer :: i
+
+  if (omp_in_explicit_task ()) &
+error stop
+  !$omp task
+  if (.not. omp_in_explicit_task ()) &
+error stop
+  !$omp end task
+
+  !$omp task final (.true.)
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp task
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp end task
+  !$omp end task
+
+  !$omp parallel
+if (omp_in_explicit_task ()) &
+  error stop
+!$omp task if (.false.)
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp task if (.false.)
+  if (.not. omp_in_explicit_task ()) &
+error stop
+!$omp end task
+!$omp end task
+!$omp task final (.true.)
+  if (.not. omp_in_explicit_task ()) &
+error stop
+!$omp end task
+!$omp barrier
+if (omp_in_explicit_task ()) &
+  error stop
+!$omp taskloop num_tasks (24)
+do i = 1, 32
+  if (.not. omp_in_explicit_task ()) &
+error stop
+end do
+!$omp masked
+!$omp task
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp end task
+!$omp end masked
+!$omp barrier
+if (omp_in_explicit_task ()) &
+  error stop
+  !$omp end parallel
+
+  !$omp target
+if (omp_in_explicit_task ()) &
+  error stop
+!$omp task if (.false.)
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp end task
+!$omp task
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp end task
+  !$omp end target
+
+  !$omp target teams
+!$omp distribute
+do i = 1, 4
+  if (omp_in_explicit_task ()) then
+error stop
+  else
+  !$omp parallel
+if (omp_in_explicit_task ()) &
+  error stop
+!$omp task
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp end task
+!$omp barrier
+if (omp_in_explicit_task ()) &
+  error stop
+  !$omp end parallel
+  end if
+end do
+  !$omp end target teams
+
+  !$omp teams
+!$omp distribute
+

Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-10-12 Thread Tobias Burnus

On 12.10.22 19:09, Andrew Stubbs wrote:


On 12/10/2022 15:29, Tobias Burnus wrote:

Right, sorry, the buffer is circular, but the counter is linear. It
simplified reservation that way, but it does mean that there's a limit
to the number of times the buffer can cycle before the counter
saturates. (You'd need to stream out gigabytes of data to hit the
limit though.)

Or in other words, you can have 2^32 = 4,294,967,296 (write chunks +
reverse offloads) per kernel launch.

...

PS: Currently, device stack variables are private and cannot be
accessed from the host; this will change in a separate patch. [...]

So, the patch, as is, is known to be non-functional? How can you have
tested it? For the addrs_sizes_kind data to be accessible, the
asm("s8") has to be wrong.


I have tested only the non-addrs_sizes_kind part, which permits running
reverse-offload functions just fine, but only if they do not use
firstprivate or map. And I actually also tested with the
addrs_sizes_kind part, but that unsurprisingly fails hard when trying to
copy the stack data.


I think the patch looks good, in principle. The use of the existing
ring-buffer is the right way to do it, IMO. Can we get the manually
allocated stacks patch in first and then follow up with these patches
when they actually work?


I have stashed this patch as: "OK, but ams still wants to have a glance once
__builtin_gcn_kernarg_ptr is in".

In terms of having fewer *.diff files around, I would of course prefer to
just change one line in a follow-up commit instead of keeping a full
patch around, but holding off until __builtin_gcn_kernarg_ptr is ready and
the default has changed to non-private stack variables is also fine.

Tobias



Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-10-12 Thread Tobias Burnus

On 29.09.22 18:24, Andrew Stubbs wrote:

On 27/09/2022 14:16, Tobias Burnus wrote:

Andrew suggested a while back to piggyback on the console_output handling,
avoiding another atomic access. If this is still wanted, I would like to have
some guidance regarding how to actually implement it.

[...]
The point is that you can use the "msg" and "text" fields for whatever
data you want, as long as you invent a new value for "type".
[...]
You can make "case 4" do whatever you want. There are enough bytes for
4 pointers, and you could use multiple packets (although it's not safe
to assume they're contiguous or already arrived; maybe "case 4" for
part 1, "case 5" for part 2). It's possible to change this structure,
of course, but the target implementation is in newlib so versioning
becomes a problem.


I think, also looking at the Newlib write.c implementation, that the
data is contiguous: there is an atomic add, where instead of passing '1'
for a single slot, I could also pass '2' for two slots.

Attached is one variant; for the declaration of GOMP_OFFLOAD_target_rev,
it needs the generic parts of the sister nvptx patch.*

2*128 bits were not enough; I need 3*128 bits (or rather 5*64 + 32).
As target_ext is blocking, I decided to use a stack-local variable for
the remaining arguments and pass it along. Alternatively, I could also
use 2 slots and process them together. This would avoid one
device->host memory copy but would make console_output less clear.

OK for mainline?

Tobias

* https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603354.html

PS: Currently, device stack variables are private and cannot be accessed
from the host; this will change in a separate patch. It not only affects
the "rest" part as used in this patch but also the actual arrays behind
addr, kinds, and sizes, and quite likely many of the map/firstprivate
variables passed to addr.

As num_devices() will return 0 or -1, this is for now a non-issue.
libgomp/gcn: Prepare for reverse-offload callback handling

libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h: New file; contains
	struct output, declared previously in plugin-gcn.c.
	* config/gcn/target.c: Include it.
	(GOMP_ADDITIONAL_ICVS): Declare as extern var.
	(GOMP_target_ext): Handle reverse offload.
	* plugin/plugin-gcn.c: Include libgomp-gcn.h.
	(struct kernargs): Replace struct def by the one
	from libgomp-gcn.h for output_data.
	(process_reverse_offload): New.
	(console_output): Call it.

 libgomp/config/gcn/libgomp-gcn.h | 61 
 libgomp/config/gcn/target.c  | 44 -
 libgomp/plugin/plugin-gcn.c  | 34 --
 3 files changed, 117 insertions(+), 22 deletions(-)

diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h
new file mode 100644
index 000..91560be787f
--- /dev/null
+++ b/libgomp/config/gcn/libgomp-gcn.h
@@ -0,0 +1,61 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Tobias Burnus .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains defines and type definitions shared between the
+   gcn target's libgomp.a and the plugin-gcn.c, but that is only
+   needed for this target.  */
+
+#ifndef LIBGOMP_GCN_H
+#define LIBGOMP_GCN_H 1
+
+/* This struct is also used in Newlib's libc/sys/amdgcn/write.c.  */
+struct output
+{
+  int return_value;
+  unsigned int next_output;
+  struct printf_data {
+    int written;
+    union {
+      char msg[128];
+      uint64_t msg_u64[2];
+    };
+    int type;
+    union {
+      int64_t ivalue;
+      double dvalue;
+      char text[128];
+      uint64_t value_u64[2];
+

[Patch] libgomp: Add offload_device_gcn check, add requires-4a.c test

2022-10-12 Thread Tobias Burnus

This came up because the USM implementation with 
-foffload-memory={unified,pinned}
as posted at https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
does not handle USM with static variables.

This shows up as a FAIL for requires-4.c on the OG12 (alias
devel/omp/gcc-12) branch.

The attached patch prepares for skipping requires-4.c for the gcn/nvptx device
and adds an adjacent requires-4a.c testcase, using heap memory, that can still
run on gcn/nvptx.

Additionally, I commented out a no longer used #define, following the
precedent of GOMP_DEVICE_HOST_NONSHM.

Thus, this patch adds another testcase and one effective-target check,
and comments out an unused #define; that's it.
(Otherwise, it is just a prep patch.)

OK for mainline?

Tobias

PS: Currently, neither the preexisting offload_device_nvptx nor the new
offload_device_gcn target selector is used, either in old code or by this
patch.
libgomp: Add offload_device_gcn check, add requires-4a.c test

Duplicate libgomp.c-c++-common/requires-4.c (as ...-4a.c) but
using heap-allocated instead of static memory for a variable.

This change and the added offload_device_gcn check prepare for
pseudo-USM, where the device hardware cannot access all host
memory but only managed and pinned memory; for those, requires-4.c
will fail and the new check permits adding
  target { ! { offload_device_nvptx || offload_device_gcn } }
to requires-4.c; however, it has not been added yet as pseudo-USM
support is not yet on mainline. (Review is pending for the USM
patches.)

include/ChangeLog:

	* gomp-constants.h (GOMP_DEVICE_HSA): Comment (unused).

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp (check_effective_target_offload_device_gcn):
	New.
	* testsuite/libgomp.c-c++-common/on_device_arch.h (device_arch_gcn,
	on_device_arch_gcn): New.
	* testsuite/libgomp.c-c++-common/requires-4a.c: New test; copied from
	requires-4.c but using heap-allocated memory.

 include/gomp-constants.h   |  2 +-
 libgomp/testsuite/lib/libgomp.exp  | 12 +++
 .../libgomp.c-c++-common/on_device_arch.h  | 13 
 .../testsuite/libgomp.c-c++-common/requires-4a.c   | 39 ++
 4 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 84316f953d0..fac7316b858 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -229,9 +229,9 @@ enum gomp_map_kind
 /* #define GOMP_DEVICE_HOST_NONSHM	3 removed.  */
 #define GOMP_DEVICE_NOT_HOST		4
 #define GOMP_DEVICE_NVIDIA_PTX		5
 #define GOMP_DEVICE_INTEL_MIC		6
-#define GOMP_DEVICE_HSA			7
+/* #define GOMP_DEVICE_HSA		7 removed.  */
 #define GOMP_DEVICE_GCN			8
 
 /* We have a compatibility issue.  OpenMP 5.2 introduced
omp_initial_device with value of -1 which clashes with our
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 107a3c2ac9d..4b8c64de8a5 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -414,8 +414,20 @@ proc check_effective_target_offload_device_nvptx { } {
 	}
 } ]
 }
 
+# Return 1 if using a GCN offload device.
+proc check_effective_target_offload_device_gcn { } {
+return [check_runtime_nocache offload_device_gcn {
+  #include 
+  #include "testsuite/libgomp.c-c++-common/on_device_arch.h"
+  int main ()
+	{
+	  return !on_device_arch_gcn ();
+	}
+} ]
+}
+
 # Return 1 if at least one Nvidia GPU is accessible.
 
 proc check_effective_target_openacc_nvidia_accel_present { } {
 return [check_runtime openacc_nvidia_accel_present {
diff --git a/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h b/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
index f92743b04d7..6f66dbd784c 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
+++ b/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
@@ -6,15 +6,22 @@ device_arch_nvptx (void)
 {
   return GOMP_DEVICE_NVIDIA_PTX;
 }
 
+/* static */ int
+device_arch_gcn (void)
+{
+  return GOMP_DEVICE_GCN;
+}
+
 /* static */ int
 device_arch_intel_mic (void)
 {
   return GOMP_DEVICE_INTEL_MIC;
 }
 
 #pragma omp declare variant (device_arch_nvptx) match(construct={target},device={arch(nvptx)})
+#pragma omp declare variant (device_arch_gcn) match(construct={target},device={arch(gcn)})
 #pragma omp declare variant (device_arch_intel_mic) match(construct={target},device={arch(intel_mic)})
 /* static */ int
 device_arch (void)
 {
@@ -36,8 +43,14 @@ on_device_arch_nvptx ()
 {
   return on_device_arch (GOMP_DEVICE_NVIDIA_PTX);
 }
 
+int
+on_device_arch_gcn ()
+{
+  return on_device_arch (GOMP_DEVICE_GCN);
+}
+
 int
 on_device_arch_intel_mic ()
 {
   return 

Re: [Patch][v5] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-10-12 Thread Tobias Burnus
 
 enum {
   CU_STREAM_DEFAULT = 0,
@@ -169,6 +171,7 @@ CUresult cuMemGetInfo (size_t *, size_t *);
 CUresult cuMemAlloc (CUdeviceptr *, size_t);
 #define cuMemAllocHost cuMemAllocHost_v2
 CUresult cuMemAllocHost (void **, size_t);
+CUresult cuMemHostAlloc (void **, size_t, unsigned int);
 CUresult cuMemcpy (CUdeviceptr, CUdeviceptr, size_t);
 #define cuMemcpyDtoDAsync cuMemcpyDtoDAsync_v2
 CUresult cuMemcpyDtoDAsync (CUdeviceptr, CUdeviceptr, size_t, CUstream);
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 6f869be..eef151c 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -30,7 +30,7 @@
 
 /* This is set to the ICV values of current GPU during device initialization,
when the offload image containing this libgomp portion is loaded.  */
-static volatile struct gomp_offload_icvs GOMP_ADDITIONAL_ICVS;
+volatile struct gomp_offload_icvs GOMP_ADDITIONAL_ICVS;
 
 void
 omp_set_default_device (int device_num __attribute__((unused)))
diff --git a/libgomp/config/nvptx/libgomp-nvptx.h b/libgomp/config/nvptx/libgomp-nvptx.h
new file mode 100644
index 000..5da9aae
--- /dev/null
+++ b/libgomp/config/nvptx/libgomp-nvptx.h
@@ -0,0 +1,51 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Tobias Burnus .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains defines and type definitions shared between the
+   nvptx target's libgomp.a and the plugin-nvptx.c, but that is only
+   needed for this target.  */
+
+#ifndef LIBGOMP_NVPTX_H
+#define LIBGOMP_NVPTX_H 1
+
+#define GOMP_REV_OFFLOAD_VAR __gomp_rev_offload_var
+
+struct rev_offload {
+  uint64_t fn;
+  uint64_t mapnum;
+  uint64_t addrs;
+  uint64_t sizes;
+  uint64_t kinds;
+  int32_t dev_num;
+};
+
+#if (__SIZEOF_SHORT__ != 2 \
+     || __SIZEOF_SIZE_T__ != 8 \
+     || __SIZEOF_POINTER__ != 8)
+#error "Data-type conversion required for rev_offload"
+#endif
+
+#endif  /* LIBGOMP_NVPTX_H */
+
diff --git a/libgomp/config/nvptx/target.c b/libgomp/config/nvptx/target.c
index 11108d2..0e79388 100644
--- a/libgomp/config/nvptx/target.c
+++ b/libgomp/config/nvptx/target.c
@@ -24,9 +24,12 @@
<http://www.gnu.org/licenses/>.  */
 
 #include "libgomp.h"
+#include "libgomp-nvptx.h"  /* For struct rev_offload + GOMP_REV_OFFLOAD_VAR. */
 #include 
 
 extern int __gomp_team_num __attribute__((shared));
+extern volatile struct gomp_offload_icvs GOMP_ADDITIONAL_ICVS;
+volatile struct rev_offload *GOMP_REV_OFFLOAD_VAR;
 
 bool
 GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
@@ -88,16 +91,53 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 		 void **hostaddrs, size_t *sizes, unsigned short *kinds,
 		 unsigned int flags, void **depend, void **args)
 {
-  (void) device;
-  (void) fn;
-  (void) mapnum;
-  (void) hostaddrs;
-  (void) sizes;
-  (void) kinds;
+  static int lock = 0;  /* == gomp_mutex_t lock; gomp_mutex_init (); */
   (void) flags;
   (void) depend;
   (void) args;
-  __builtin_unreachable ();
+
+  if (device != GOMP_DEVICE_HOST_FALLBACK
+      || fn == NULL
+      || GOMP_REV_OFFLOAD_VAR == NULL)
+    return;
+
+  gomp_mutex_lock ();
+
+  GOMP_REV_OFFLOAD_VAR->mapnum = mapnum;
+  GOMP_REV_OFFLOAD_VAR->addrs = (uint64_t) hostaddrs;
+  GOMP_REV_OFFLOAD_VAR->sizes = (uint64_t) sizes;
+  GOMP_REV_OFFLOAD_VAR->kinds = (uint64_t) kinds;
+  GOMP_REV_OFFLOAD_VAR->dev_num = GOMP_ADDITIONAL_ICVS.device_num;
+
+  /* Set 'fn' to trigger processing on the host; wait for completion,
+     which is flagged by setting 'fn' back to 0 on the host.  */
+  uint64_t addr_struct_fn = (uint64_t) _REV_OFFLOAD_VAR->fn;
+#if __PTX_SM__ >= 700
+  asm volatile ("st.global.release.sys.u64 [%0], %1;"
+		: : "r"(addr_struct_fn), "r" (fn) : "memory");
+#else
+  __sync_synchronize ();  /* membar.sys */
+  asm volatile ("st.volatile.global.u64 [%0], %1;"
+
