from:"Tobias Burnus"

[Patch] OpenMP: Allocate directive for static vars, clean up

2024-10-04 Thread Tobias Burnus


'omp allocate' permits using a different (specified) allocator and
alignment for both stack/automatic and static/saved variables; the latter
accept only predefined allocators. Currently, only C and Fortran are
supported for stack/automatic variables; static variables are rejected
without the attached patch.

* * *

I recently looked at the 'allocate' directive and, while doing so,
stumbled over a couple of issues, which the attached patch addresses
(missing diagnostics for corner cases, outdated checks, unhelpful
documentation ['allocate' *clause*], ...). This made me wonder:

Shouldn't we simply accept 'omp allocate' for static
variables by honoring the alignment and ignoring the actually requested
allocator? First, we already do the same for actual allocations, as not all
traits are supported. And for the host, this seems to be the most sensible
thing to do in any case.
[For some use cases, pointers + allocation in the constructor would be
better, but in general, not adding an indirection seems to be better and
has fewer corner-case usability issues.]

I guess we later want to honor the requested memory for nvptx and/or gcn; at
least Nvidia GPUs could make use of constant memory (which has advantages
when many threads read the same memory, i.e. for broadcasting it). I guess
OpenACC 2.7's 'readonly' modifier serves a similar purpose.
For now we don't, but the attribute is passed on to the backends, which could
make use of it, if desired. ('groupprivate' directive vs. cgroup/thread
allocators are similar device-only features.)

As mentioned, this patch also fixes a few other issues here and there, see
commit log and source code for details.

Code comments? Suggestions or remarks before I apply this patch?

Tobias

PS: I am aware that C++ support is lacking. There is a pending patch that needs
to be updated for this patch, probably for some bitrot, and in particular for
the review comments, cf.
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633782.html
and https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639929.html

OpenMP: Allocate directive for static vars, clean up

For the 'allocate' directive, remove the sorry for static variables and
just keep using normal memory, but honor the requested alignment and set
a DECL_ATTRIBUTE in case a target may want to make use of this later on.
The documentation is updated accordingly.
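To illustrate the effect described above for a static variable, here is a sketch (not GCC's implementation): the variable stays in ordinary static storage and only the requested alignment takes effect, roughly as if C11 `_Alignas` had been written on the declaration. The variable name `buf`, the chosen allocator and the 64-byte alignment are arbitrary assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* In OpenMP source, one might write (hypothetical example):
     static int buf[16];
     #pragma omp allocate(buf) allocator(omp_low_lat_mem_alloc) align(64)
   Per the patch description, the allocator is only recorded (as a
   DECL_ATTRIBUTE) while the alignment is honored.  This sketch models
   the resulting declaration with C11 _Alignas.  */
static _Alignas (64) int buf[16];

/* Return true if P is aligned to ALIGN bytes (ALIGN a power of two).  */
static bool
is_aligned (const void *p, uintptr_t align)
{
  return ((uintptr_t) p & (align - 1)) == 0;
}

bool
buf_is_64_byte_aligned (void)
{
  return is_aligned (buf, 64);
}
```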

The C diagnostic that checks for predefined allocators in this case failed
to accept GCC's ompx_gnu_... allocator; this is now fixed. (Fortran was
already okay, but both now use new common #defined values for checking.)
And while Fortran common block variables are still rejected, the check
has been improved, as previously the sorry diagnostic did not work for
common blocks in modules.
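A sketch of a predefined-allocator check that covers both ranges. The macro names only loosely mirror the new gomp-constants.h defines, and the numeric values (1 to 8 for the OpenMP allocators, 200 for GCC's extension) are assumptions for illustration. A check that only covered the first range would reject the ompx_gnu_... allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for GOMP_OMP_PREDEF_ALLOC_MAX,
   GOMP_OMPX_PREDEF_ALLOC_MIN and GOMP_OMPX_PREDEF_ALLOC_MAX;
   the values are assumptions.  */
#define SK_OMP_PREDEF_ALLOC_MAX   8
#define SK_OMPX_PREDEF_ALLOC_MIN  200
#define SK_OMPX_PREDEF_ALLOC_MAX  200

/* Return true if VALUE denotes a predefined allocator, i.e. one that
   may be used with the 'allocate' directive on static variables.
   0 (omp_null_allocator) is not a predefined allocator.  */
bool
is_predef_allocator (uintptr_t value)
{
  return (value >= 1 && value <= SK_OMP_PREDEF_ALLOC_MAX)
         || (value >= SK_OMPX_PREDEF_ALLOC_MIN
             && value <= SK_OMPX_PREDEF_ALLOC_MAX);
}
```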

Finally, for the 'allocate' clause on the target/task/taskloop directives,
there is now a warning for omp_thread_mem_alloc (i.e. the predefined
allocator with access = thread), whose use there is undefined behavior
according to the OpenMP specification.
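The new diagnostic can be pictured as the following predicate. This is a sketch with hypothetical names: the value 8 for omp_thread_mem_alloc is an assumption standing in for GOMP_OMP_PREDEF_ALLOC_THREADS, and GCC's gimplify_scan_omp_clauses works on tree nodes rather than on an enum like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical directive kinds.  */
enum sk_directive { DIR_PARALLEL, DIR_TARGET, DIR_TASK, DIR_TASKLOOP };

/* Assumed value of omp_thread_mem_alloc (access = thread).  */
#define SK_PREDEF_ALLOC_THREADS 8

/* Return true if an 'allocate' clause using ALLOCATOR on directive DIR
   should get the warning described above.  */
bool
warn_thread_mem_alloc (enum sk_directive dir, uintptr_t allocator)
{
  if (allocator != SK_PREDEF_ALLOC_THREADS)
    return false;
  return dir == DIR_TARGET || dir == DIR_TASK || dir == DIR_TASKLOOP;
}
```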

And, lastly, testing showed that a var decl + static_assert sets TREE_USED
but does not produce a statement list in C, which ran into an assert
in gimplify. This special case is now also handled.


gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_allocate): Set alignment for alignof;
	accept static variables and fix predef allocator check.

gcc/fortran/ChangeLog:

	* openmp.cc (is_predefined_allocator): Use gomp-constants.h consts.
	* trans-common.cc (translate_common): Reject OpenMP allocate directives.
	* trans-decl.cc (gfc_finish_var_decl): Handle allocate directive
	for static variables.
	(gfc_trans_deferred_vars): Update for the latter.

gcc/ChangeLog:

	* gimplify.cc (gimplify_bind_expr): Fix corner case for OpenMP
	allocate directive.	
	(gimplify_scan_omp_clauses): Warn if omp_thread_mem_alloc is used
	as allocator with the target/task/taskloop directive.

include/ChangeLog:

	* gomp-constants.h (GOMP_OMP_PREDEF_ALLOC_MAX,
	GOMP_OMPX_PREDEF_ALLOC_MIN, GOMP_OMPX_PREDEF_ALLOC_MAX,
	GOMP_OMP_PREDEF_ALLOC_THREADS): New defines.

libgomp/ChangeLog:

	* allocator.c: Add static asserts for new
	 GOMP_OMP{,X}_PREDEF_ALLOC_{MIN,MAX} range values.
	* libgomp.texi (OpenMP Impl. Status): Allocate directive for
	static vars is now supported. Refer to PR for allocate clause.
	(Memory allocation): Update for static vars; minor word tweaking.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/allocate-9.c: Update for removed sorry.
	* gfortran.dg/gomp/allocate-15.f90: Likewise.
	* gfortran.dg/gomp/allocate-pinned-1.f90: Likewise.
	* gfortran.dg/gomp/allocate-4.f90: Likewise; add dg-error for
	previously missing diagnostic.
	* c-c++-common/gomp/allocate-18.c: New test.
	* c-c++-common/gomp/allocate-19.c: New test.
	* gfortran.dg/gomp/allocate-clause.f90: New test.
	* gfortran.dg/gomp/allocate-static-2.f90: New test.
	* gfortran.dg/gomp/allocate-static.f90: New test.

 gcc/c/c-parser.cc  |  29

Re: [PATCH v4 1/7] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces

2024-10-02 Thread Tobias Burnus


Paul-Antoine Arras wrote:

This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.


LGTM. Thanks,

Tobias


gcc/ChangeLog:

* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.

gcc/fortran/ChangeLog:

* types.def (BT_FN_PTR_CONST_PTR_INT): Declare.

[committed] libgomp.texi: Remove now duplicate TR13 item (was: [committed] libgomp.texi: fix formatting; add post-TR13 OpenMP impl. status items)

2024-09-27 Thread Tobias Burnus

Continuing to read
https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Technical-Report-13.html
showed that I missed one old item, which could now be removed:


With the new 'storage' map-type modifier, it was also no longer fully
applicable, and the newly added text already covered it.


Committed as Rev. r15-3919-gcfdc0a384aff5e as follow up to 
r15-3917-g6b7eaec20b046e.


* * *

While useful, those tables are unfortunately not very readable. (And I
wonder how many more non-Appendix B items should be added; it probably
requires a full pass through the changes and will still likely miss
several important but less visible changes.)


Tobias
commit cfdc0a384aff5e06f80d3f55f4615abf350b193b
Author: Tobias Burnus 
Date:   Fri Sep 27 12:06:17 2024 +0200

libgomp.texi: Remove now duplicate TR13 item

Remove an item under "Other new TR 13 features" that since the last commit
(r15-3917-g6b7eaec20b046e) to this file is covered by the added
  "New @code{storage} map-type modifier; context-dependent @code{alloc} and
   @code{release} are aliases"
  "Update of the map-type decay for mapping and @code{declare_mapper}"

libgomp/
* libgomp.texi (TR13 status): Update semi-duplicated, semi-obsoleted
item; remove left-over half-sentence.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index b561cb5f3f4..c6464ece32e 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -511,7 +511,7 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0.
   @tab N @tab
 @item @code{ref} modifier to the @code{map} clause @tab N @tab
 @item New @code{storage} map-type modifier; context-dependent @code{alloc} and
-  @code{release} are aliases. Update to map decay @tab N @tab
+  @code{release} are aliases @tab N @tab
 @item Update of the map-type decay for mapping and @code{declare_mapper}
   @tab N @tab
 @item Change of the @emph{map-type} property from @emph{ultimate} to
@@ -633,8 +633,6 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0.
 @item Multi-word directive names are now permitted with underscore @tab N @tab
 @item In Fortran (fixed + free), space between directive names is mandatory
   @tab N @tab
-@item @code{map(release: ...)} on @code{target} and @code{target_data} (map-type
-  decay changes) @tab N @tab post-TR13 item
 @end multitable

[committed] libgomp.texi: fix formatting; add post-TR13 OpenMP impl. status items

2024-09-27 Thread Tobias Burnus

This commit r15-3917-g6b7eaec20b046e updates the .texi file with one
formatting fix (@emph → @code) and updates some items for post-TR13 changes.
(The latter is slightly questionable, as the title says TR13, the third and
last draft of OpenMP 6.0, which is scheduled to be released in time for
Supercomputing 2024 in November, and the listed changes are only in the
current internal draft. But on the other hand, post-TR13 work is supposed to
be mostly QC tasks, and 6.0 is due in around 6 weeks. Furthermore, when
looking at the spec changes for this update, I found an important generator
bug causing text omissions in the spec, which I would otherwise probably
only have encountered after the spec release.)

Tobias
commit 6b7eaec20b046eebc771022e460c2206580aef04
Author: Tobias Burnus 
Date:   Fri Sep 27 10:48:09 2024 +0200

libgomp.texi: fix formatting; add post-TR13 OpenMP impl. status items

libgomp/
* libgomp.texi (OpenMP Technical Report 13): Change @emph to @code;
add two post-TR13 OpenMP 6.0 items.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 22eff1d7b55..b561cb5f3f4 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -476,6 +476,7 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0.
   specifiers @tab Y @tab
 @item Support for pure directives in Fortran's @code{do concurrent} @tab N @tab
 @item All inarguable clauses take now an optional Boolean argument @tab N @tab
+@item The @code{adjust_args} clause was extended to specify the argument by position
 @item For Fortran, @emph{locator list} can be also function reference with
   data pointer result @tab N @tab
 @item Concept of @emph{assumed-size arrays} in C and C++
@@ -496,7 +497,7 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0.
   clauses @tab P @tab @code{private} not supported
 @item For Fortran, rejecting polymorphic types in data-mapping clauses
   @tab N @tab not diagnosed (and mostly unsupported)
-@item New @code{taskgraph} construct including @emph{saved} modifier and
+@item New @code{taskgraph} construct including @code{saved} modifier and
   @code{replayable} clause @tab N @tab
 @item @code{default} clause on the @code{target} directive @tab N @tab
 @item Ref-count change for @code{use_device_ptr} and @code{use_device_addr}
@@ -509,6 +510,10 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0.
 @item New @code{init_complete} clause to the @code{scan} directive
   @tab N @tab
 @item @code{ref} modifier to the @code{map} clause @tab N @tab
+@item New @code{storage} map-type modifier; context-dependent @code{alloc} and
+  @code{release} are aliases. Update to map decay @tab N @tab
+@item Update of the map-type decay for mapping and @code{declare_mapper}
+  @tab N @tab
 @item Change of the @emph{map-type} property from @emph{ultimate} to
   @emph{default} @tab N @tab
 @item @code{self} modifier to @code{map} and @code{self} as
@@ -516,7 +521,6 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0.
 @item Mapping of @emph{assumed-size arrays} in C, C++ and Fortran
   @tab N @tab
 @item @code{delete} as delete-modifier not as map type @tab N @tab
-@item @code{release} map-type modifier in @code{declare_mapper} @tab N @tab
 @item For Fortran, the @code{automap} modifier to the @code{enter} clause
   of @code{declare_target} @tab N @tab
 @item @code{groupprivate} directive @tab N @tab

[committed] libgomp.texi: Fix deprecation note for omp_{get,set}_nested + OMP_NESTED

2024-09-26 Thread Tobias Burnus


While the header files correctly have:

extern void omp_set_nested (int) __GOMP_NOTHROW __GOMP_DEPRECATED_5_0;
extern int omp_get_nested (void) __GOMP_NOTHROW __GOMP_DEPRECATED_5_0;

and for Fortran

#if _OPENMP >= 201811
!GCC$ ATTRIBUTES DEPRECATED :: omp_get_nested, omp_set_nested
...

The documentation wrongly claimed that those were only deprecated in OpenMP 5.2.
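As a migration sketch for code that still calls the deprecated routine: per the documentation text in the patch, enabling nesting means setting the maximum number of active levels above one, and disabling it means setting it to one. The helper name `enable_nesting` is hypothetical; the OpenMP calls are only made when compiling with -fopenmp (i.e. when `_OPENMP` is defined).

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Replacement for omp_set_nested: LEVELS > 1 enables nested parallel
   regions, LEVELS == 1 disables them.  Returns the resulting maximum
   number of active levels (1 without OpenMP).  */
int
enable_nesting (int levels)
{
#ifdef _OPENMP
  /* Instead of the deprecated omp_set_nested (levels > 1).  */
  omp_set_max_active_levels (levels);
  return omp_get_max_active_levels ();
#else
  (void) levels;
  return 1;
#endif
}
```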

Fixed as attached / committed in r15-3900-g9ec258bf65e6ae

Tobias
commit 9ec258bf65e6ae856491f607a987fe15b5385866
Author: Tobias Burnus 
Date:   Thu Sep 26 17:25:34 2024 +0200

libgomp.texi: Fix deprecation note for omp_{get,set}_nested + OMP_NESTED

libgomp/ChangeLog:

* libgomp.texi (omp_get_nested,omp_set_nested, OMP_NESTED): Fix
note about deprecation - correct is 5.0 not 5.2.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 29f5419cd0f..22eff1d7b55 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -937,7 +937,7 @@ active nested regions to the maximum supported.  Disabling nested parallel
 regions sets the maximum number of active nested regions to one.
 
 Note that the @code{omp_set_nested} API routine was deprecated
-in the OpenMP specification 5.2 in favor of @code{omp_set_max_active_levels}.
+in the OpenMP specification 5.0 in favor of @code{omp_set_max_active_levels}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -984,7 +984,7 @@ regions with @code{omp_set_max_active_levels} to one to disable, or
 above one to enable.
 
 Note that the @code{omp_get_nested} API routine was deprecated
-in the OpenMP specification 5.2 in favor of @code{omp_get_max_active_levels}.
+in the OpenMP specification 5.0 in favor of @code{omp_get_max_active_levels}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -3934,7 +3934,7 @@ setting.  If both are undefined, nested parallel regions are enabled if
 more than one item, otherwise they are disabled by default.
 
 Note that the @code{OMP_NESTED} environment variable was deprecated in
-the OpenMP specification 5.2 in favor of @code{OMP_MAX_ACTIVE_LEVELS}.
+the OpenMP specification 5.0 in favor of @code{OMP_MAX_ACTIVE_LEVELS}.
 
 @item @emph{See also}:
 @ref{omp_set_max_active_levels}, @ref{omp_set_nested},

Re: [Patch][RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components

2024-09-26 Thread Tobias Burnus


Now committed as r15-3895-ge4a58b6f28383c.

* * *

The next step is to send the Fortran part. While it exists, I want to
proofread what I wrote a couple of years back, and I want to split off the
polymorphism/class part, as the current implementation has some issues
and OpenMP 6 decided to disallow polymorphic Fortran variables for now
(until some corner-case behavior has been defined).


[The existing polymorphism support works, but it effectively only permits
access to the declared types (as the vtable pointers will be those of
the host); it also has some issues, and as the vtable gained two functions,
ABI compatibility with old code is gone (hence the .mod version
number was bumped).]


The entry code for the committed patch as mentioned before:

On 10.09.24 at 12:19, Tobias Burnus wrote:
The interesting bits are the hook entry points gfc_omp_deep_mapping_p,
gfc_omp_deep_mapping_cnt, and gfc_omp_deep_mapping → 
https://github.com/gcc-mirror/gcc/blob/devel/omp/gcc-14/gcc/fortran/trans-openmp.cc#L3068-L3209


And I think all code is in this file, once the polymorphism code is
removed and replaced by a diagnostic message.


Tobias

PS: otherwise missing on the polymorphism side is 'private(class_var)'; 
'firstprivate(class_var)' works [all as data-sharing clauses not as 
data-mapping clauses].


PPS: The host-pointer vtable issue could be solved as for C++ in OpenMP 
5.2 by using the 'indirect' feature to look up the device version of the
table. (To be implemented for C++ and potentially for OpenMP 6.1+ (?) 
for Fortran.)

Re: [Patch] OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop

2024-09-25 Thread Tobias Burnus


Hi

now committed the following as r15-3856-gfcff9c3dad4f35 with two 
testcase additions (and improved changelog wording).


Tobias Burnus wrote:
OpenMP mandates that when certain clauses are used with 'omp requires',
this requires clause appears in all compilation units.


Those clauses influence the offloading behavior (+ potentially
codegen); hence, the requires directives must match for those clauses
when device code is involved. That's the case for device functions (in
particular 'declare target') and all OpenMP directives that take a
'device' clause.


Previously, OpenMP was rather vague, but, e.g., TR13 is fortunately
more explicit. Thus, this patch adds the target-used flag for 'declare
target' and (due to the "device" clause!) for 'interop' (but only for
Fortran, as C/C++ does not support parsing the 'interop' directive yet).
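A simplified model of why the flag matters, with hypothetical names and bit values (not GCC's actual code): if only compilation units marked as target-used are compared, then a unit whose only device code is a 'declare target' escapes the consistency check unless 'declare target' also sets the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits modeling GCC's omp_requires_mask.  */
enum
{
  RQ_USM = 1u << 0,         /* requires unified_shared_memory */
  RQ_SELF_MAPS = 1u << 1,   /* requires self_maps */
  RQ_TARGET_USED = 1u << 8  /* device code or 'device' clause seen */
};

/* Clauses with the global requirement property.  */
#define RQ_GLOBAL_MASK (RQ_USM | RQ_SELF_MAPS)

/* Return true if all compilation units (given by their N masks) that
   use device code agree on the global requirements.  */
bool
requires_consistent (const unsigned *masks, int n)
{
  bool have_ref = false;
  unsigned ref = 0;

  for (int i = 0; i < n; i++)
    {
      if (!(masks[i] & RQ_TARGET_USED))
        continue;  /* Simplification: only device-using units compared.  */
      unsigned global = masks[i] & RQ_GLOBAL_MASK;
      if (!have_ref)
        {
          ref = global;
          have_ref = true;
        }
      else if (global != ref)
        return false;
    }
  return true;
}
```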


(Side note: the "device global requirement" was only added to the
'device_safesync' clause after TR13; as we don't support that clause
yet, it only appears in the commit log.)


Thanks,

Tobias
commit fcff9c3dad4f356cbf56feaed7442893203a3003
Author: Tobias Burnus 
Date:   Wed Sep 25 13:57:02 2024 +0200

OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop

Older versions of the OpenMP specification were not clear about what counted
as device usage. Newer ones (like TR13) are rather clear. Hence, this commit
sets GCC's target-used flag also when a 'declare target' or an 'interop' is
encountered.  (The latter only for Fortran, as C/C++ parsing support is still
missing.) TR13 also lists 'dispatch' as a target-used construct (as it has the
device clause) and 'device_safesync' as a clause with the global requirement
property, but neither is supported in GCC yet.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_declare_target): Set target-used bit
in omp_requires_mask.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_declare_target): Set target-used bit
in omp_requires_mask.

gcc/fortran/ChangeLog:

* parse.cc (decode_omp_directive): Set target-used bit of
omp_requires_mask when encountering the declare_target or interop
directive.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/interop-1.f90: Add dg-error for missing
omp requires requirement and declare_variant usage.
* gfortran.dg/gomp/requires-8.f90: Likewise.
---
 gcc/c/c-parser.cc | 3 +++
 gcc/cp/parser.cc  | 3 +++
 gcc/fortran/parse.cc  | 8 ++--
 gcc/testsuite/gfortran.dg/gomp/interop-1.f90  | 2 +-
 gcc/testsuite/gfortran.dg/gomp/requires-8.f90 | 4 ++--
 5 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 6a46577f511..a681438cbbe 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -25492,6 +25492,9 @@ c_parser_omp_declare_target (c_parser *parser)
   int device_type = 0;
   bool indirect = false;
   bool only_device_type_or_indirect = true;
+  if (flag_openmp)
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
   if (c_parser_next_token_is (parser, CPP_NAME)
   || (c_parser_next_token_is (parser, CPP_COMMA)
 	  && c_parser_peek_2nd_token (parser)->type == CPP_NAME))
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 83ae38a33ab..6d3be94bf44 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -49571,6 +49571,9 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
   int device_type = 0;
   bool indirect = false;
   bool only_device_type_or_indirect = true;
+  if (flag_openmp)
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
   if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
   || (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)
 	  && cp_lexer_nth_token_is (parser->lexer, 2, CPP_NAME)))
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index e749bbdc6b5..9e06dbf0911 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -1345,8 +1345,12 @@ decode_omp_directive (void)
 
   switch (ret)
     {
-    /* Set omp_target_seen; exclude ST_OMP_DECLARE_TARGET.
-       FIXME: Get clarification, cf. OpenMP Spec Issue #3240.  */
+    /* For the constraints on clauses with the global requirement property,
+       we set omp_target_seen. This included all clauses that take the
+       DEVICE clause, (BEGIN) DECLARE_TARGET and procedures run the device
+       (which effectively is implied by the former).  */
+    case ST_OMP_DECLARE_TARGET:
+    case ST_OMP_INTEROP:
     case ST_OMP_TARGET:
     case ST_OMP_TARGET_DATA:
     case ST_OMP_TARGET_ENTER_DATA:
diff --git a/gcc/tes

[Patch] OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop

2024-09-24 Thread Tobias Burnus

OpenMP mandates that when certain clauses are used with 'omp requires',
this requires clause appears in all compilation units.


Those clauses influence the offloading behavior (+ potentially codegen);
hence, the requires directives must match for those clauses when device
code is involved. That's the case for device functions (in particular
'declare target') and all OpenMP directives that take a 'device' clause.


Previously, OpenMP was rather vague, but, e.g., TR13 is fortunately more
explicit. Thus, this patch adds the target-used flag for 'declare target'
and (due to the "device" clause!) for 'interop' (but only for Fortran, as
C/C++ does not support parsing the 'interop' directive yet).


Any comments before I commit it?

Tobias

PS: In TR13, page 321, lines 14–16 — 
https://www.openmp.org/wp-content/uploads/openmp-TR13.pdf
OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop

Older versions of the OpenMP specification were not clear about what counted
as device usage. Newer ones (like TR13) are rather clear. Hence, this commit
sets "target used" also when 'declare target' or 'interop' is encountered.
(The latter only for Fortran, as C/C++ parsing support is still missing.)
TR13 also lists 'dispatch' as a construct and 'device_safesync' as a clause
affected by device use, but neither is supported in GCC yet.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_declare_target): Set target-used bit
	in omp_requires_mask.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_declare_target): Set target-used bit
	in omp_requires_mask.

gcc/fortran/ChangeLog:

	* parse.cc (decode_omp_directive): Set target-used bit of
	omp_requires_mask when encountering the declare_target or interop
	directive.

 gcc/c/c-parser.cc| 3 +++
 gcc/cp/parser.cc | 3 +++
 gcc/fortran/parse.cc | 8 ++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 6a46577f511..a681438cbbe 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -25492,6 +25492,9 @@ c_parser_omp_declare_target (c_parser *parser)
   int device_type = 0;
   bool indirect = false;
   bool only_device_type_or_indirect = true;
+  if (flag_openmp)
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
   if (c_parser_next_token_is (parser, CPP_NAME)
   || (c_parser_next_token_is (parser, CPP_COMMA)
 	  && c_parser_peek_2nd_token (parser)->type == CPP_NAME))
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 35c266659e4..3b3ab0f1923 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -49524,6 +49524,9 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
   int device_type = 0;
   bool indirect = false;
   bool only_device_type_or_indirect = true;
+  if (flag_openmp)
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
   if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
   || (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)
 	  && cp_lexer_nth_token_is (parser->lexer, 2, CPP_NAME)))
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index e749bbdc6b5..9e06dbf0911 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -1345,8 +1345,12 @@ decode_omp_directive (void)
 
   switch (ret)
     {
-    /* Set omp_target_seen; exclude ST_OMP_DECLARE_TARGET.
-       FIXME: Get clarification, cf. OpenMP Spec Issue #3240.  */
+    /* For the constraints on clauses with the global requirement property,
+       we set omp_target_seen. This included all clauses that take the
+       DEVICE clause, (BEGIN) DECLARE_TARGET and procedures run the device
+       (which effectively is implied by the former).  */
+    case ST_OMP_DECLARE_TARGET:
+    case ST_OMP_INTEROP:
     case ST_OMP_TARGET:
     case ST_OMP_TARGET_DATA:
     case ST_OMP_TARGET_ENTER_DATA:

Re: libgomp: with USM, init 'link' variables with host address

2024-09-24 Thread Tobias Burnus


Now committed as r15-3836-g4cb20dc043cf70

Unlike the originally posted patch, it also acts on the newly added
'omp requires self_maps'.


In the area of (unified-)shared memory/self maps, the next step seems to
be to still do mapping for static variables, before moving on to
refinements like how to handle implicit 'declare target' for static
variables, …


For this piece of code, we also want to run it for APUs even when no USM
has been requested, avoid adding those variables to the mapping table (for
self maps), and do a more efficient mapping (e.g., using memcpy or
avoiding multiple locks).


Tobias

Tobias Burnus wrote:


Short version: I think the patch as posted is fine and no further action
is needed for this one issue.


See below for the long version.

Possible modifications (now or as a follow-up):
- using memcpy, or letting the plugin do it
- not adding link variables to the splay tree with 'USM'.

Thomas Schwinge wrote:

Tested on x86-64-gnu-linux and nvptx offloading (that supports USM).

(I yet have to set up such a USM configuration...)


You have already used a USM config, e.g., when running on gfx90a (likewise
gfx90c), except that USM on mainline currently only works if you
explicitly set 'export HSA_XNACK=1'.


For nvptx, you need a post-Volta GPU with the open-kernel driver,
which is the default for newer driver versions.


* * *

Do I understand correctly that even if
'GOMP_REQUIRES_UNIFIED_SHARED_MEMORY', we cannot just skip all the
'mem_map' setup in 'gomp_load_image_to_device' etc., because we're not
(yet?) setting 'GOMP_OFFLOAD_CAP_SHARED_MEM'?


We actually do set GOMP_OFFLOAD_CAP_SHARED_MEM with 'requires 
unified_shared_memory'.


But, indeed, we cannot skip the memory mapping parts – due to the way 
we handle static variables.


* * *


+
+  if (is_link_var
+  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+    gomp_copy_host2dev (devicep, NULL, (void *) target_var->start,
+    &k->host_start, sizeof (void *), false, NULL);
  }

Calling 'gomp_copy_host2dev' looks a bit funny given we've just
determined USM (..., but I'm not asking for plain 'memcpy').


I guess a plain memcpy would do as well. [Assuming that the device's 
static variable is host accessible, which it probably is and should be.]
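A host-only model of the initialization discussed here, with hypothetical names: for 'declare target link(A)', the device image only holds a pointer slot, and with USM that slot is set once, at image-load time, to the host address. This sketch uses a plain memcpy of the pointer value instead of gomp_copy_host2dev and assumes, as stated above, that the device's static variable is host accessible.

```c
#include <assert.h>
#include <string.h>

/* Host side: models 'static int A[3]' (hypothetical contents).  */
int host_A[3] = { 1, 2, 3 };

/* Stand-in for the device image's link variable ('static int *A'),
   which lives in .bss and is thus NULL until initialized.  */
int *device_A_slot;

/* Store HOST_ADDR into the pointer-sized slot at SLOT; meant to run
   once when the image is loaded, not on every mapping operation.  */
void
init_link_var (void *slot, const void *host_addr)
{
  memcpy (slot, &host_addr, sizeof (void *));
}
```

Usage: `init_link_var (&device_A_slot, host_A);` afterwards, accesses through the slot see the host array, which matches the "A pointing to the host's 'A'" behavior described below.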


I have added changing this to my to-do list for USM-related tasks;
possibly moving it to the plugin side has some advantages? Possibly also
not adding it to the splay tree if not needed. (Cf. below for the env var
discussion.)


Regarding the unload: For 'declare target link(A)', we have, e.g., 
'static int *A' on the device side. Thus, we could do 'A = NULL' – and 
rather should do 'A = {clobber}', but that's rather pointless in 
general and especially when unloading the image.



What's the advantage/rationale of doing this here vs. in
'gomp_map_vars_internal' for 'REFCOUNT_LINK'?  (May be worth a source
code comment?)


(A, B, C refers to the following example.)

We don't see 'A' (or 'B') in the GOMP_target_ext call and thus not in 
gomp_map_vars_internal.


Besides: We only want to do the initialization once and not every time 
gomp_map_vars_internal is called.


I think the following program may help to understand the issue and the 
patch better.


Note: While A, B, C are 'int …[3]' on the host, on the device we only 
have 'int B[3]' while for A it's 'int *A' and C only exists on the host.


 * * *

#pragma omp requires unified_shared_memory

static int A[3], B[3], C[3];
#pragma omp declare target link(A) enter(B)

#pragma omp begin declare target
void f(int *p)
{
   A[2] += B[2] + p[2];  // p points to the host's C variable
}
#pragma omp end declare target

void foo(int dev) {
  int *ptr = C;
  #pragma omp target firstprivate(ptr) device(dev)
    f (ptr);
}


* * *

Here, 'ptr' (and thus 'p') points to the host 'C' variable, both before
the target region and inside the target region.

'B' refers to the device-local version of the variable.

And 'A' on a non-host device is likely to be NULL ('static int *A' +
.BSS) before this patch, or pointing to the host's 'A' with this patch.

* * *

With 'A' pointing to the host version (and likewise 'p' pointing to the
host 'C'), host fallback and device version yield identical results for
'A' and for 'C' (via ptr/p). However, 'B' on the host and on a non-host
device have nothing in common. While that might be fine, in general it
is not.


Hence, in order to get the same result on host and device for a
.BSS-valued 'B', we need, e.g., to call 'foo' as follows:


#pragma omp target data map(always: B) device(dev)
  foo (dev);

Re: [Patch] OpenMP: Add support for 'self_maps' to the 'require' directive

2024-09-24 Thread Tobias Burnus


Hi all,

now committed as r15-3822-gb752eed3e3f2f2, see attachment.

I fixed one C/C++ test issue (missing 's') and added the Fortran module
check.


Tobias

PS: I noticed that 'declare target' does not add the target-used flag.
At least TR13 is very clear that it counts, but currently GCC does not
regard this (there is a FIXME "check spec" note). This needs to be fixed
eventually.


PPS: Old discussion:

Andre Vehreschild:

Hi Tobias,

to my eye this looks fine. I would appreciate it if you could add some tests
for errors on the Fortran side, esp. where modules are involved. But it is
not a must.

Ok for mainline. Thanks for the patch.

- Andre

On Sat, 21 Sep 2024 23:37:33 +0200
Tobias Burnus  wrote:


Add support of the 'self_maps' clause in 'omp requires',
an OpenMP 6 feature but added here mostly as part of the
on-going improvement of the unified-shared memory (USM) handling.

Comments, remarks, concerns before I commit it?

* * *

Regarding USM, there is on one hand the hardware:

- some hardware cannot access the host memory at all
- other hardware can access it, but either only through
  an interconnect or via page migration on page fault
- on the third type of hardware, host and device share
  the same memory controller

For the latter, a 'map' never makes sense, but for
the second case, it depends on the details whether it is
better to do mapping or directly accessing the memory
(i.e. via interconnect or page migration).

On the compile-time side, the user can demand:
- no requirement
- 'requires unified_shared_memory' (= memory has to be accessible,
  but the implementation can still do mapping for explicit maps)
- 'requires self_maps' - mapping is strictly not permitted
- other hints using compiler flags

And at runtime, what is done depends on the actual hardware, the
compile-time wishes, and environment variables.
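One possible way to encode the constraints above as a decision function. This is a hypothetical sketch, not libgomp's current behavior; the enum names and the `prefer_copy` knob (standing for, e.g., an environment variable) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* The three hardware cases listed above (hypothetical names).  */
enum hw_kind
{
  HW_NO_HOST_ACCESS,     /* cannot access host memory at all */
  HW_REMOTE_ACCESS,      /* via interconnect or page migration */
  HW_SHARED_CONTROLLER   /* host and device share the memory controller */
};

/* What the user demanded via 'omp requires'.  */
enum user_req { USER_REQ_NONE, USER_REQ_USM, USER_REQ_SELF_MAPS };

/* Return true if an explicit 'map' should copy data to device memory.
   Assumes the device was only selected if it can honor REQ, i.e. a
   HW_NO_HOST_ACCESS device implies USER_REQ_NONE.  */
bool
map_copies_data (enum hw_kind hw, enum user_req req, bool prefer_copy)
{
  if (hw == HW_NO_HOST_ACCESS)
    return true;          /* Copying is the only option.  */
  if (req == USER_REQ_SELF_MAPS)
    return false;         /* Mapping is strictly not permitted.  */
  if (hw == HW_SHARED_CONTROLLER)
    return false;         /* A copy never makes sense.  */
  return prefer_copy;     /* Interconnect/page migration: it depends.  */
}
```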

* * *

Currently, the runtime never maps with USM, i.e., both act the same.
At least via an environment variable, I would consider enabling
mapping; one could also consider having it always do mappings,
except for self_maps.

On the compile side, we need to handle implicit 'declare target'
better - as it currently leads to separate memory. Using 'link',
we could point to the host memory (at least for 'self_maps').

And before we can enable USM by default for integrated/APU devices,
we need to solve some issues with 'link' (→ posted link) and for
those, 'map' has to be honored.

Those are 5.x follow-up tasks, but having 'self_maps' available
completes the what-does-the-user-want part.

Tobias

PS: There is also the 'self' modifier to the map clause, working
at a per-variable granularity. However, this, like several other
6.0 items, is completely out of scope for the current USM work.

PPS: See also
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663209.html
and the associated patch set, posted
at https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655946.html
commit b752eed3e3f2f27570ea89b7c2339468698472a8
Author: Tobias Burnus 
Date:   Tue Sep 24 10:53:59 2024 +0200

OpenMP: Add support for 'self_maps' to the 'require' directive

'self_maps' implies 'unified_shared_memory', except that the latter
also permits explicit maps to copy data to device memory while
self_maps does not. In GCC, currently, both are handled identically.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_requires): Handle self_maps clause.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_requires): Handle self_maps clause.

gcc/fortran/ChangeLog:

* gfortran.h (enum gfc_omp_requires_kind): Add OMP_REQ_SELF_MAPS.
(gfc_namespace): Enlarge omp_requires bitfield.
* module.cc (enum ab_attribute, attr_bits): Add AB_OMP_REQ_SELF_MAPS.
(mio_symbol_attribute): Handle it.
* openmp.cc (gfc_check_omp_requires, gfc_match_omp_requires): Handle
self_maps clause.
* parse.cc (gfc_parse_file): Handle self_maps clause.

gcc/ChangeLog:

* lto-cgraph.cc (output_offload_tables, omp_requires_to_name): Handle
self_maps clause.
* omp-general.cc (struct omp_ts_info, omp_context_selector_matches):
Likewise for the associated trait.
* omp-general.h (enum omp_requires): Add OMP_REQUIRES_SELF_MAPS.
* omp-selectors.h (enum omp_ts_code): Add
OMP_TRAIT_IMPLEMENTATION_SELF_MAPS.

include/ChangeLog:

* gomp-constants.h (GOMP_REQUIRES_SELF_MAPS): #define.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices):
Accept self_maps clause.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
L

Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2024-09-23 Thread Tobias Burnus


Hi all,

I have now downloaded the file at
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663534.html (by
copying it from the browser, not the source code, to avoid '&gt;' entities).


In this file, I had to fix spurious line breaks like:

 @@ -5171,7 +5171,7 @@ index_interchange (gfc_code **c, int
*walk_subtrees ATTRIBUTE_UNUSED,

where the *... belongs to the previous line.

The result of this conversion is the attached file.

* * *

Harald Anlauf wrote:

Generally speaking, runtime tests should verify that they work as
expected.


There are currently only compile-time tests.

[One might argue that some should be run-time tests, albeit the really 
interesting part only happens with local/local_init (currently not 
supported) – and with true concurrency in particular with 'reduce'.]


[For the interesting cases of 'local'/'local_init', there is currently a
'sorry', while 'reduce' only becomes truly interesting if one goes
parallel …]


Tobias
gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_code_node): Updated to use
	c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
	Added support for dumping DO CONCURRENT locality specifiers.
	* frontend-passes.cc (index_interchange, gfc_code_walker): Updated to
	use c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
	* gfortran.h (enum locality_type): Added new enum for locality types
	in DO CONCURRENT constructs.
	* match.cc (match_simple_forall, gfc_match_forall): Updated to use
	new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
	(gfc_match_do): Implemented support for matching DO CONCURRENT locality
	specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and REDUCE).
	* parse.cc (parse_do_block): Updated to use
	new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
	* resolve.cc: Added struct check_default_none_data.
	(do_concur_locality_specs_f2023): New function to check compliance
	with F2023's C1133 constraint for DO CONCURRENT.
	(check_default_none_expr): New function to check DEFAULT(NONE)
	compliance.
	(resolve_locality_spec): New function to resolve locality specs.
	(gfc_count_forall_iterators): Updated to use
	code->ext.concur.forall_iterator.
	(gfc_resolve_forall): Updated to use code->ext.concur.forall_iterator.
	* st.cc (gfc_free_statement): Updated to free locality specifications
	and use p->ext.concur.forall_iterator.
	* trans-stmt.cc (gfc_trans_forall_1): Updated to use
	code->ext.concur.forall_iterator.

gcc/testsuite/ChangeLog:

	* gfortran.dg/do_concurrent_10.f90: New test for parsing DO CONCURRENT
	with 'concurrent' as a variable name.
	* gfortran.dg/do_concurrent_8_f2018.f90: New test for F2018 DO
	CONCURRENT with nested loops and REDUCE clauses.
	* gfortran.dg/do_concurrent_8_f2023.f90: New test for F2023 DO
	CONCURRENT with nested loops and REDUCE clauses.
	* gfortran.dg/do_concurrent_9.f90: New test for DO CONCURRENT with
	DEFAULT(NONE) and locality specs.
	* gfortran.dg/do_concurrent_all_clauses.f90: New test covering all DO
	CONCURRENT clauses and their interactions.
	* gfortran.dg/do_concurrent_basic.f90: New basic test for DO CONCURRENT
	functionality.
	* gfortran.dg/do_concurrent_constraints.f90: New test for constraints
	on DO CONCURRENT locality specs.
	* gfortran.dg/do_concurrent_local_init.f90: New test for LOCAL_INIT
	clause in DO CONCURRENT.
	* gfortran.dg/do_concurrent_locality_specs.f90: New test for DO
	CONCURRENT with locality specs.
	* gfortran.dg/do_concurrent_multiple_reduce.f90: New test for multiple
	REDUCE clauses in DO CONCURRENT.
	* gfortran.dg/do_concurrent_nested.f90: New test for nested DO
	CONCURRENT loops.
	* gfortran.dg/do_concurrent_parser.f90: New test for DO CONCURRENT
	parser error handling.
	* gfortran.dg/do_concurrent_reduce_max.f90: New test for REDUCE with
	MAX operation in DO CONCURRENT.
	* gfortran.dg/do_concurrent_reduce_sum.f90: New test for REDUCE with
	sum operation in DO CONCURRENT.
	* gfortran.dg/do_concurrent_shared.f90: New test for SHARED clause in
	DO CONCURRENT.

Signed-off-by: Anuj 
---
 gcc/fortran/dump-parse-tree.cc| 113 +-
 gcc/fortran/frontend-passes.cc|   8 +-
 gcc/fortran/gfortran.h|  20 +-
 gcc/fortran/match.cc  | 286 +-
 gcc/fortran/parse.cc  |   2 +-
 gcc/fortran/resolve.cc| 354 +-
 gcc/fortran/st.cc |   5 +-
 gcc/fortran/trans-stmt.cc |   6 +-
 .../gfortran.dg/do_concurrent_10.f90  |  11 +
 .../gfortran.dg/do_concurrent_8_f2018.f90 |  19 +
 .../gfortran.dg/do_concurrent_8_f2023.f90 |  23 ++
 gcc/testsuite/gfortran.dg/do_concurrent_9.f90 |  15 +
 .../gfortran.dg/do_concurrent_all_clauses.f90 |  26 ++
 .../gfortran.dg/do_concurrent_basic.f90   |  11 +
 .../gfortran.dg/do_concurrent_constraints.f90 | 126 +++
 .../gfortran.dg/do_concurrent_local_init.f90  |  11 +
 .../do_concurrent_locality_spec

Re: OpenMP: Fix omp_get_device_from_uid, minor cleanup

2024-09-23 Thread Tobias Burnus

Now committed as r15-3799-gcdb9aa0f623ec7 / 
https://gcc.gnu.org/r15-3799-gcdb9aa0f623ec7


Tobias

On 21.09.24 at 01:33, Tobias Burnus wrote:

Hi Thomas, hello all,

the attached follow-up patch does:

* It fixes an issue (thinko) related to Fortran and \0-terminated
  strings, which fails at least for substrings.

* Includes some minor fixes, e.g., ensuring the device is initialized
  in omp_get_uid_from_device, removing the superfluous 'omp_' prefix, or
  adding some inits to oacc-host.c.

* Now the plugins return NULL instead of failing when the UID cannot
  be obtained; in that case, the fallback UID "OMP_DEV_%d" is used.
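A minimal sketch of the fallback-and-cache scheme from the last bullet, assuming a plugin hook that may be missing or may return NULL. 'struct device_desc', 'device_uid' and the two demo hooks are hypothetical stand-ins, not the actual target.c code; only the "OMP_DEV_%d" fallback format comes from the text.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical plugin hook: returns the UID, or NULL if unavailable. */
typedef const char *(*get_uid_fn) (int ord);

/* Hypothetical per-device record, standing in for libgomp's 'devices'. */
struct device_desc
{
  int ord;
  get_uid_fn get_uid;  /* NULL if the plugin lacks the hook */
  const char *uid;     /* cached result; NULL until first query */
};

static const char *
device_uid (struct device_desc *dev)
{
  if (dev->uid == NULL)
    {
      char fallback[32];
      const char *s = dev->get_uid ? dev->get_uid (dev->ord) : NULL;
      if (s == NULL)
        {
          snprintf (fallback, sizeof fallback, "OMP_DEV_%d", dev->ord);
          s = fallback;
        }
      /* Cache a private copy; like the device table, it is never freed,
         so the pointer handed to the user stays valid.  */
      size_t len = strlen (s) + 1;
      char *copy = malloc (len);
      if (copy == NULL)
        return NULL;
      memcpy (copy, s, len);
      dev->uid = copy;
    }
  return dev->uid;
}

/* Demo hooks, for illustration only. */
static const char *uid_ok (int ord) { (void) ord; return "GPU-1234"; }
static const char *uid_fail (int ord) { (void) ord; return NULL; }
```

Repeated calls return the same cached pointer, so no additional memory is allocated per call.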

Comments or remarks before I commit it?

* * *

Regarding the topic of caching in the plugin instead of in
libgomp: If we want to change it, we either have to remove the fallback
and require the existence and success of GOMP_OFFLOAD_get_uid, or,
with host fallback support, we have to cache it at both
locations, which is not really sensible either.

Thoughts on this topic?

* * *

Longer reply to Thomas' comments:

Thomas Schwinge wrote:


+  "omp_get_uid_from_device",

..., but here without 'omp_' prefix: 'get_uid_from_device' (and properly
sorted).


Oops! It should of course be without it (as the 'omp_' prefix is checked before).


Do we apparently not have test suite coverage for these things?


We do *not* test all API routines. The check is, e.g., used in

  gfc_error ("%s cannot contain OpenMP API call in intervening code "

or
  "OpenMP runtime API call %qD in a region with "
  "%<order(concurrent)%> clause", fndecl);

And we have a few tests for each of them, but not a full set covering all
API routines.


* * *


+  const char *uid;

Caching this here, instead of acquiring via 'GOMP_OFFLOAD_get_uid' for
each call, is a minor performance optimization?  (Similar to other items
cached here, I guess.)


Yes, but it goes a bit beyond that: As the pointer is returned to the user, it
has to be allocated at some point and cached to avoid allocating more
memory when called repeatedly. As the fallback and host handling is
also done in target.c, the caching is done here.

(Besides the API routines, two env vars and one context selector for
'target_device' support the UID.)

* * *


Please also update 'libgomp/oacc-host.c:host_dispatch'.

Done.

+  ! Note: In gfortran, strings are \0 termined
+  integer(c_int) function omp_get_device_from_uid(uid) bind(C)

For my understanding: in general, Fortran strings are *not*
NUL-terminated, right?  So this is a specific property of 'gfortran'
and/or this GCC/OpenMP interface,


The Fortran standard leaves this to the implementation, but by construction,
there is a length (however it is handled internally, e.g., via the
declaration) and the actual data. To aid debugging, gfortran
NUL-terminates them.

However, thinking a bit more about it, a substring of a
NUL-terminated string will not magically have a \0 at the boundary of the
substring. Thus, the simplified approach failed, and a Fortran-specific
function had to be added (→ fortran.c).

* * *


+    interface omp_get_uid_from_device
+  ! Deviation from OpenMP 6.0: VALUE added.

(..., which I suppose you've reported to OpenMP...)


No, it is not really a bug in the standard. The OpenMP
specification tries to provide a consistent API, but it
is difficult to create an API without touching the ABI.

For the caller side, the usage is the same independent of
whether there is an 'intent(in)' or VALUE attribute,
a Bind(C) with or without binding name, or also a generic
interface with multiple specific ones, which we use to handle
-fdefault-integer-8.

Obviously, the compiler needs to know those details, but
users do not, unless they code the interface themselves instead of
using omp.h / omp_lib.h / the omp_lib module.

Thus, that's one of the few deviations from the OpenMP
specification that affect the ABI but not the API.
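To illustrate why VALUE changes the ABI but not the API: a Fortran call such as 'n = f(dev)' reads the same either way, but the C side of a BIND(C) interface receives the integer itself with VALUE and its address without it. The callee names and the "+ 100" body below are hypothetical placeholders.

```c
/* Same Fortran call site, 'dev = 3; n = f(dev)', two C ABIs:

     integer(c_int) function f(dev) bind(C)
       integer(c_int), value :: dev        ! C side: int dev

     integer(c_int) function f(dev) bind(C)
       integer(c_int), intent(in) :: dev   ! C side: const int *dev      */

/* Hypothetical callee when the interface declares VALUE. */
static int
query_by_value (int device_num)
{
  return device_num + 100;  /* placeholder computation */
}

/* Hypothetical callee without VALUE: Fortran passes an address. */
static int
query_by_reference (const int *device_num)
{
  return *device_num + 100;  /* placeholder computation */
}
```

Both compute the same result; only the calling convention the compiler must emit differs, which is why users of omp_lib never notice.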

* * *


+GOMP_OFFLOAD_get_uid (int ord)
+{

I guess I'd have just put this code into 'init_hsa_context', filling a
new statically-sized 'uuid' field in 'hsa_context_info' (like
'driver_version_s'; and assuming that 'hsa_context_info' is the right
abstraction for this), and then just return that 'uuid' from
'GOMP_OFFLOAD_get_uid'.


That would be one option. Still, we have to decide whether we want
to have strictly everything handled in the device code, including
fallback handling (which could be a UID replacement or a fatal error).

If we do part of the handling elsewhere, e.g., by permitting that the
plugin fails or does not provide the functions, we can handle it
in target.c (as currently done), but then we need to cache it there
as well (or at least the fallbacks).

* * *


That way, you'd avoid the unclear semantics of
who gets t

Re: [Patch] gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h

2024-09-23 Thread Tobias Burnus

Now committed as r15-3797-ga030fcad4f9f49 / 
https://gcc.gnu.org/r15-3797-ga030fcad4f9f49 as obvious.


Tobias

On 21.09.24 at 00:52, Tobias Burnus wrote:

See attached patch for adding the include lines:

+  if (gcn_stack_size)
+    {
+      fprintf (cfile, "#include <stdlib.h>\n");
+      fprintf (cfile, "#include <stdbool.h>\n\n");

but, contrary to before, there is no 'stdint.h',
and the headers are no longer included unconditionally.

(The 'stdbool.h' is only used for a single 'true', but on the other hand it
is only #included under this condition, and 'stdbool.h' is a very simple file.)
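A minimal sketch of the conditional emission, modeled on the patch hunk: the headers are only written when a stack size was requested, because only then does the generated code call getenv and use 'true'. 'emit_prologue' is a hypothetical wrapper, not a function in mkoffload.cc; it returns the number of characters written so the behavior is easy to check.

```c
#include <stdio.h>

/* Hypothetical wrapper around the hunk: write the headers needed by the
   generated stack-size code, and nothing otherwise.  */
static int
emit_prologue (FILE *cfile, int gcn_stack_size)
{
  int n = 0;
  if (gcn_stack_size)
    {
      n += fprintf (cfile, "#include <stdlib.h>\n");     /* getenv */
      n += fprintf (cfile, "#include <stdbool.h>\n\n");  /* true */
    }
  return n;
}
```

Calling it with a zero stack size emits nothing, which is why the missing include only surfaced with -mstack-size.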


I intend to apply this patch as obvious, unless there are further
comments.


* * *

Thomas Schwinge wrote:


I've not verified, but I very much suspect that this change: […]

 gcn/mkoffload.cc: Use #embed for including the generated ELF file
... is responsible for: […]
 /tmp/ccHVeRbm.c:80:21: error: implicit declaration of function 'getenv' [-Wimplicit-function-declaration]

[…] Did you not see that happen in your testing?


I vaguely remember some failures in this area, but after digging and
re-testing, they did not show up, for whatever reason. As it only
triggers with -mstack-size, it somehow must have fallen through the
cracks. :-/

Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2024-09-23 Thread Tobias Burnus


Hi Andre,

Andre Vehreschild wrote:

Could you also please specify the commit SHA your patch is supposed to apply
to? At current mainline's HEAD it has several rejects which makes reviewing
harder.


I just tried, and here it applies cleanly on mainline, except that I get
a bunch of warnings in the style of:

Hunk #1 succeeded at 2904 (offset 74 lines).

but those hunks still seem to end up in the proper place.


And please attach the patch as plain text. It is HTML-encoded with several
HTML codes; for example, a '>' is encoded as '&gt;'. This makes it nearly
impossible to apply.


I don't see this in my email program, nor when looking at
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663534.html; I
don't see any '&gt;', not even when looking at the HTML attachment.



Please check the code style of your patch using:
contrib/check_GNU_style.py
It reports several errors with line length and formatting.


Hmm, I only see errors related to the tree dump, which seem to be okay:

=== ERROR type #1: there should be exactly one space between function name and parenthesis (7 error(s)) ===

gcc/fortran/dump-parse-tree.cc:2915:17:   fputs (" LOCAL(", dumpfile);

And the following is in the parser – and the spaces are mandatory here:

=== ERROR type #2: there should be no space before closing parenthesis (1 error(s)) ===
gcc/fortran/match.cc:2758:41:   else if (gfc_match ("default ( none )") == MATCH_YES)
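Why the spaces are wanted there: if I read gfortran's gfc_match conventions correctly, a blank in the pattern matches optional whitespace, so "default ( none )" accepts both "default(none)" and "DEFAULT ( NONE )", while a pattern without blanks would reject the spaced form. The toy matcher below is a hypothetical illustration of that convention only, not gfc_match itself.

```c
#include <ctype.h>
#include <stdbool.h>

/* Toy matcher (hypothetical): a blank in 'pat' skips any run of blanks or
   tabs in 'in' (including none); other characters must match
   case-insensitively, and the whole input must be consumed.  */
static bool
match_pattern (const char *in, const char *pat)
{
  for (; *pat; pat++)
    {
      if (*pat == ' ')
        {
          while (*in == ' ' || *in == '\t')
            in++;
          continue;
        }
      if (tolower ((unsigned char) *in) != tolower ((unsigned char) *pat))
        return false;
      in++;
    }
  return *in == '\0';
}
```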


I wonder what the difference between our email readers is. Can you try
the version from the mailing list archive?

Cheers,

Tobias

Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2024-09-23 Thread Tobias Burnus


Hi Paul,

On 23.09.24 at 10:26, Paul Richard Thomas wrote:

In addition to Andre's remarks, could you please tell us, when you
resubmit, if this is a complete F2023 implementation of do concurrent.
If not, what is missing?


Regarding missing parts: what is still to do is the actual privatization (with
or without initialization) of variables that are listed in 'local' and
'local_init'. Hence, code using them currently fails, after doing all
required diagnostics, with a 'sorry, not yet implemented' error. [My
feeling is that doing it in trans*.cc might make most sense, but it
could also be done at the Fortran AST level (inserting a BLOCK +
adding the variable there).]
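For readers less familiar with the locality specs, here is a C model of what 'local_init' privatization means: each iteration gets its own copy of the variable, initialized from the outer value, and the outer variable is never written. The function name and loop body are hypothetical; the block-scoped copy corresponds to the "inserting a BLOCK + adding the variable" approach, not to any existing gfortran code.

```c
/* Models:  do concurrent (i = 1:n) local_init(t)
              t = t + a(i)
              b(i) = t
            end do                                                     */
static void
do_concurrent_local_init (int n, const int *a, int *b, const int *outer_t)
{
  for (int i = 0; i < n; i++)
    {
      /* Per-iteration copy, akin to a BLOCK with a local variable.
         For plain 'local', this copy would start out undefined.  */
      int t = *outer_t;
      t += a[i];
      b[i] = t;
    }
}
```

Because every iteration starts from the outer value, the iterations are independent and could run in any order.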

Otherwise, all parsing + diagnostics should work; 'default(none)' is
diagnostics only, and 'shared' doesn't do anything, except affecting the
'default(none)' diagnostic. 'reduce' will have a code-gen effect, but
only when going to real concurrency/parallel execution.
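Why 'reduce' only matters with real concurrency can be seen in a small C model: with several workers, each accumulates into a private partial value initialized to the identity, and the partials are combined with the original value at the end. Executed serially, the result equals a plain loop. The chunking scheme and the 16-worker cap are arbitrary choices for this hypothetical sketch.

```c
/* Models:  do concurrent (i = 1:n) reduce(+:s)
              s = s + a(i)
            end do
   'nchunks' stands in for the number of concurrent workers.  */
static long
reduce_sum (const int *a, int n, long s, int nchunks)
{
  long partial[16] = { 0 };  /* identity for '+' */
  if (nchunks < 1)
    nchunks = 1;
  if (nchunks > 16)
    nchunks = 16;
  for (int c = 0; c < nchunks; c++)
    for (int i = c; i < n; i += nchunks)
      partial[c] += a[i];    /* private accumulation per worker */
  for (int c = 0; c < nchunks; c++)
    s += partial[c];         /* combine with the original value */
  return s;
}
```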

* * *

If you mean unimplemented 'do concurrent' features in general,
gfortran does not handle the forall/do-concurrent header with a typespec
(i.e., 'do concurrent (integer :: i = 1, 4)'; cf.
https://gcc.gnu.org/PR96255) [F2018 feature].

* * *

In terms of true parallelization:

* I have been thinking for a while about having a
-fdo-concurrent=
compile-time flag to handle this.

* OpenMP 6.0 (added I think in Technical Report (TR) 13, which was
released Aug 1, 2024) now supports '!$omp loop' on 'do concurrent'

Either variant would then use the new locality spec (F2018/F2023 and new
in gfortran) and hook into the existing OpenMP/OpenACC handling. –
'!$omp loop' and -fdo-concurrent=omp-parallel are in any case easier
than 'omp-target-parallel' as the latter will run into issues related to
data mapping or (potentially) atomic updates now having to be in sync
with host atomic access.


BTW Thanks for doing this. It was on my long term TODO list and is now
struck off :-)


Yes, and I have heard from others that do-concurrent actually being
concurrent, or at least having the new locality specs even if
not run concurrently, is a much-missed feature. That might be from a
small bubble, but still, those users want to have it. And Damian also
mentioned that he has a project that will use it.

Also thanks from my side!

Tobias

Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)

2024-09-23 Thread Tobias Burnus


Hi all,

As background: Anuj did this as part of his Google Summer of Code
project (thanks!).

As I have looked at various drafts, I would be happy if someone else could
have a look as well, as I probably start skipping over things and,
hence, might miss potential issues …

A bit hidden in the patch is a bug fix to allow 'concurrent' as loop
variable name of a normal 'do' loop …

Thanks,

Tobias

Anuj Mohite wrote:

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_code_node): Updated to use
c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
Added support for dumping DO CONCURRENT locality specifiers.
* frontend-passes.cc (index_interchange, gfc_code_walker): Updated to
use c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
* gfortran.h (enum locality_type): Added new enum for locality types
in DO CONCURRENT constructs.
* match.cc (match_simple_forall, gfc_match_forall): Updated to use
new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
(gfc_match_do): Implemented support for matching DO CONCURRENT locality
specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and REDUCE).
* parse.cc (parse_do_block): Updated to use
new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
* resolve.cc: Added struct check_default_none_data.
(do_concur_locality_specs_f2023): New function to check compliance
with F2023's C1133 constraint for DO CONCURRENT.
(check_default_none_expr): New function to check DEFAULT(NONE)
compliance.
(resolve_locality_spec): New function to resolve locality specs.
(gfc_count_forall_iterators): Updated to use
code->ext.concur.forall_iterator.
(gfc_resolve_forall): Updated to use code->ext.concur.forall_iterator.
* st.cc (gfc_free_statement): Updated to free locality specifications
and use p->ext.concur.forall_iterator.
* trans-stmt.cc (gfc_trans_forall_1): Updated to use
code->ext.concur.forall_iterator.

gcc/testsuite/ChangeLog:

* gfortran.dg/do_concurrent_10.f90: New test for parsing DO CONCURRENT
with 'concurrent' as a variable name.
* gfortran.dg/do_concurrent_8_f2018.f90: New test for F2018 DO
CONCURRENT with nested loops and REDUCE clauses.
* gfortran.dg/do_concurrent_8_f2023.f90: New test for F2023 DO
CONCURRENT with nested loops and REDUCE clauses.
* gfortran.dg/do_concurrent_9.f90: New test for DO CONCURRENT with
DEFAULT(NONE) and locality specs.
* gfortran.dg/do_concurrent_all_clauses.f90: New test covering all DO
CONCURRENT clauses and their interactions.
* gfortran.dg/do_concurrent_basic.f90: New basic test for DO CONCURRENT
functionality.
* gfortran.dg/do_concurrent_constraints.f90: New test for constraints
on DO CONCURRENT locality specs.
* gfortran.dg/do_concurrent_local_init.f90: New test for LOCAL_INIT
clause in DO CONCURRENT.
* gfortran.dg/do_concurrent_locality_specs.f90: New test for DO
CONCURRENT with locality specs.
* gfortran.dg/do_concurrent_multiple_reduce.f90: New test for multiple
REDUCE clauses in DO CONCURRENT.
* gfortran.dg/do_concurrent_nested.f90: New test for nested DO
CONCURRENT loops.
* gfortran.dg/do_concurrent_parser.f90: New test for DO CONCURRENT
parser error handling.
* gfortran.dg/do_concurrent_reduce_max.f90: New test for REDUCE with
MAX operation in DO CONCURRENT.
* gfortran.dg/do_concurrent_reduce_sum.f90: New test for REDUCE with
sum operation in DO CONCURRENT.
* gfortran.dg/do_concurrent_shared.f90: New test for SHARED clause in
DO CONCURRENT.

Signed-off-by: Anuj 
---
  gcc/fortran/dump-parse-tree.cc| 113 +-
  gcc/fortran/frontend-passes.cc|   8 +-
  gcc/fortran/gfortran.h|  20 +-
  gcc/fortran/match.cc  | 286 +-
  gcc/fortran/parse.cc  |   2 +-
  gcc/fortran/resolve.cc| 354 +-
  gcc/fortran/st.cc |   5 +-
  gcc/fortran/trans-stmt.cc |   6 +-
  .../gfortran.dg/do_concurrent_10.f90  |  11 +
  .../gfortran.dg/do_concurrent_8_f2018.f90 |  19 +
  .../gfortran.dg/do_concurrent_8_f2023.f90 |  23 ++
  gcc/testsuite/gfortran.dg/do_concurrent_9.f90 |  15 +
  .../gfortran.dg/do_concurrent_all_clauses.f90 |  26 ++
  .../gfortran.dg/do_concurrent_basic.f90   |  11 +
  .../gfortran.dg/do_concurrent_constraints.f90 | 126 +++
  .../gfortran.dg/do_concurrent_local_init.f90  |  11 +
  .../do_concurrent_locality_specs.f90  |  14 +
  .../do_concurrent_multiple_reduce.f90 |  17 +
  .../gfortran.dg/do_concurrent_nested.f90

Re: [PATCH v3 03/12] libgomp: runtime support for target_device selector

2024-09-21 Thread Tobias Burnus

On Sunday, September 22, 2024, Sandra Loosemore 
wrote:
> […] I think the predicate of the more general case for
>
> target_device={device_num (NUM), kind(KIND), arch(ARCH), isa(ISA)}
>
> can be expressed (using GCC statement expression syntax) as
>
> ({
>    int matches;
>    #pragma omp target device (NUM)
>      matches = magic_cookie (KIND, ARCH, ISA);
>    matches;
> })
>
> where magic_cookie is either a built-in or new gimple code.  I think the
> gimplifier is probably the right place to do the above transformation, and
> the magic_cookie expansion would happen during (or at least at the same
> point in compilation as) late metadirective resolution; IOW, in the offload
> compiler).  That part can call targetm.omp.device_kind_arch_isa to resolve
> the whole works into a constant true/false, similar to how the "device"
> selector is handled in the offload compiler, rather than into any runtime
> routine.

I think that can work. I was (and am, to a much lesser extent) worried a
bit about the overhead of the target call, but as the spec only has one
(default or the one specified), that should be fine.
(One can think of merging multiple target regions for multiple candidates
or moving them out of a hot loop.)

And for uid(xxx) it still needs a runtime call, but then calling
__builtin_strcmp(xxx, omp_get_uid_from_device(...)) should be fine.
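A sketch of that runtime check for a uid(...) selector. So that it builds without an OpenMP 6.0 libgomp, it uses a hypothetical stub in place of omp_get_uid_from_device; the NULL check is defensive, assuming the routine may yield no UID for an invalid device number.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for omp_get_uid_from_device. */
static const char *
stub_get_uid_from_device (int device_num)
{
  return device_num == 0 ? "GPU-0-uid" : NULL;
}

/* Resolve a target_device={uid(...)} selector at run time. */
static bool
uid_selector_matches (const char *wanted, int device_num)
{
  const char *uid = stub_get_uid_from_device (device_num);
  return uid != NULL && strcmp (wanted, uid) == 0;
}
```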

There is the larger question of whether we should report the compile-time
supported isa or the real one, but I think either works. And whether to
regard the isa as a feature set, which newer systems also support (done for
x86(_64)), or as strictly that specific version (as done for nvptx); but
that's independent of the way we implement it.
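The two interpretations of 'isa' can be contrasted in a few lines. The feature bits and version numbers below are hypothetical; the sketch only shows the matching rule, i.e., subset test versus equality.

```c
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only. */
enum { ISA_BASE = 1u << 0, ISA_EXT1 = 1u << 1, ISA_EXT2 = 1u << 2 };

/* Feature-set view (x86-64 style): a device matches if it has every
   requested feature, so newer devices with more features still match.  */
static bool
isa_matches_feature_set (unsigned requested, unsigned device)
{
  return (requested & ~device) == 0;
}

/* Exact-version view (nvptx style, per the text): only that version. */
static bool
isa_matches_exact (int requested_version, int device_version)
{
  return requested_version == device_version;
}
```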

> Does this seem like a plausible way to continue?

At a glance, yes.

Tobias

[Patch] OpenMP: Add support for 'self_maps' to the 'require' directive

2024-09-21 Thread Tobias Burnus


Add support for the 'self_maps' clause in 'omp requires',
an OpenMP 6 feature, but added here mostly as part of the
ongoing improvement of the unified shared memory (USM) handling.

Comments, remarks, or concerns before I commit it?

* * *

Regarding USM, there is on one hand the hardware:

- some hardware cannot access the host memory at all
- other hardware can access it, but either only through
  an interconnect or via page migration on page fault
- on the third type of hardware, host and device share
  the same memory controller

For the latter, a 'map' never makes sense, but for
the second case, it depends on the details whether it is
better to do mapping or directly accessing the memory
(i.e. via interconnect or page migration).

On the compile-time side, the user can demand:
- no requirement
- 'requires unified_shared_memory' (= memory has to be accessible
  but the implementation can still do mapping for explicit maps)
- 'requires self_maps' - mapping is strictly not permitted.
- other hints using compiler flags

And at runtime, what is done depends on the actual hardware,
the compile-time wishes, and environment variables.

* * *

Currently, the runtime never maps with USM, i.e., 'unified_shared_memory'
and 'self_maps' act the same. I would consider enabling mapping, at least
via an environment variable; one could also consider always doing
mappings, except for self_maps.

On the compile side, we need to handle implicit 'declare target'
better - as it currently leads to separate memory. Using 'link',
we could point to the host memory (at least for 'self_maps').

And before we can enable USM by default for integrated/APU devices,
we need to solve some issues with 'link' (→ posted link) and for
those, 'map' has to be honored.

Those are 5.x follow-up tasks, but having 'self_maps' available
completes the what-does-the-user-want part.

Tobias

PS: There is also the 'self' modifier to the map clause, working
at a per-variable granularity. However, this, like several other
6.0 items, is completely out of scope for the current USM work.

PPS: See also
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663209.html
and the associated patch set, posted
at https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655946.html
OpenMP: Add support for 'self_maps' to the 'require' directive

'self_maps' implies 'unified_shared_memory', except that the latter
also permits explicit maps to copy data to device memory while
self_maps does not. In GCC, currently, both are handled identically.
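The implication chain in this commit message can be sketched as a bitmask normalization. The bit values and names are hypothetical (the real GOMP_REQUIRES_* constants in include/gomp-constants.h may differ); the sketch also assumes, per the OpenMP spec, that 'unified_shared_memory' implies 'unified_address'.

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
enum
{
  REQ_UNIFIED_ADDRESS = 1u << 0,
  REQ_UNIFIED_SHARED_MEMORY = 1u << 1,
  REQ_SELF_MAPS = 1u << 2
};

/* Add implied requirements: self_maps implies unified_shared_memory,
   which in turn implies unified_address.  */
static unsigned
normalize_requires (unsigned mask)
{
  if (mask & REQ_SELF_MAPS)
    mask |= REQ_UNIFIED_SHARED_MEMORY;
  if (mask & REQ_UNIFIED_SHARED_MEMORY)
    mask |= REQ_UNIFIED_ADDRESS;
  return mask;
}

/* As both clauses are handled identically, a device is usable for either
   one exactly when it supports USM.  */
static bool
device_usable (unsigned mask, bool device_supports_usm)
{
  return !(normalize_requires (mask) & REQ_UNIFIED_SHARED_MEMORY)
         || device_supports_usm;
}
```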

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_requires): Handle self_maps clause.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_requires): Handle self_maps clause.

gcc/fortran/ChangeLog:

	* gfortran.h (enum gfc_omp_requires_kind): Add OMP_REQ_SELF_MAPS.
	(gfc_namespace): Enlarge omp_requires bitfield.
	* module.cc (enum ab_attribute, attr_bits): Add AB_OMP_REQ_SELF_MAPS.
	(mio_symbol_attribute): Handle it.
	* openmp.cc (gfc_check_omp_requires, gfc_match_omp_requires): Handle
	self_maps clause.
	* parse.cc (gfc_parse_file): Handle self_maps clause.

gcc/ChangeLog:

	* lto-cgraph.cc (output_offload_tables, omp_requires_to_name): Handle
	self_maps clause.
	* omp-general.cc (struct omp_ts_info, omp_context_selector_matches):
	Likewise for the associated trait.
	* omp-general.h (enum omp_requires): Add OMP_REQUIRES_SELF_MAPS.
	* omp-selectors.h (enum omp_ts_code): Add
	OMP_TRAIT_IMPLEMENTATION_SELF_MAPS.

include/ChangeLog:

	* gomp-constants.h (GOMP_REQUIRES_SELF_MAPS): #define.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices):
	Accept self_maps clause.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
	Likewise.
	* libgomp.texi (TR13 Impl. Status): Set to 'Y'.
	* target.c (gomp_requires_to_name, GOMP_offload_register_ver,
	gomp_target_init): Handle self_maps clause.
	* testsuite/libgomp.fortran/self_maps.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/declare-variant-1.c: Add self_maps test.
	* c-c++-common/gomp/requires-4.c: Likewise.
	* gfortran.dg/gomp/declare-variant-3.f90: Likewise.
	* c-c++-common/gomp/requires-2.c: Update dg-error msg.
	* gfortran.dg/gomp/requires-2.f90: Likewise.

 gcc/c/c-parser.cc  |  3 ++
 gcc/cp/parser.cc   |  3 ++
 gcc/fortran/gfortran.h | 10 +++--
 gcc/fortran/module.cc  | 11 -
 gcc/fortran/openmp.cc  | 30 -
 gcc/fortran/parse.cc   |  3 ++
 gcc/lto-cgraph.cc  |  4 ++
 gcc/omp-general.cc | 21 ++
 gcc/omp-general.h  |  1 +
 gcc/omp-selectors.h|  1 +
 .../c-c++-common/gomp/declare-variant-1.c  |  6 +++
 gcc/testsuite/c-c++-common/gomp/requires-2.c   |  2 +-
 gcc/testsuite/c-c++-common/gomp/requires-4.c   |  1 +
 .../gfort

OpenMP: Fix omp_get_device_from_uid, minor cleanup (was: Re: [Patch][v2] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines)

2024-09-20 Thread Tobias Burnus


Hi Thomas, hello all,

the attached follow-up patch does:

* It fixes an issue (thinko) related to Fortran and \0-terminated
  strings, which fails at least for substrings.

* Includes some minor fixes, e.g., ensuring the device is initialized
  in omp_get_uid_from_device, removing the superfluous 'omp_' prefix, or
  adding some inits to oacc-host.c.

* Now the plugins return NULL instead of failing when the UID cannot
  be obtained; in that case, the fallback UID "OMP_DEV_%d" is used.

Comments or remarks before I commit it?

* * *

Regarding the topic of caching in the plugin instead of in
libgomp: If we want to change it, we either have to remove the fallback
and require the existence and success of GOMP_OFFLOAD_get_uid, or,
with host fallback support, we have to cache it at both
locations, which is not really sensible either.

Thoughts on this topic?

* * *

Longer reply to Thomas' comments:

Thomas Schwinge wrote:


+  "omp_get_uid_from_device",

..., but here without 'omp_' prefix: 'get_uid_from_device' (and properly
sorted).


Oops! It should of course be without it (as the 'omp_' prefix is checked before).


Do we apparently not have test suite coverage for these things?


We do *not* test all API routines. The check is, e.g., used in

  gfc_error ("%s cannot contain OpenMP API call in intervening code "

or
  "OpenMP runtime API call %qD in a region with "
  "%<order(concurrent)%> clause", fndecl);

And we have a few tests for each of them, but not a full set covering all
API routines.

* * *
   


+  const char *uid;

Caching this here, instead of acquiring via 'GOMP_OFFLOAD_get_uid' for
each call, is a minor performance optimization?  (Similar to other items
cached here, I guess.)


Yes, but it goes a bit beyond that: As the pointer is returned to the user, it
has to be allocated at some point and cached to avoid allocating more
memory when called repeatedly. As the fallback and host handling is
also done in target.c, the caching is done here.

(Besides the API routines, two env vars and one context selector for
'target_device' support the UID.)

* * *


Please also update 'libgomp/oacc-host.c:host_dispatch'.

Done.

+  ! Note: In gfortran, strings are \0 termined
+  integer(c_int) function omp_get_device_from_uid(uid) bind(C)

For my understanding: in general, Fortran strings are *not*
NUL-terminated, right?  So this is a specific property of 'gfortran'
and/or this GCC/OpenMP interface,


The Fortran standard leaves this to the implementation, but by construction,
there is a length (however it is handled internally, e.g., via the
declaration) and the actual data. To aid debugging, gfortran
NUL-terminates them.

However, thinking a bit more about it, a substring of a
NUL-terminated string will not magically have a \0 at the boundary of the
substring. Thus, the simplified approach failed, and a Fortran-specific
function had to be added (→ fortran.c).

* * *


+interface omp_get_uid_from_device
+  ! Deviation from OpenMP 6.0: VALUE added.

(..., which I suppose you've reported to OpenMP...)


No, it is not really a bug in the standard. The OpenMP
specification tries to provide a consistent API, but it
is difficult to create an API without touching the ABI.

For the caller side, the usage is the same independent of
whether there is an 'intent(in)' or VALUE attribute,
a Bind(C) with or without binding name, or also a generic
interface with multiple specific ones, which we use to handle
-fdefault-integer-8.

Obviously, the compiler needs to know those details, but
users do not, unless they code the interface themselves instead of
using omp.h / omp_lib.h / the omp_lib module.

Thus, that's one of the few deviations from the OpenMP
specification that affect the ABI but not the API.

* * *


+GOMP_OFFLOAD_get_uid (int ord)
+{

I guess I'd have just put this code into 'init_hsa_context', filling a
new statically-sized 'uuid' field in 'hsa_context_info' (like
'driver_version_s'; and assuming that 'hsa_context_info' is the right
abstraction for this), and then just return that 'uuid' from
'GOMP_OFFLOAD_get_uid'.


That would be one option. Still, we have to decide whether we want
to have strictly everything handled in the device code - including
fallback handling (which could be a UID replacement or a fatal error).

Or we do part of the handling elsewhere, e.g. by permitting that the
plugin fails or does not provide the functions; then we can handle it
in target.c (as currently done) - but then we need to cache it there
as well (or at least the fallbacks).
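The second option (handling in target.c) can be sketched roughly as follows: the plugin hook is optional, the UID is computed on first use and cached in the device descriptor, and a placeholder is generated when the hook is missing or fails. All names here (`struct dev_descr`, `get_uid_func`, the `"dev-%d"` fallback format) are hypothetical and only loosely modeled on the 'uid'/'get_uid_func' fields mentioned in the ChangeLog.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified device descriptor (not libgomp's
   struct gomp_device_descr).  */
struct dev_descr
{
  int ord;                               /* Device number within its plugin.  */
  const char *(*get_uid_func) (int ord); /* Optional plugin hook; may be NULL.  */
  char *uid;                             /* Cached UID; NULL until first use.  */
};

static char *
dup_string (const char *s)
{
  size_t n = strlen (s) + 1;
  char *copy = malloc (n);
  if (copy != NULL)
    memcpy (copy, s, n);
  return copy;
}

/* Return the UID of DEV, computing it on first use and caching it so that
   repeated calls do not allocate again.  If the plugin does not provide
   the hook or returns NULL, fall back to a generated placeholder.  */
const char *
dev_get_uid (struct dev_descr *dev)
{
  if (dev->uid != NULL)
    return dev->uid;

  const char *s = dev->get_uid_func ? dev->get_uid_func (dev->ord) : NULL;
  if (s != NULL)
    dev->uid = dup_string (s);
  else
    {
      char buf[32];
      snprintf (buf, sizeof buf, "dev-%d", dev->ord);  /* Hypothetical format.  */
      dev->uid = dup_string (buf);
    }
  return dev->uid;
}
```

As with the device data in target.c, the cached string in this sketch simply lives as long as the descriptor.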

* * *


That way, you'd avoid the unclear semantics of
who gets to 'free' the buffer returned from 'GOMP_OFFLOAD_get_uid' upon
'GOMP_OFFLOAD_fini_device' -- currently the memory is lost?


Well, depends what you mean by lost. The 'devices' data structure in 
target.c is allocated early during device initialization and it is never 
deallocated. Hence, also the current "uid" member is never deallocated 
and remains until the end of the program acc

[Patch] gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h (was: [Patch, v3] gcn/mkoffload.cc: Use #embed for including the generated ELF file)

2024-09-20 Thread Tobias Burnus


Hi Thomas,

See attached patch for adding the include lines:

+  if (gcn_stack_size)
+    {
+      fprintf (cfile, "#include <stdlib.h>\n");
+      fprintf (cfile, "#include <stdbool.h>\n\n");

but, unlike before, there is no 'stdint.h', and the headers are no
longer included unconditionally.

(The 'stdbool.h' is only used for a single 'true', but on the other hand it
is only #included under this condition and 'stdbool.h' is a very simple file.)

I intend to apply this patch as obvious, unless there are further comments.

* * *

Thomas Schwinge wrote:


I've not verified, but I very much suspect that this change: […]

 gcn/mkoffload.cc: Use #embed for including the generated ELF file
... is responsible for: […]

 /tmp/ccHVeRbm.c:80:21: error: implicit declaration of function 'getenv' 
[-Wimplicit-function-declaration]
[…] Did you not see that happen in your testing?


I vaguely remember some fails in this area — but after digging and 
re-testing, it did not show up, for whatever reason. As it only triggers 
with -mstack-size, it somehow must have fallen through the cracks. :-/


Tobias
gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h

In commit r15-3629-g508ef585243d4674d06b0737bfe8769fc18f824f, #embed
was added and the no-longer-required fprintf '#include' lines were removed,
overlooking that with -mstack-size=, the generated configure_stack_size
uses 'setenv' and 'true'.

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (process_asm): (Re)add the fprintf
	lines for stdlib.h/stdbool.h inclusion if gcn_stack_size is used.

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 1f6337719e9..1a524ced653 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -613,6 +613,12 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   struct oaccdims *dims = XOBFINISH (&dims_os, struct oaccdims *);
   struct regcount *regcounts = XOBFINISH (&regcounts_os, struct regcount *);
 
+  if (gcn_stack_size)
+    {
+      fprintf (cfile, "#include <stdlib.h>\n");
+      fprintf (cfile, "#include <stdbool.h>\n\n");
+    }
+
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
   fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count);
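To see why the two headers matter, here is a hypothetical sketch of the kind of function the generated C file contains with -mstack-size=. It assumes, for illustration only, that the function sets GCN_STACK_SIZE unless the user already set it; the value is made up. Since GCC 14, implicit function declarations are errors by default, hence the 'getenv' error quoted above, and before C23 'true' needs stdbool.h.

```c
#define _POSIX_C_SOURCE 200112L  /* Declares setenv in strict ISO C modes.  */
#include <stdlib.h>   /* getenv, setenv.  */
#include <stdbool.h>  /* true (not a keyword before C23).  */

/* Hypothetical shape of the code emitted for -mstack-size=; the
   environment-variable handling and the value are illustrative only.  */
void
configure_stack_size (void)
{
  const char *val = getenv ("GCN_STACK_SIZE");
  if (val == NULL || val[0] == '\0')
    setenv ("GCN_STACK_SIZE", "32768", true);  /* Only if not set by the user.  */
}
```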

[wwwdocs][Patch] gcc-15: mention wider offloading arch combination support (e.g. aarch64 + nvptx)

2024-09-20 Thread Tobias Burnus


This is supposed to document that GCC now supports offloading,
e.g., from an ARM CPU to an Nvidia GPU (as in Grace<->Hopper)
or, e.g., x86-64 to RISC-V. → https://gcc.gnu.org/PR96265
and https://gcc.gnu.org/PR111937 for the associated PRs.

I think it is important enough to get it into the release notes.
However, I am not sure about the wording.

Thoughts or suggestions?

Tobias
gcc-15: mention wider offloading arch support (e.g. aarch64 + nvptx) 

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 7c372688..e923ede4 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -36,6 +36,14 @@ a work-in-progress.
 
 General Improvements
 
+
+  
+For offloading, issues preventing some host-device architecture
+combinations have been resolved. In particular, offloading from an
+aarch64 host to a nvptx device is now supported.
+  
+
+
 
 New Languages and Language specific improvements

[wwwdocs][Patch] gcc-15: Update OpenMP section for constr/destr on devices + UID routines

2024-09-20 Thread Tobias Burnus

A minor update for a bug fix / impl.-quality feature and a proper new 
feature.


Any comments before I apply it?

Tobias
gcc-15: Update OpenMP section for constr/destr on devices + UID routines

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 7c372688..14514131 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -55,11 +55,17 @@ a work-in-progress.
   GPUs, writing to the terminal from OpenMP target regions (but not from
   OpenACC compute regions) is now also supported in Fortran; in C/C++ and
   on AMD GPUs this was already supported before with both OpenMP and OpenACC.
+  Constructors and destructors on the device side for declare target
+  static aggregates are now handled.
 
 
   OpenMP 5.1: The unroll and tile
   loop-transformation constructs are now supported.
 
+
+  OpenMP 6.0: The get_device_from_uid and
+  omp_get_uid_from_device API routines have been added.
+

[wwwdocs][Patch] gcc-15: Fortran - mention -funsigned + PowerPC Darwin IEEE module support

2024-09-20 Thread Tobias Burnus


Hi all,

I thought it made sense to have a look at what went into GCC 15 to
update the Fortran section. However, while several bugs were fixed
(and some features were extended a tiny bit) [hooray!], I did not really
see many newsworthy features.

Comments, remarks to, approval of the attached wwwdocs patch?

Tobias

PS: Anuj, the GSoC student, has nearly finished his do-concurrent patch,
which will add F2018's local/local_init/shared/default(none) and
F2023's reduce. There is still no fancy parallelization, but it is a first
step and useful, as it will permit compiling such code, which then works
when run serially.
gcc-15: Fortran - mention -funsigned + PowerPC Darwin IEEE module support

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 7c372688..3a275d8c 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -111,6 +111,12 @@ a work-in-progress.
   Fortran 2023: The selected_logical_kind intrinsic function
   and, in the ISO_FORTRAN_ENV module, the named constants
   logical{8,16,32,64} and real16 were added.
+  Experimental support for unsigned integers; enabled by the
+  -funsigned, see <a href="https://gcc.gnu.org/onlinedocs/gfortran/Experimental-features-for-Fortran-202Y.html"
+  >gfortran documentation</a> for details. This feature has been proposed
+  (<a href="https://j3-fortran.org/doc/year/24/24-116.txt">J3/24-116</a>)
+  for inclusion in the next Fortran standard.
 
 
 
@@ -214,6 +220,11 @@ a work-in-progress.
 
 
 
+PowerPC Darwin
+
+  Fortran's IEEE modules are now supported on Darwin PowerPC.
+
+

Re: [Patch][v2] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

2024-09-20 Thread Tobias Burnus

Now applied as r15-3730-gbf4a5efa80ef84 / 
https://gcc.gnu.org/r15-3730-gbf4a5efa80ef84


(with a few minor trailing whitespace/indentation issues fixed).

Post-commit comments are still highly welcome. By tomorrow, you will 
find the documentation at https://gcc.gnu.org/onlinedocs/libgomp/ 
(routine + nvptx/gcn offload specific), which makes it easier to read.


Thanks,

Tobias


[Patch][v2] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

2024-09-19 Thread Tobias Burnus


Minor update – addressing the issues that Andre raised (thanks!):

'Add.' → 'New functions.' in the ChangeLog for 'fortran.c' and otherwise 
libgomp.texi changes, only:


A bunch of typo fixes (preexisting and in the new text). I also added a 
made-up example UUID for the GPUs, which should help to reduce confusion.


Any additional comments or suggestions?

Tobias

OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique,
the one for the host is not.

gcc/ChangeLog:

	* omp-general.cc (omp_runtime_api_procname): Add
	get_device_from_uid and omp_get_uid_from_device routines.

include/ChangeLog:

	* cuda/cuda.h (cuDeviceGetUuid): Declare.
	(cuDeviceGetUuid_v2): Add prototype.

libgomp/ChangeLog:

	* config/gcn/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Add stub implementation.
	* config/nvptx/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Likewise.
	* fortran.c (omp_get_uid_from_device_,
	omp_get_uid_from_device_8_): New functions.
	* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
	* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
	* libgomp.map (GOMP_6.0): New, including the new UID routines.
	* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
	(Device Information Routines): Document new UID routines.
	(Offload-Target Specifics): Document UID format.
	* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New prototype.
	* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New interface.
	* omp_lib.h.in: Likewise.
	* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
	* target.c (str_omp_initial_device): New static var.
	(STR_OMP_DEV_PREFIX): Define.
	(gomp_get_uid_for_device, omp_get_uid_from_device,
	omp_get_device_from_uid): New.
	(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
	(gomp_target_init): Set the device's 'uid' field to NULL.
	* testsuite/libgomp.c/device_uid.c: New test.
	* testsuite/libgomp.fortran/device_uid.f90: New test.

 gcc/omp-general.cc   |  4 +-
 include/cuda/cuda.h  |  7 ++
 libgomp/config/gcn/target.c  | 14 
 libgomp/config/nvptx/target.c| 14 
 libgomp/fortran.c| 15 
 libgomp/libgomp-plugin.h |  1 +
 libgomp/libgomp.h|  2 +
 libgomp/libgomp.map  |  8 +++
 libgomp/libgomp.texi | 89 ++--
 libgomp/omp.h.in |  3 +
 libgomp/omp_lib.f90.in   | 23 ++
 libgomp/omp_lib.h.in | 23 ++
 libgomp/plugin/cuda-lib.def  |  2 +
 libgomp/plugin/plugin-gcn.c  | 16 +
 libgomp/plugin/plugin-nvptx.c| 34 +
 libgomp/target.c | 56 +++
 libgomp/testsuite/libgomp.c/device_uid.c | 38 ++
 libgomp/testsuite/libgomp.fortran/device_uid.f90 | 42 +++
 18 files changed, 384 insertions(+), 7 deletions(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index de91ba8a4a7..12788ad0249 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3260,6 +3260,7 @@ omp_runtime_api_procname (const char *name)
   "alloc",
   "calloc",

Re: [Patch] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

2024-09-19 Thread Tobias Burnus


Hi Andre,

thanks for reading the patch + commenting.

Andre Vehreschild wrote:

in the changelog of libgomp:

* fortran.c (omp_get_uid_from_device_,
omp_get_uid_from_device_8_): Add.

"Add." what? Can you be more specific, i.e. is it just a dummy or prototype?


Neither. It is a full implementation (that is a wrapper to the target.c 
function, directly called by C/C++).


The prototype used by fortran.c is 'omp.h.in' (i.e. the C/C++ header 
file, also used by user code) and for Fortran code of users, it is the 
module generated from 'omp_lib.f90.in' and the (deprecated) include file 
'omp_lib.h.in'.


The purpose of fortran.c in general – and also for the added code – is 
to be a wrapper between the Fortran API/ABI and the C ABI. In the 
current case, there are two reasons for the two functions:


(a) The result type is 'character(:), pointer' – but the C function just 
returns a '\0' terminated const char*. Hence, the wrapper function 
contains a '*result_len = strlen (*result);' besides the '*result = 
'


(b) The argument is an 'integer'. As we want to be compatible with 
-fdefault-integer-8, previously somewhat fashionable, we have an 
'int32_t' and an 'int64_t' version of the function – which needs a 
second wrapper function.


As for the other API routine, as BIND(C) makes it call the C function 
directly, no wrapper is needed.
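Points (a) and (b) can be illustrated with a minimal sketch of such a pair of wrappers. The argument order (hidden result pointer and length first, then the by-reference integer) and all names are assumptions for illustration, not a copy of fortran.c; `c_get_uid` is a made-up stand-in for the C routine omp_get_uid_from_device.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the C API routine (hypothetical; returns a made-up UID
   for device 0 and NULL otherwise).  */
static const char *
c_get_uid (int device_num)
{
  return device_num == 0 ? "GPU-0123456789abcdef" : NULL;
}

/* Wrapper for default 'integer' (int32_t).  Fortran passes DEVICE_NUM by
   reference; RESULT and RESULT_LEN describe the 'character(:), pointer'
   result: a pointer to the C string plus its length, without a copy.  */
void
get_uid_wrapper_ (const char **result, size_t *result_len,
                  const int32_t *device_num)
{
  *result = c_get_uid (*device_num);
  *result_len = *result != NULL ? strlen (*result) : 0;
}

/* Variant for -fdefault-integer-8 (int64_t).  Range checking of the
   64-bit value is omitted in this sketch.  */
void
get_uid_wrapper_8_ (const char **result, size_t *result_len,
                    const int64_t *device_num)
{
  int32_t dev = (int32_t) *device_num;
  get_uid_wrapper_ (result, result_len, &dev);
}
```

The 8-byte variant only converts the argument and forwards, so the string handling lives in one place.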


* * *

[Typo: missing 'a' – noted + will fix.]

* * *


+@item The unique identifier (UID), used with OpenMP's API UID routine, consists
+  of the @samp{GPU-} prefix followed by the 16-bytes UUID as returned by
+  the CUDA runtime library.  This UUID is output in grouped lower-case
+  hex digits; the grouping of those 32 digits is: 8 digits, hyphen,
+  4 digits, hyphen, 4 digits, hyphen, 16 digits.  The output matches the
+  format used by @code{nvidia-smi}.
  @end itemize

Do I get this right, that for CUDA this is, e.g. GPU-0123456789abdcef ? Then
why is the "normal" UUID display format described here? This confuses me. (Just
curiosity.)


For AMD, it is the following type of string, which contains an 8-byte/16 
hex-digit UUID part: 'GPU-abcdef0123456789'.


While for Nvidia it is 'GPU-abcdef12-1234-1234-01234567890abcd', 
consisting of a 16 bytes/32 hex-digits UUID.


For AMD, we directly get the string, matching what "rocminfo" shows as UUID.

For Nvidia, we don't get a string but a 'char bytes[16]' array filled 
with the values, which we print each as '%02x' hex digit. For the 
output, additionally, a "GPU-" prefix is added + a few hyphens. That's 
to mimic what 'nvidia-smi -a' outputs.
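A minimal sketch of that Nvidia formatting step, with a hypothetical helper name. It uses the 8-4-4-4-12 grouping of a standard UUID, which is what nvidia-smi prints; note that this differs from the 8-4-4-16 grouping in the quoted .texi text.

```c
#include <stdio.h>
#include <string.h>

/* "GPU-" prefix + 32 hex digits + 4 hyphens.  */
#define GPU_UUID_STRLEN (4 + 32 + 4)

/* Hypothetical helper: print a 16-byte UUID as "GPU-" followed by
   lower-case hex digits grouped 8-4-4-4-12.  BUF needs room for
   GPU_UUID_STRLEN characters plus the terminating NUL.  Returns the
   length of the resulting string.  */
size_t
format_gpu_uuid (const unsigned char bytes[16], char buf[GPU_UUID_STRLEN + 1])
{
  char *p = buf + sprintf (buf, "GPU-");
  for (int i = 0; i < 16; i++)
    {
      if (i == 4 || i == 6 || i == 8 || i == 10)
        *p++ = '-';                         /* Group separator.  */
      p += sprintf (p, "%02x", bytes[i]);   /* Two hex digits per byte.  */
    }
  return (size_t) (p - buf);
}
```

If the source is a plain 'char bytes[16]' array, each byte should be converted to unsigned char first; otherwise, where char is signed, a byte such as 0x9a is sign-extended and no longer prints as two hex digits.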


I admit it is slightly confusing – and when reading the .texi, it is 
also easy to miss that one part talks about AMD ("GCN") GPUs and the 
other about NVidia GPUs.


→ https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html

(In terms of OpenMP, it is only a unique identifier; it does not need to 
be universally unique [and also isn't for the host]; AMD and Nvidia call 
it UUID and it looks rather unique for the GPU; rocminfo also outputs a 
"UUID" for the CPU but that's just "CPU-XX" (twice for a dual socket 
system, i.e. not even unique), but we don't use this output.)



Er, and when I read further on, I find the nvptx implementation and that
contradicts the description. There a "normal" UUID is added to the GPU- id.


Now I am confused. What description contradicts which one?

Tobias

[Patch] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

2024-09-19 Thread Tobias Burnus


Hi all,

in order to know and potentially re-use a specific offload device
(reproducibility, affinity-wise close to a CPU (socket), …), a mapping
between a (universally?) unique identifier and the OpenMP device number
is useful. Thus, TR13 added support for it.
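The reverse direction of that mapping can be sketched as a simple search over the (e.g. cached) UIDs of all devices. The function name and the `UID_NOT_FOUND` placeholder are hypothetical; the actual API routines are declared in omp.h as `const char *omp_get_uid_from_device (int)` and `int omp_get_device_from_uid (const char *)`, and a real implementation must also handle the host, whose UID is not universally unique.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "no match" value; chosen to differ from -1, which OpenMP
   uses for omp_initial_device.  */
#define UID_NOT_FOUND (-4)

/* Sketch of a UID -> device-number lookup: compare UID against the UIDs
   of all NUM_DEVICES devices and return the matching device number.  */
int
device_from_uid (const char *uid, const char *const uids[], int num_devices)
{
  if (uid == NULL)
    return UID_NOT_FOUND;
  for (int i = 0; i < num_devices; i++)
    if (uids[i] != NULL && strcmp (uids[i], uid) == 0)
      return i;
  return UID_NOT_FOUND;
}
```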

This is a collateral patch, caused by looking at the API routines for
other reasons and looking at that part of the spec during the OpenMP F2F.

Besides the added API routines, the UID will be used elsewhere:
* In context selectors: 'target_device' supports 'uid()'.
* In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars.

@Sandra: Besides the usual .texi part, for the 'target_device' trait set:
if you add a new GOMP routine for kind/arch/isa - can you also add a
UID argument, such that we don't have to update the API when we need it
in the not-so-far future?

@Andrew + @Thomas: Any comment? Especially to the nvptx/gcn side (plugin +
.texi)?

@Jakub or anyone else — any comments, suggestions, remarks?

[The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU
and seems to work fine.]

Tobias
OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique,
the one for the host is not.

gcc/ChangeLog:

	* omp-general.cc (omp_runtime_api_procname): Add
	get_device_from_uid and omp_get_uid_from_device routines.

include/ChangeLog:

	* cuda/cuda.h (cuDeviceGetUuid): Declare.
	(cuDeviceGetUuid_v2): Add prototype.

libgomp/ChangeLog:

	* config/gcn/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Add stub implementation.
	* config/nvptx/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Likewise.
	* fortran.c (omp_get_uid_from_device_,
	omp_get_uid_from_device_8_): Add.
	* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
	* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
	* libgomp.map (GOMP_6.0): New, including the new UID routines.
	* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
	(Device Information Routines): Document new UID routines.
	(Offload-Target Specifics): Document UID format.
	* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New prototype.
	* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New interface.
	* omp_lib.h.in: Likewise.
	* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
	* target.c (str_omp_initial_device): New static var.
	(STR_OMP_DEV_PREFIX): Define.
	(gomp_get_uid_for_device, omp_get_uid_from_device,
	omp_get_device_from_uid): New.
	(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
	(gomp_target_init): Set the device's 'uid' field to NULL.
	* testsuite/libgomp.c/device_uid.c: New test.
	* testsuite/libgomp.fortran/device_uid.f90: New test.

 gcc/omp-general.cc   |  4 +-
 include/cuda/cuda.h  |  7 ++
 libgomp/config/gcn/target.c  | 14 
 libgomp/config/nvptx/target.c| 14 
 libgomp/fortran.c| 15 +
 libgomp/libgomp-plugin.h |  1 +
 libgomp/libgomp.h|  2 +
 libgomp/libgomp.map  |  8 +++
 libgomp/libgomp.texi | 81 +++-
 libgomp/omp.h.in |  3 +
 libgomp/omp_lib.f90.in   | 23 +++
 libgomp/omp_lib.h.in | 23 +++
 libgomp/plugin/cuda-lib.def  |  2 +
 libgomp/plugin/plugin-gcn.c  | 16 +
 libgomp/plugin/plugin-nvptx.c| 34 ++
 libgomp/target.c | 56 
 libgomp/testsuite/libgomp.c/device_uid.c | 38 +++
 libgomp/testsuite/libgomp.fortran/device_uid.f90 | 42 
 18 files changed, 379 insertions(+), 4 deletions(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index de91ba8a4a7..12788ad0249 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3260,6 +3260,7 @@ omp_runtime_api_procname (const char *name)
   "alloc",
   "calloc",
   "free",
+  "get_device_from_uid",
   "get_interop_int",
   "get_interop_ptr",
   "get_mapped_ptr",
@@ -3338,12 +3339,13 @@ omp_runtime_api_procname (const char *name)
 	 as DECL_NAME only omp_* and omp_*_8 appear.  */
   "display_env",
   "get_ancestor_thread_num",
-  "init_allocator",
+  "omp_get_uid_from_device",
   "get_partition_place_nums",
   "get_place_num_procs",
   "get_place_proc_ids",
   "get_schedule",
   "get_team_size",
+  "init_allocator",

Re: [PATCH v2 1/8] libgomp: Disentangle shared memory from managed

2024-09-18 Thread Tobias Burnus

Hi Andrew,

→ https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655947.html

On June 28, 2024, Andrew Stubbs wrote:

Some GPU compute systems allow the GPU to access host memory without much
prior setup, but that's not necessarily the fast way to do it. For shared
memory APUs this is almost certainly the correct choice, but for AMD there
is the difference between "fine-grained" and "coarse-grained" memory, and
for NVidia Cuda generally runs better if it knows the status of the memory
you access.

In my understanding, the USM implementation based on page-fault migration
works rather well in general, both with AMD and Nvidia GPUs. It obviously has
issues, e.g. when the same page is accessed frequently (semi-)concurrently
by host and device, as it then keeps migrating back and forth. Or when
you access a large array, where mapping it in one go is faster than
repeatedly hitting page boundaries. And if the data is handled through an
interconnect like NVLink on PowerPC Volta systems, there is a long latency.

The issue with pages migrating back and forth can be solved by placing the
data in pinned memory (ideally memory provided by the GPU runtime). And the
page-boundary issue can be fixed by using large pages for large data.

Therefore, I think for USM, switching to no mapping by default is not a
bad idea. (However, see env var idea below.)

* * *

Not a review, but some first comments (glancing also at my local WIP patch + my
personal to-do list):

* I need to finish my patch that still does mapping with 'declare target
enter(…)' variables - otherwise, automatically turning on
GOMP_OFFLOAD_CAP_SHARED_MEM will give tons of fails on those systems.
(Generic issue: It should also be fixed for 'requires unified_shared_memory',
but typical smaller USM code is less likely to hit this issue.) For 'declare
target link', I have already posted a patch, but that has still to be committed.

* I think having a per-device property would be useful. In principle, it would
be nice if - when two GPUs exist on a system but only one has shared-memory
support - that USM GPU would be selected with 'requires unified_shared_memory'.
Currently, all GPUs are then excluded. Example: Richi's gfx1030 and gfx1036,
where only gfx1036 supports USM. Currently, HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED
is false when both of his GPUs are enabled, as that's a system property and not
a per-device property. (For Nvidia, we have
'CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS'; for this feature, we would have
to exclude the non-USM devices from the count in GOMP_OFFLOAD_get_num_devices
and also skip them later in GOMP_OFFLOAD_init_device.)

* For AMD APUs, I wonder whether we should instead use the following:
hsa_agent_get_info (agent, HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES,
&memory_properties)
plus hsa_flag_isset64 (memory_properties,
HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU) (this needs a newer 'hsa_ext_amd.h'
than the one included in GCC, or additional defines in our copy of the .h file).
[I don't know in which ROCm version this feature got added.]

Talking about this API, I wonder whether we also want to use
HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU (returning a uint32_t). I do see it
in our hsa_ext_amd.h, but it is not used elsewhere. In our documentation,
https://gcc.gnu.org/onlinedocs/libgomp/AMD-Radeon.html , we claim:

"The hardware permits maximally 40 workgroups/CU", but if I run that check
on gfx90a, I get '32' and not '40' as the result. (On gfx908 it is 40.)

On the other hand, 40 only shows up in the comment to
parse_target_attributes, while 32 is used in gcn_exec.

* Regarding USM vs. MAPPING:
OpenMP leaves it open whether with 'requires unified_shared_memory', explicit
mapping is honored or not. — With (OpenMP 6.0's) 'requires self_maps', no
mapping is permitted.

Your patch changes the current handling: With non-APUs, it will not set
GOMP_OFFLOAD_CAP_SHARED_MEM for 'requires unified_shared_memory',
which means that 'map' clauses are no longer ignored.

I think that's okay(ish) at least for explicit map clauses, but I am not sure
whether it is for implicit maps. In any case, if we do so, we probably must
update omp_target_is_accessible - otherwise we return false even when USM has
been required, if the capabilities do not include GOMP_OFFLOAD_CAP_SHARED_MEM.

In any case, similar to CRAY, I think it makes sense to have an env var to
toggle between mapping and not mapping for USM-supporting devices that
aren't APUs.

* Compiler side:
- I think we need to turn all auto 'declare target enter(...)' variables
to 'link' when 'requires unified_shared_memory' is used.
(At least that's how I read the TR13/6.0 spec.)
- For 'self_map', we have to do likewise for all declare-target variables.
- I was thinking of adding a commandline flag to force-change 'enter' to
'link' - applicable to both 'usm' and no requires line. (For 'self_map'
it would be always on and for 'usm' it would still be on for automatic
'declare target

Re: libgomp: with USM, init 'link' variables with host address

2024-09-17 Thread Tobias Burnus


Hi Thomas,

short version: I think the patch as posted is fine and no action beyond 
is needed for this one issue.


See below for the long version.

Possible modifications (now or as a follow-up):
- using memcpy, or letting the plugin do it
- not adding link variables to the splay tree with USM.

Thomas Schwinge wrote:

Tested on x86-64-gnu-linux and nvptx offloading (that supports USM).

(I yet have to set up such a USM configuration...)


You have already used a USM config, e.g., when running on gfx90a (likewise
gfx90c), except that on mainline, USM currently only works if you
explicitly set 'export HSA_XNACK=1'.


For nvptx, you need a post-Volta GPU with the open-kernels driver, which
is the default for newer driver versions.


* * *

Do I understand correctly that even if
'GOMP_REQUIRES_UNIFIED_SHARED_MEMORY', we cannot just skip all the
'mem_map' setup in 'gomp_load_image_to_device' etc., because we're not
(yet?) setting 'GOMP_OFFLOAD_CAP_SHARED_MEM'?


We actually do set GOMP_OFFLOAD_CAP_SHARED_MEM with 'requires 
unified_shared_memory'.


But, indeed, we cannot skip the memory mapping parts – due to the way we 
handle static variables.


* * *


+
+  if (is_link_var
+ && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+   gomp_copy_host2dev (devicep, NULL, (void *) target_var->start,
+   &k->host_start, sizeof (void *), false, NULL);
  }

Calling 'gomp_copy_host2dev' looks a bit funny given we've just
determined USM (..., but I'm not asking for plain 'memcpy').


I guess a plain memcpy would do as well. [Assuming that the device's 
static variable is host accessible, which it probably is and should be.]


I have added it to my to-do list for USM-related tasks to change this;
possibly, moving it to the plugin side has some advantages? Possibly also not
adding it to the splay tree if not needed. (Cf. below for env var discussion.)


Regarding the unload: For 'declare target link(A)', we have, e.g., 
'static int *A' on the device side. Thus, we could do 'A = NULL' – and 
rather should do 'A = {clobber}', but that's rather pointless in general 
and especially when unloading the image.



What's the advantage/rationale of doing this here vs. in
'gomp_map_vars_internal' for 'REFCOUNT_LINK'?  (May be worth a source
code comment?)


(A, B, C refers to the following example.)

We don't see 'A' (or 'B') in the GOMP_target_ext call and thus not in 
gomp_map_vars_internal.


Besides: We only want to do the initialization once and not every time 
gomp_map_vars_internal is called.


I think the following program may help to understand the issue and the 
patch better.


Note: While A, B, C are 'int …[3]' on the host, on the device we only 
have 'int B[3]' while for A it's 'int *A' and C only exists on the host.


 * * *

#pragma omp requires unified_shared_memory

static int A[3], B[3], C[3];
#pragma omp declare target link(A) enter(B)

#pragma omp begin declare target
void f(int *p)
{
   A[2] += B[2] + p[2];  // p points to the host's C variable
}
#pragma omp end declare target

void foo(int dev) {
  int *ptr = C;
  #pragma omp target firstprivate(ptr) device(dev)
    f (ptr);
}


* * *

Here, 'ptr' (and thus 'p') point to the host 'C' variable, both before the
target region and inside the target region.

'B' refers to the device-local version of the variable.

And 'A' on a non-host device is likely to be NULL ('static int *A' + .BSS)
before this patch, or pointing to the host's 'A' with this patch.

* * *

With 'A' pointing to the host version (and likewise 'p' pointing to the host
'C'), host fallback and device version yield identical results for 'A' and for
'C' (via ptr/p). However, 'B' on host and non-host device have nothing in
common. While that might be fine, in general it is not.

Hence, in order to get the same result for a .BSS-valued 'B' on host and
device, we need to call 'foo', e.g., as

#pragma omp target data map(always: B) device(dev)
  foo (dev);

to ensure that the two 'B' are in sync.

* * *

Code-wise, this means that with GOMP_OFFLOAD_CAP_SHARED_MEM, we still have
to apply the map for 'declare target enter(…)' variables, except if host
and device share the same code – but that should only be the case for
host fallback (= initial device) and, possibly, GOMP_OFFLOAD_CAP_NATIVE_EXEC.
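That rule can be condensed into a small predicate. This is only a sketch: the flag names and bit values below are hypothetical stand-ins, not the real GOMP_OFFLOAD_CAP_* definitions, and treating native execution as "no mapping needed" follows the "possibly" in the text above.

```c
#include <stdbool.h>

/* Hypothetical capability bits, NOT the real GOMP_OFFLOAD_CAP_* values.  */
#define CAP_SHARED_MEM  (1u << 0)
#define CAP_NATIVE_EXEC (1u << 1)

/* Whether a 'declare target enter(...)' variable still has to be mapped.
   Shared memory alone does not help: the device image contains its own
   copy of the variable.  Only if host and device run the same code is
   there a single copy, i.e. for host fallback and possibly for native
   execution.  CAP_SHARED_MEM intentionally does not affect the result.  */
bool
enter_var_needs_mapping (unsigned caps, bool is_host_fallback)
{
  if (is_host_fallback)
    return false;
  if (caps & CAP_NATIVE_EXEC)
    return false;
  return true;
}
```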

* * *

NOTE: OpenMP still permits honoring explicit 'map' clauses with 'requires
unified_shared_memory'; only with 'self_maps' is copying the data in 'map'
explicitly disallowed.

* * *

This patch, plus honoring 'map' for static (non-'link'?) variables even with
GOMP_OFFLOAD_CAP_SHARED_MEM, were the main items for the USM follow-up patches
that I meant by "More USM cleanup/fixes/extensions to make it _more_ useful" on
slide 16 of
https://gcc.gnu.org/wiki/cauldron2024#cauldron2024talks.openmp_openacc_and_offloading_in_gcc

Plus, to go a bit beyond:
- offering a flag to change 'declare target enter(…)' to 'link(…)'
  [RFC: enable it by default for 'requires unified_shared_memory'?]

- switching to

libgomp: with USM, init 'link' variables with host address

2024-09-14 Thread Tobias Burnus

The idea of link variables is to replace the full device variable by a 
pointer, permitting only parts of the variable to be mapped to the 
device, saving memory.


However, with (unified) shared memory, having a pointer also permits 
pointing to the host variable.


That's what this patch does: instead of having a dangling pointer, upon 
loading the image, the device side pointers are updated to point to the 
host. With the current patch, this is only done when explicitly 
requesting unified-shared memory.
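A host-only model of that load-time step may help. It is a sketch under stated assumptions: struct link_slot, init_link_slot and the requires_usm flag are hypothetical, and the memcpy of sizeof (void *) bytes only mirrors what the gomp_copy_host2dev call in the patch below does. No device or libgomp is involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simulation only: a 'declare target link' variable is represented on the
   device by a pointer slot.  */
struct link_slot
{
  void *dev_ptr;    /* Device-side pointer; NULL = not (yet) mapped.  */
  void *host_addr;  /* Address of the corresponding host variable.  */
};

/* Hypothetical analogue of the load-time update: with unified shared
   memory, store the host address in the device-side pointer, so that
   device accesses reach the host variable directly.  */
static void
init_link_slot (struct link_slot *slot, bool requires_usm)
{
  if (requires_usm)
    memcpy (&slot->dev_ptr, &slot->host_addr, sizeof (void *));
}
```

Without USM, the slot stays NULL until an explicit map sets it.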


Tested on x86-64-gnu-linux and nvptx offloading (that supports USM).

Remarks/comments/suggestions before I commit it?

Tobias

PS: I intend to do some additional changes for improved USM handling. 
Once done, I intend to look into (a) giving the user a bit more power on 
mapping vs. not mapping and (b) using USM by default for APUs, even 
without 'requires unified_shared_memory'.
libgomp: with USM, init 'link' variables with host address

If requires unified_shared_memory is set, make 'declare target link'
variables to point initially to the host pointer.

libgomp/ChangeLog:

	* target.c (gomp_load_image_to_device): For requires
	unified_shared_memory, update 'link' vars to point to the host var.
	* testsuite/libgomp.c-c++-common/target-link-3.c: New test.

 libgomp/target.c   |  5 +++
 .../testsuite/libgomp.c-c++-common/target-link-3.c | 52 ++
 2 files changed, 57 insertions(+)

diff --git a/libgomp/target.c b/libgomp/target.c
index 47ec36928a6..66b54fd2ab8 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2451,6 +2451,11 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   array->right = NULL;
   splay_tree_insert (&devicep->mem_map, array);
   array++;
+
+  if (is_link_var
+	  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+	gomp_copy_host2dev (devicep, NULL, (void *) target_var->start,
+			&k->host_start, sizeof (void *), false, NULL);
 }
 
   /* Last entry is for the ICV struct variable; if absent, start = end = 0.  */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
new file mode 100644
index 000..c707b38b7d4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
@@ -0,0 +1,52 @@
+/* { dg-do run }  */
+
+#include <omp.h>
+#include <stdint.h>
+
+#pragma omp requires unified_shared_memory
+
+int A[3] = {-3,-4,-5};
+static int q = -401;
+#pragma omp declare target link(A, q)
+
+#pragma omp begin declare target
+void
+f (uintptr_t *pA, uintptr_t *pq)
+{
+  if (A[0] != 1 || A[1] != 2 || A[2] != 3 || q != 42)
+    __builtin_abort ();
+  A[0] = 13;
+  A[1] = 14;
+  A[2] = 15;
+  q = 23;
+  *pA = (uintptr_t) &A[0];
+  *pq = (uintptr_t) &q;
+}
+#pragma omp end declare target
+
+int
+main ()
+{
+  uintptr_t hpA = (uintptr_t) &A[0];
+  uintptr_t hpq = (uintptr_t) &q;
+  uintptr_t dpA, dpq;
+
+  A[0] = 1;
+  A[1] = 2;
+  A[2] = 3;
+  q = 42;
+
+  for (int i = 0; i <= omp_get_num_devices (); ++i)
+    {
+      #pragma omp target device(device_num: i) map(dpA, dpq)
+	f (&dpA, &dpq);
+      if (hpA != dpA || hpq != dpq)
+	__builtin_abort ();
+      if (A[0] != 13 || A[1] != 14 || A[2] != 15 || q != 23)
+	__builtin_abort ();
+      A[0] = 1;
+      A[1] = 2;
+      A[2] = 3;
+      q = 42;
+    }
+}

Re: [Patch, v3] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-09-13 Thread Tobias Burnus


On July 19, 2024 Tobias Burnus wrote:


Updated patch attached.


As #embed is now supported by GCC (thanks!), I could commit this patch :-)

Committed as r15-3629-g508ef585243d46 → 
https://gcc.gnu.org/r15-3629-g508ef585243d46


Unless I missed something, we need to wait for a few pending patches 
before there is a real speed-up. However, first, that speed-up will then 
automatically apply to GCN compilations and, secondly, the generated code 
is already much nicer thanks to #embed and seems to be a tiny bit faster 
already.
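To show what the generated wrapper looks like, here is a sketch of a helper that produces the same text as the new process_obj code. The name emit_embed_wrapper and the rejection of file names containing '"' are additions for illustration only; mkoffload itself writes straight to the C file with fprintf.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the C wrapper for object file FNAME into BUF.
   The wrapper needs no #include (it uses the predefined __SIZE_TYPE__),
   and the image size comes from sizeof instead of a printed length.
   Returns the snprintf result, or -1 for an unusable file name.  */
static int
emit_embed_wrapper (char *buf, size_t n, const char *fname)
{
  /* A '"' in the name would terminate the #embed string early.  */
  if (strchr (fname, '"') != NULL)
    return -1;
  return snprintf (buf, n,
		   "static unsigned char gcn_code[] = {\n"
		   "#embed \"%s\" if_empty (error_file_is_empty)\n"
		   "};\n\n"
		   "static const struct gcn_image {\n"
		   "  __SIZE_TYPE__ size;\n"
		   "  void *image;\n"
		   "} gcn_image = {\n"
		   "  sizeof(gcn_code),\n"
		   "  gcn_code\n"
		   "};\n",
		   fname);
}
```

Compiling the generated text requires a compiler with #embed support.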


Tobias
commit 508ef585243d4674d06b0737bfe8769fc18f824f
Author: Tobias Burnus 
Date:   Fri Sep 13 16:18:46 2024 +0200

gcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

* config/gcn/mkoffload.cc (read_file): Remove.
(process_asm): Do not add '#include' to generated C file.
(process_obj): Generate C file that uses #embed and use
__SIZE_TYPE__ and __UINTPTR_TYPE__ instead the #include-defined
size_t and uintptr.
(main): Update call to it; remove no longer needed file I/O.

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 345bbf7709c..1f6337719e9 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }
 
-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-    {
-      /* Get the file size.  */
-      long s = ftell (stream);
-      if (s >= 0)
-	alloc = s + 100;
-      fseek (stream, 0, SEEK_SET);
-    }
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-    {
-      size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-      if (!n)
-	break;
-      base += n;
-      if (base + 1 == alloc)
-	{
-	  alloc *= 2;
-	  buffer = XRESIZEVEC (char, buffer, alloc);
-	}
-    }
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
Tokens are assumed to be delimited by ':'.  */
 
@@ -651,10 +613,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   struct oaccdims *dims = XOBFINISH (&dims_os, struct oaccdims *);
  struct regcount *regcounts = XOBFINISH (&regcounts_os, struct regcount *);
 
-  fprintf (cfile, "#include \n");
-  fprintf (cfile, "#include \n");
-  fprintf (cfile, "#include \n\n");
-
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
   fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count);
 
@@ -719,35 +677,28 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */
 
 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, &len);
-
   /* Dump out an array containing the binary.
-     FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-    {
-      fprintf (cfile, "\n\t");
-      for (size_t j = i; j < i + 17 && j < len; j++)
-	fprintf (cfile, "%3u,", (unsigned char) input[j]);
-    }
-  fprintf (cfile, "\n};\n\n");
+     If the file is empty, a parse error is shown as the argument to if_empty
+     is an undeclared identifier.  */
+  fprintf (cfile,
+	   "static unsigned char gcn_code[] = {\n"
+	   "#embed \"%s\" if_empty (error_file_is_empty)\n"
+	   "};\n\n", fname_in);
 
   fprintf (cfile,
 	   "static const struct gcn_image {\n"
-	   "  size_t size;\n"
+	   "  __SIZE_TYPE__ size;\n"
 	   "  void *image;\n"
 	   "} gcn_image = {\n"
-	   "  %zu,\n"
+	   "  sizeof(gcn_code),\n"
 	   "  gcn_code\n"
-	   "};\n\n",
-	   len);
+	   "};\n\n");
 
   fprintf (cfile,
 	   "static const struct gcn_data {\n"
-	   "  uintptr_t omp_requires_mask;\n"
+	   "  __UINTPTR_TYPE__ omp_requires_mask;\n"
 	   "  const struct gcn_image *gcn_image;\n"
 	   "  unsigned kernel_count;\n"
 	   "  const struct hsa_kernel_description *kernel_infos;\n"
@@ -1305,13 +1256,7 @@ main (int argc, char **argv)
   fork_execute (ld_argv[0], CONST_CAST (char **, ld_argv), true, ".ld_args");
   obstack_free (&ld_argv_obstack, NULL);
 
-  in = fopen (gcn_o_name, "r");
-  if (!in)
-	fatal_error (input_location, "cannot open intermediate gcn obj file");
-
-  process_obj (in, cfile, omp_requires);
-
-  fclose (in);
+  process_obj (gcn_o_name, cfile, omp_requires);
 
   xputenv (concat ("GCC_EXEC_PREFIX=", execpath, NULL));
   xputenv (concat ("COMPILER_PATH=", cpath, NULL));

[Patch] Fortran: Fixes to OpenMP 'interop' directive parsing support

2024-09-12 Thread Tobias Burnus

This patch fixes a couple of issues, such as missing whitespace gobbling 
after matching an expression.


It also reorganizes some code to handle 'identifier_"string"' vs. 
'identifier' better as there were some diagnostic issues.


(OpenMP requires for 'fr' that the argument is either an identifier 
(that is a scalar integer parameter) or a string; while for the older 
syntax, it can be any constant integer expression.)


However, the two main changes are:

* 'fr' and 'attr' actually support a list of arguments. While I believe 
'attr("x", "y")' and 'attr("x"),attr("y")' are semantically identical, 
supporting more than one (or zero) values for 'fr' required a different 
encoding.


* Jakub additionally suggested that for 'fr', which supports constant 
integers and string literals, we could pass on integer values – and do 
some checking.


That's what this patch does: Known string values are converted to their 
associated integer values, others to 0. And if the integer/string value 
is unknown, a warning is printed [-Wopenmp].


Known values are those in the "OpenMP API Additional Definitions" 
document, https://www.openmp.org/specifications/ – with the addition of 
hsa / 7, which has been voted at spec level (no idea about ARB level) 
but not yet published.


Note that the warning is based on what is defined there, i.e. for 
'level_zero' there is no warning, even though GCC does not support it. 
Obviously, if another value is added next year, GCC 15 will not know it 
and will warn, even if the code is perfectly valid. But I guess we can 
live with a warning in that case.
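The string-to-integer handling described above can be sketched as a table lookup. The ids are those of the OpenMP additional definitions plus 'hsa' = 7 as stated above; fr_names, fr_id_from_name and fr_id_check are hypothetical names, not the omp-general.cc functions, and the warning text only imitates a -Wopenmp diagnostic.

```c
#include <stdio.h>
#include <string.h>

/* Foreign-runtime ids: index = id; 0 is "none/unknown".  */
static const char *const fr_names[] = {
  NULL, "cuda", "cuda_driver", "opencl", "sycl", "hip", "level_zero", "hsa"
};
#define FR_LAST 7

/* Known string values map to their id; unknown ones yield 0 plus a
   warning.  */
static int
fr_id_from_name (const char *name)
{
  for (int i = 1; i <= FR_LAST; i++)
    if (strcmp (fr_names[i], name) == 0)
      return i;
  fprintf (stderr, "warning: unknown foreign runtime '%s' [-Wopenmp]\n",
	   name);
  return 0;
}

/* Integer values are passed on unchanged but warned about if unknown.  */
static int
fr_id_check (int id)
{
  if (id < 1 || id > FR_LAST)
    fprintf (stderr, "warning: unknown foreign runtime id %d [-Wopenmp]\n",
	     id);
  return id;
}
```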


Comments, remarks, suggestions? — Especially regarding the internal 
representation?


Tobias

PS: The next step will be to get the C/C++ parsing working, which also 
implies encoding this representation into 'tree'. (Then doing the tree 
conversion for Fortran.) Once satisfied with that, the middle end + 
libgomp part that links those bits will come next. Then there is the 
question whether there should be one call per 'interop' directive or 
possibly multiple (e.g. one per interop object in 'init'/'use'/'destroy').
Fortran: Fixes to OpenMP 'interop' directive parsing support

Handle lists as argument to 'fr' and 'attr'; fix parsing corner cases.
Additionally, 'fr' values are now internally stored as integer, permitting
the diagnoses (warning) for values not defined in the OpenMP additional
definitions document.

	PR fortran/116661

gcc/fortran/ChangeLog:

	* gfortran.h (gfc_omp_namelist): Rename 'init' members for clarity.
	* match.cc (gfc_free_omp_namelist): Handle renaming.
	* dump-parse-tree.cc (show_omp_namelist): Update for new format
	and features.
	* openmp.cc (gfc_match_omp_prefer_type): Parse list to 'fr' and 'attr';
	store 'fr' values as integer.
	(gfc_match_omp_init): Rename variable names.

gcc/ChangeLog:

	* omp-api.h (omp_get_fr_id_from_name, omp_get_name_from_fr_id): New
	prototypes.
	* omp-general.cc (omp_get_fr_id_from_name, omp_get_name_from_fr_id):
	New.

include/ChangeLog:

	* gomp-constants.h (GOMP_INTEROP_IFR_LAST,
	GOMP_INTEROP_IFR_SEPARATOR, GOMP_INTEROP_IFR_NONE): New.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/interop-1.f90: Extend, update dg-*.
	* gfortran.dg/gomp/interop-2.f90: Update dg-error.
	* gfortran.dg/gomp/interop-3.f90: Add dg-warning.

 gcc/fortran/dump-parse-tree.cc   |  84 +---
 gcc/fortran/gfortran.h   |   4 +-
 gcc/fortran/match.cc |  10 +-
 gcc/fortran/openmp.cc| 305 ---
 gcc/omp-api.h|   3 +
 gcc/omp-general.cc   |  29 +++
 gcc/testsuite/gfortran.dg/gomp/interop-1.f90 |  32 ++-
 gcc/testsuite/gfortran.dg/gomp/interop-2.f90 |   2 +-
 gcc/testsuite/gfortran.dg/gomp/interop-3.f90 |   2 +-
 include/gomp-constants.h |   5 +
 10 files changed, 314 insertions(+), 162 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 8fc6141611c..3547d7f8aca 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -37,6 +37,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "constructor.h"
 #include "version.h"
 #include "parse.h"  /* For gfc_ascii_statement.  */
+#include "omp-api.h"  /* For omp_get_name_from_fr_id.  */
+#include "gomp-constants.h"  /* For GOMP_INTEROP_IFR_SEPARATOR.  */
 
 /* Keep track of indentation for symbol tree dumps.  */
 static int show_level = 0;
@@ -1537,35 +1539,69 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
 	}
   else if (list_type == OMP_LIST_INIT)
 	{
-	  int i = 0;
 	  if (n->u.init.target)
 	fputs ("target,", dumpfile);
 	  if (n->u.init.targetsync)
 	fputs ("targetsync,", dumpfile);
-	  char *prefer_type = n->u.init.str;
-	  if (n->u.init.len)
-	fputs ("prefer_type(", dumpfile);
-	  if (n->u.init.len)
-	while (*prefer_type)
-	  {
-		fputc ('{', dumpfile);
-		if (n->u2.interop_int

[committed] fortran/openmp.cc: Fix var init and locus use to avoid uninit values [PR fortran/116661]

2024-09-11 Thread Tobias Burnus


This patch fixes an issue with uninitialized variables causing random ICEs.

Committed as r15-3581-g4e9265a474def9

* * *

However, follow-up work is needed as there are multiple issues:

* The check whether something is an identifier (integer parameter) and 
not just a constant expression failed in some corner cases. → This now 
reliably causes a testsuite FAIL.


* Some checks are also not quite right

* After gfc_match_expr, whitespace gobbling is missing

* I missed that 'fr(…)' and 'attr(…)' accept a list of values (*)

* The latter requires a different internal representation.

I have a partial fix for this, but the last two items require some more 
work; hence, I defer this to the next patch.


Tobias

(*) It looks also as if there will be post-TR13 spec changes, but it is 
not clear whether those just change the wording or more.
commit 4e9265a474def98cb6cdb59c15fbcb7630ba330e
Author: Tobias Burnus 
Date:   Wed Sep 11 09:25:47 2024 +0200

fortran/openmp.cc: Fix var init and locus use to avoid uninit values [PR fortran/116661]

gcc/fortran/ChangeLog:

PR fortran/116661
* openmp.cc (gfc_match_omp_prefer_type): NULL init a gfc_expr
variable and use right locus in gfc_error.

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index c04d8b0f528..1145e2ff890 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -1860,6 +1860,7 @@ gfc_match_omp_prefer_type (char **pref_str, int *pref_str_len, int **pref_int_ar
 		  }
 		fr_found = true;
 		gfc_symbol *sym = NULL;
+		e = NULL;
 		locus loc = gfc_current_locus;
 		if (gfc_match_symbol (&sym, 0) != MATCH_YES
 		|| gfc_match (" _") == MATCH_YES)
@@ -1881,7 +1882,7 @@ gfc_match_omp_prefer_type (char **pref_str, int *pref_str_len, int **pref_int_ar
 		  {
 		gfc_error ("Expected constant integer identifier or "
 			   "non-empty default-kind character literal at %L",
-			   &e->where);
+			   &loc);
 		gfc_free_expr (e);
 		return MATCH_ERROR;
 		  }

Re: [committed] OpenMP: Add interop routines to omp_runtime_api_procname

2024-09-11 Thread Tobias Burnus


Now with attached patch …

Tobias Burnus wrote:
I realized that the attached change (committed as
r15-3582-g6291f25631500c) was missing from what I committed in


r15-3249-g0beac1db38855e  libgomp: Add interop types and routines to 
OpenMP's headers and module


I also checked the last 5 or so commits to omp.h.in, but for those 
routines, we seem to have remembered to update the API routine check.


Tobias
commit 6291f25631500c2d1c2328f919aa4405c3837f02
Author: Tobias Burnus 
Date:   Wed Sep 11 12:02:24 2024 +0200

OpenMP: Add interop routines to omp_runtime_api_procname

gcc/
* omp-general.cc (omp_runtime_api_procname): Add
omp_get_interop_{int,name,ptr,rc_desc,str,type_desc}
and omp_get_num_interop_properties.

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 0b61335dba4..aaa179afe13 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3260,7 +3260,10 @@ omp_runtime_api_procname (const char *name)
   "alloc",
   "calloc",
   "free",
+  "get_interop_int",
+  "get_interop_ptr",
   "get_mapped_ptr",
+  "get_num_interop_properties",
   "realloc",
   "target_alloc",
   "target_associate_ptr",
@@ -3289,6 +3292,10 @@ omp_runtime_api_procname (const char *name)
   "get_device_num",
   "get_dynamic",
   "get_initial_device",
+  "get_interop_name",
+  "get_interop_rc_desc",
+  "get_interop_str",
+  "get_interop_type_desc",
   "get_level",
   "get_max_active_levels",
   "get_max_task_priority",
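Why a forgotten entry matters can be shown with a simplified lookup: a routine missing from the table is simply not recognized as an OpenMP API routine. This sketch assumes a prefix strip plus a binary search over an alphabetically sorted excerpt of the table; is_omp_api_name is hypothetical, and GCC's omp_runtime_api_procname handles more cases and may be implemented differently.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Excerpt of the names from the diff above, kept alphabetically sorted
   so that bsearch can be used.  */
static const char *const api_names[] = {
  "alloc", "calloc", "free", "get_interop_int", "get_interop_ptr",
  "get_mapped_ptr", "get_num_interop_properties", "realloc"
};

static int
cmp_name (const void *key, const void *elem)
{
  return strcmp ((const char *) key, *(const char *const *) elem);
}

/* True if NAME is "omp_" followed by a name in the table.  */
static bool
is_omp_api_name (const char *name)
{
  if (strncmp (name, "omp_", 4) != 0)
    return false;
  return bsearch (name + 4, api_names,
		  sizeof api_names / sizeof api_names[0],
		  sizeof api_names[0], cmp_name) != NULL;
}
```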

[committed] OpenMP: Add interop routines to omp_runtime_api_procname

2024-09-11 Thread Tobias Burnus

I realized that the attached change (committed as r15-3582-g6291f25631500c) 
was missing from what I committed in r15-3249-g0beac1db38855e  libgomp: 
Add interop types and routines to OpenMP's headers and module

I also checked the last 5 or so commits to omp.h.in, but for those 
routines, we seem to have remembered to update the API routine check.

Tobias

Re: [Patch][RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components

2024-09-10 Thread Tobias Burnus


Hi Jakub,

Jakub Jelinek wrote:

On Tue, Sep 10, 2024 at 12:19:33PM +0200, Tobias Burnus wrote:

Background: OpenMP states that for 'map(var)', all allocatable components
of 'var' will automatically also be mapped ('deep mapping').

Not a review, just a comment.  This kind of recursive mapping is also
what needs to happen for declare mapper, so wonder if that shouldn't be
solved together; and some way to merge mappings of one field after another
with the same way if consecutive fields (with possibly some padding bits
in between) are mapped the same way.


In the case of mapping Fortran allocatable components, I do not see the padding 
part. For 'map(var)' all of var is mapped, including all array 
descriptors. We then need to map the allocated memory (fully, if an 
array: all array elements) + do a pointer attach. And we need to handle 
unallocated components.


That's different to 'mapper', which is more flexible on one hand - but 
also really explicit. There is no hidden 'only if allocated do', 
possibly except for zero-sized array sections or iterator steps.


The Fortran part also handles polymorphic variables, where it is only 
known at runtime which components exist – which means that the whole 
tree of mappings to do is unknown at compile time. For 'mapper' that 
part is known.


[Granted, TR13 now explicitly does not permit mapping of polymorphic 
variables as there are too many corner cases. But for 6.x it is planned 
to re-add it.]


In any case, the Fortran allocatable-component mapping also needs to be 
applied to the mapper (+ iterator) generated code — and it needs to come 
last after all implicit mappings and remove-mapping optimizations. It 
could also be done as part of the mapper expansion.


* * *

Having said this, there might well be a useful common approach that 
covers Fortran deep mapping, 'mapper' and 'iterator'.


But the current approaches don't share such a common approach. Namely, we have:

* The current Fortran deep mapper (as just posted) was ready in March 
2022, https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591075.html


* The mapper patch (latest version) is at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629363.html – 
albeit first bits date back to 
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591983.html


* There is also an 'iterator' patch at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662138.html – 
albeit it lacks the 'mapper' part, which is WIP and mainly needs the 
'mapper' patch of the previous bullet.


* * *

If we have a clear plan to implement things, I am somewhat willing to 
revise patches, if it makes sense.


But for that, a clear design is needed.

And, in any case, it would be good, if we could get all of the features 
above into GCC 15: Fortran deep mapping, 'mapper' (+ target_update with 
strides), 'iterator'  [and some other backlog].


Tobias

[Patch][RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components

2024-09-10 Thread Tobias Burnus


Background: OpenMP states that for 'map(var)', all allocatable components
of 'var' will automatically also be mapped ('deep mapping').

Thus, for

type(t), allocatable :: var(:)

this leads to a pseudo code like:

  map(var, storage_size(var))
  do i = lbound(var), ubound(var)
    if (allocated(var(i)%comp1)) &
      map(var(i)%comp1, storage_size(var(i)%comp1))
  end do

It gets more complicated: e.g. var(1204)%comp1(395)%str might be
an allocatable scalar. Or var is of a recursive type, e.g. it has
'type(t), allocatable :: self' as component such that
  var%self%self%self%self ...
might exist (and 'self' could also be an array …).

* * *

Approach:

The idea is to handle it in lower_omp_target as follows (semi-pseudocode):

  /* Obtain number of additional mappings; in the example above, it would
     be size(var) * 2 for map + attach of 'comp1', assuming all
     'var(:)%comp1' are allocated and no other alloc comp. exist.  */
  tree cnt = lang_hooks.decls.omp_deep_mapping_cnt (...)
  if (cnt)
    deep_map_cnt *= cnt;
  if (cnt)
    → switch to pointer type + dynamically allocate addrs, kinds, sizes
    → add 'uintptr_t s[]' as trailing member to addr struct.

(Thus, all automatically mapped items are added to the end.)
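A host-only C model of the counting and appending may make this concrete. It is a sketch under simplifying assumptions: struct elem, deep_mapping_cnt and build_map_arrays are hypothetical, only one allocatable component per element is modeled (NULL meaning "not allocated"), and the kinds array and recursive types are omitted. Each allocated 'comp1' contributes a data mapping plus an attach of its pointer, so with everything allocated the count is size(var) * 2, as in the comment above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One element of 'var'; 'comp1' models an allocatable component.  */
struct elem
{
  int *comp1;
  size_t comp1_len;
};

/* Number of additional mappings: map + attach per allocated 'comp1'.  */
static size_t
deep_mapping_cnt (const struct elem *var, size_t n)
{
  size_t cnt = 0;
  for (size_t i = 0; i < n; i++)
    if (var[i].comp1 != NULL)
      cnt += 2;
  return cnt;
}

/* Build dynamically sized addrs/sizes arrays: entry 0 maps 'var' itself,
   the automatically generated entries are appended at the end.  Returns
   the number of entries, or 0 if allocation failed.  */
static size_t
build_map_arrays (struct elem *var, size_t n,
		  uintptr_t **addrs, size_t **sizes)
{
  size_t total = 1 + deep_mapping_cnt (var, n);
  *addrs = malloc (total * sizeof **addrs);
  *sizes = malloc (total * sizeof **sizes);
  if (*addrs == NULL || *sizes == NULL)
    {
      free (*addrs);
      free (*sizes);
      *addrs = NULL;
      *sizes = NULL;
      return 0;
    }

  size_t k = 0;
  (*addrs)[k] = (uintptr_t) var;
  (*sizes)[k++] = n * sizeof *var;
  for (size_t i = 0; i < n; i++)
    if (var[i].comp1 != NULL)
      {
	/* Map the allocated data ...  */
	(*addrs)[k] = (uintptr_t) var[i].comp1;
	(*sizes)[k++] = var[i].comp1_len * sizeof *var[i].comp1;
	/* ... then attach it via the pointer inside the element.  */
	(*addrs)[k] = (uintptr_t) &var[i].comp1;
	(*sizes)[k++] = sizeof var[i].comp1;
      }
  return total;
}
```

Unallocated components simply contribute no entries, which is why the count can only be known at run time.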

In the big map loop, additionally call:
  lang_hooks.decls.omp_deep_mapping

Additionally, in some cases, the only question that needs to be solved is
whether the decl has an allocatable component or not. In that case,
lang_hooks.decls.omp_deep_mapping_p is sufficient.

* * *

RFC: Does this approach sound sensible? Does the attached patch
(middle-end part) look reasonable?

One downside of the current approach is that for map(var) when 'var' is
present, we still attempt to map all allocatable components instead of
stopping directly after finding 'var' in the splay table. This can be
fixed by passing more attributes to libgomp, but as the items come last
in the list, it might not be straightforward. (Maybe starts-here +
ends-here flags, where the attach next to the starts-here flag could be
used to do the lookup?)

This might also lead to cases where an allocatable variable is mapped
that otherwise would not be mapped. Albeit, as 'map(var%comp)' of a later
allocated 'comp' is only guaranteed to work with the 'always' modifier,
having it automapped for 'map(var)' should at least not affect the values
that were mapped.

* * *

The full patch has been applied to the OG14 (= devel/omp/gcc-14) branch.
The interesting bits are the hook entry points gfc_omp_deep_mapping_p,
gfc_omp_deep_mapping_cnt, and gfc_omp_deep_mapping →
https://github.com/gcc-mirror/gcc/blob/devel/omp/gcc-14/gcc/fortran/trans-openmp.cc#L3068-L3209

* * *

I have attached only the middle-end part of the patch:


https://gcc.gnu.org/g:92c3af3d4f8 Fortran/OpenMP: Support mapping of DT with 
allocatable components

to focus on that part.

Tobias

PS: In TR13 and also after TR13, a couple of mapping features were added that
permit shallow mapping, unmapping of allocatable components etc. I have not
tried to analyze whether this affects this patch, but I think it remains
largely as is.
Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components

gcc/ChangeLog:

	* langhooks-def.h (lhd_omp_deep_mapping_p,
	lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New.
	(LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT,
	LANG_HOOKS_OMP_DEEP_MAPPING): Define.
	(LANG_HOOKS_DECLS): Use it.
	* langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt,
	lhd_omp_deep_mapping): New stubs.
	* langhooks.h (struct lang_hooks_for_decls): Add new hooks
	* omp-expand.cc (expand_omp_target): Handle dynamic-size
	addr/sizes/kinds arrays.
	* omp-low.cc (build_sender_ref, fixup_child_record_type,
	scan_sharing_clauses, lower_omp_target): Update to handle
	new hooks and dynamic-size addr/sizes/kinds arrays.
---
 gcc/langhooks-def.h |  10 +++
 gcc/langhooks.cc|  24 ++
 gcc/langhooks.h |  15 
 gcc/omp-expand.cc   |  18 -
 gcc/omp-low.cc  | 224 ++--
 5 files changed, 265 insertions(+), 26 deletions(-)

diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
index f5c67b6823c..756714558e5 100644
--- a/gcc/langhooks-def.h
+++ b/gcc/langhooks-def.h
@@ -86,6 +86,10 @@ extern enum omp_clause_defaultmap_kind lhd_omp_predetermined_mapping (tree);
 extern tree lhd_omp_assignment (tree, tree, tree);
 extern void lhd_omp_finish_clause (tree, gimple_seq *, bool);
 extern tree lhd_omp_array_size (tree, gimple_seq *);
+extern bool lhd_omp_deep_mapping_p (const gimple *, tree);
+extern tree lhd_omp_deep_mapping_cnt (const gimple *, tree, gimple_seq *);
+extern void lhd_omp_deep_mapping (const gimple *, tree, unsigned HOST_WIDE_INT,
+  tree, tree, tree, tree, tree, gimple_seq *);
 struct gimplify_omp_ctx;
 extern void lhd_omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *,
 	   tree);
@@ -272,6 +276,9 @@ extern tree lhd_unit_size_without_reusable_padding (tree)

Re: [PATCH v3 03/12] libgomp: runtime support for target_device selector

2024-09-09 Thread Tobias Burnus


Hi all,

Jakub Jelinek wrote:

On Sat, Jul 20, 2024 at 02:42:22PM -0600, Sandra Loosemore wrote:

This patch implements the libgomp runtime support for the dynamic
target_device selector via the GOMP_evaluate_target_device function.

[…]

Now for kind, isa and arch traits in the target_device set this patch
decides based on compiler flags used to compile some routine in libgomp.so
or libgomp.a.

While this can work in the (very unfortunate) GCN state of things where
only exact isa match is possible (I really hope we can one day generalize
it by being able to compile for a set of isas by supporting lowest
denominator and patching the EM_* in the ELF header or something similar,
perhaps with runtime decisions on what to do for different CPUs),


I think that can only work to some extent. LLVM has "gfx11-generic" 
which is compatible with gfx110{0,1,2,3} and gfx115{0,1,2}, which at 
least helps a bit. For gfx10, it has gfx10-1-generic for gfx101{0,1,2,3} 
and gfx10-3-generic for gfx103[0-6]; and gfx9-generic for gfx90{0,2,4,6,9,c}.


Thus, we could have versions which support a common subset, but we still 
need multiple libraries. And it needs to be implemented …


This sounds like a task for the GCN maintainer …

* * *


deciding what to do based on how libgomp.a or libgomp.so.1 has been compiled
for the rest is IMHO wrong.


I wonder whether we should do something like the following.

[The following is a mix between compiler code and generated code, for
illustrative purposes.]

Inside the compiler do:

#ifndef ACCEL_COMPILER
int r = 0;
if (targetm.omp.device_kind_arch_isa != NULL)
  r = targetm.omp.device_kind_arch_isa (omp_device_{kind,arch,isa}, val);


   if (dev_num && TREE_CODE (dev_num) == INTEGER_CST)
 {
   if (dev_num < -1 /* INVALID_DEVICE or nonconforming */)
 → 0
   if (dev_num == initial_device)
 → r
 }

 /* The '? :' condition is a compile time condition. */
 d =  ?  : omp_get_default_device ();
 if (d < -1)
   → 0
 else if (d == -1 || d == omp_get_initial_device ())
   → r
 else
   → GOMP_get_device_kind_arch_isa  (d, kind, arch, isa)

#else
   /* VARIANT 1: Assume that neither reverse offload nor nested target occurs.  */
   → targetm.omp.device_kind_arch_isa (kind, arch, isa)
   /* VARIANT 2:  */
   d =  ?  : omp_get_default_device ();
   if (d == omp_get_device_num ())
     → targetm.omp.device_kind_arch_isa (kind, arch, isa)
   else
     /* Cannot really do anything here - and as no nested target is permitted,
        use 'false'.  */
     → 0
#endif
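The host-side branch of this pseudocode can be written as a plain C function. This is a sketch only: eval_target_device, the device_query_fn signature and example_query are hypothetical (the last one stands in for the plugin query called GOMP_get_device_kind_arch_isa above and pretends every device is an nvptx sm_89 GPU); HOST_MATCH plays the role of the compile-time result 'r'.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical signature of the per-device query done via the plugin.  */
typedef bool (*device_query_fn) (int dev, const char *kind,
				 const char *arch, const char *isa);

/* D: explicit device number or the default device; INITIAL_DEVICE: value
   of omp_get_initial_device (); HOST_MATCH: the compile-time result r.  */
static bool
eval_target_device (int d, int initial_device, bool host_match,
		    device_query_fn query, const char *kind,
		    const char *arch, const char *isa)
{
  if (d < -1)                           /* Invalid or nonconforming.  */
    return false;
  if (d == -1 || d == initial_device)   /* The host itself.  */
    return host_match;
  return query (d, kind, arch, isa);    /* Ask the plugin.  */
}

/* Example query: every non-host device is an nvptx sm_89 GPU; a NULL
   trait means "not specified in the selector".  */
static bool
example_query (int dev, const char *kind, const char *arch, const char *isa)
{
  (void) dev;
  return (kind == NULL || strcmp (kind, "gpu") == 0)
	 && (arch == NULL || strcmp (arch, "nvptx") == 0)
	 && (isa == NULL || strcmp (isa, "sm_89") == 0);
}
```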


* * *

And on the libgomp side GOMP_get_device_kind_arch_isa → plugin code.

And there:

(A) GCN:

kind and arch are clear. For ISA:

agent->device_isa + use existing isa_hsa_name() function (or likewise).

(B) Nvptx:

cuDeviceGetAttribute + CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75 
and CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76.


Example: sm_89 = (major) 8 and (minor) 9.
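Turning the two attribute values into an ISA name is simple string formatting; the comparison helper below assumes the "capability" reading of the RFC question further down (a selector matches if it is not newer than the hardware) and ignores any exceptions. Both helper names are hypothetical and no CUDA call is made; major/minor would come from cuDeviceGetAttribute as described above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Form "sm_<major><minor>", e.g. 8 and 9 → "sm_89".  */
static int
nvptx_isa_name (char *buf, size_t n, int major, int minor)
{
  return snprintf (buf, n, "sm_%d%d", major, minor);
}

/* Assuming the capability reading: does selector SEL (e.g. "sm_80") match
   hardware MAJOR.MINOR?  Comparing major * 10 + minor also handles
   three-digit names such as sm_100.  */
static bool
nvptx_isa_matches (const char *sel, int major, int minor)
{
  int v;
  if (sscanf (sel, "sm_%d", &v) != 1)
    return false;
  return v <= major * 10 + minor;
}
```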

* * *

Does this sound sensible?

Tobias

PS: For the current host-offload GSoC task, we might eventually think of 
using cpuid on x86-64, i.e. gcc/config/i386/cpuid.h.


PS: RFC remains: Should 'sm_80' be true if the hardware/compilation is 
'sm_89' or not? Namely: Does 'sm_80' denote the capability or the 
specific hardware?


Regarding this topic, see also 
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662059.html

[patch][v2] Fortran: Add OpenMP 'interop' directive parsing support

2024-09-05 Thread Tobias Burnus

Now also supports the following (note the variable name): 
'init(targetsync, target)' – and I fixed an ICE when the variable 
parsing failed.


Comments before I commit it?

Tobias

Tobias Burnus wrote:
This patch adds Fortran parsing support for OpenMP's 'interop' 
directive (which stops with a 'sorry' in trans-openmp.cc as the middle 
end support is still missing).


Tested on x86-64-gnu-linux.

Comments, suggestions, remarks?

* * *

Background:

'interop' makes it easier to call, e.g., a CUDA-BLAS function directly 
as it permits mapping an OpenMP device number (→ "target" modifier 
required) to the "foreign runtime" device number or directly obtaining a 
stream object (→ if "targetsync" modifier specified) with dependency 
tracking.


Just calling '!$omp interop init(obj)' works but that leaves the 
decision which type of object should be returned to the run time.


Using 'prefer_type', the user can ask for a specific type. Permitted is 
a string such as "hip" or an integer constant such as 
omp_ifr_cuda_driver – and the old-style syntax is 'prefer_type(<integer expr|literal string> [ ,  ...])'.  [Note 
that a constant integer expression is permitted.]


The new syntax permits additional attributes like for 'sycl' 
requesting an 'in-order' queue (instead of the default 'out-of-order' 
queue) when obtaining a stream. The new syntax is 'prefer_type( {...} 
[, {...} ... ] )' where '{ ... }' is a list of either 
'attr("ompx_...")' (i.e. 'attr(...)' with literal string arg that 
starts with ompx_ and does not contain a ',') or 
'fr()' where the identifier is an integer 
constant. 'fr' can be present or not, but only once per {...}, while 
multiple 'attr' may be used. [Note that as non-string only an 
identifier is permitted (i.e. an integer parameter).]
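The constraint on 'attr' strings is easy to state as a check. valid_interop_attr is a hypothetical helper for illustration, not the parser code from the patch; it only encodes the two rules given above.

```c
#include <stdbool.h>
#include <string.h>

/* An 'attr' string must start with "ompx_" and must not contain ','.  */
static bool
valid_interop_attr (const char *s)
{
  return strncmp (s, "ompx_", 5) == 0 && strchr (s, ',') == NULL;
}
```

The ',' restriction keeps the attribute string free of the character that would otherwise be ambiguous in a comma-separated list of attributes.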


I decided on the current way to encode the string – but I am open to 
other representations as well. In my WIP/RFC patch it is used as shown 
in plugin-*.c in the patch 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html


The available foreign runtimes and the values that can be returned 
are hidden in that patch and more readable in the documentation patch 
at https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661365.html


If someone wants to delve into the details of the 'interop' feature: 
Have a look at OpenMP 5.1 (5.2) *and* TR13 and the additional 
definition document at https://www.openmp.org/specifications/ ('hsa': 
publishing pending).


* * *

Tobias

PS: In the dump, I am a bit lazy and add spurious trailing ','. As it 
is only a dump, I decided that adding a bunch of checks to ensure that a 
',' only gets printed if needed is not really required. If you think 
otherwise, I can surely add a bunch of 'if' and only print it 
conditionally.


PPS: In order to use 'interop', mainly the part in the middle is 
missing, i.e. some middle-end gimplification with a call into libgomp 
– and the libgomp function. A stub version of the latter and some 
(loosely) tested plugin handling does exist as WIP/RFC patch, see 
patch link above. - Besides gimplify and the libgomp function, a bunch 
of tests and, obviously, the C and C++ FE counterpart to this patch 
have to be implemented.

Fortran: Add OpenMP 'interop' directive parsing support

Parse OpenMP's 'interop' directive but stop with a 'sorry, unimplemented'
after resolving.

Additionally, it moves some clause dumping away from the end directive as
that led to 'nowait' not being printed when it should have been, as some
cases were missed.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_namelist): Handle OMP_LIST_INIT.
	(show_omp_clauses): Handle OMP_LIST_{INIT,USE,DESTROY}; move 'nowait'
	from end-directive to the directive dump.
	(show_omp_node, show_code_node): Handle EXEC_OMP_INTEROP.
	* gfortran.h (enum gfc_statement): Add ST_OMP_INTEROP.
	(OMP_LIST_INIT, OMP_LIST_USE, OMP_LIST_DESTROY): Add.
	(enum gfc_exec_op): Add EXEC_OMP_INTEROP.
	(struct gfc_omp_namelist): Add interop items to union.
	(gfc_free_omp_namelist): Add boolean arg.
	* match.cc (gfc_free_omp_namelist): Update to free
	interop union members.
	* match.h (gfc_match_omp_interop): New.
	* openmp.cc (gfc_omp_directives): Uncomment 'interop' entry.
	(gfc_free_omp_clauses, gfc_match_omp_allocate,
	gfc_match_omp_flush, gfc_match_omp_clause_reduction): Update
	call.
	(enum omp_mask2): Add OMP_CLAUSE_{INIT,USE,DESTROY}.
	(OMP_INTEROP_CLAUSES): Use it.
	(gfc_match_omp_clauses): Match those clauses.
	(gfc_match_omp_prefer_type, gfc_match_omp_init,
	gfc_match_omp_interop): New.
	(resolve_omp_clauses): Handle interop clauses.
	(omp_code_to_statement): Add ST_OMP_INTEROP.
	(gfc_resolve_omp_d

[patch] config/nvptx: Handle downward compat for OpenMP context selector

2024-09-02 Thread Tobias Burnus

For x86-64, the context selector matching is currently based on 
features. That's obvious for 'SSE2', where any system offering SSE2 
matches, but that is also the case for, e.g., a selector asking for 'i486' – 
which matches when compiling for 'i486', 'i586' and 'i686'.


That has pros and cons. Assume compiling for 'i686': If there is a 
context selector asking for ISA 'i486' we want to use it as i686 
supports it – and not, e.g., the generic fallback. — On the other hand, 
if there are two variants, one for 'i686' and one for 'i486', we want to 
use the 'i686' variant if the hardware supports it. [I am not sure how 
to handle this best.]


* * *

The attached patch now does likewise for nvptx, where the compute 
capabilities are downward compatible with one exception → 
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ptx-module-directives-target


"In general, generations of SM architectures follow an onion layer 
model, where each generation adds new features and retains all features 
of previous generations. The onion layer model allows the PTX code 
generated for a given target to be run on later generation devices.


Target architectures with suffix “a”, such as sm_90a, include 
architecture-accelerated features that are supported on the specified 
architecture only, hence such targets do not follow the onion layer 
model. Therefore, PTX code generated for such targets cannot be run on 
later generation devices. Architecture-accelerated features can only be 
used with targets that support these features."
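The onion-layer rule quoted above can be expressed as a small matching predicate. The following C sketch is only an illustration, not the nvptx.cc code from this patch: the function names `sm_to_number` and `sm_isa_matches` are hypothetical, and treating an `a`-suffixed request as matching only the identical `a` target (while a plain request may also be satisfied by an `a` target of the same or a later generation) is an assumption derived from the PTX text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "sm_XX" or "sm_XXa" into XX and report whether
   the architecture-accelerated "a" suffix is present.  Returns -1 if ISA is
   not of that form.  */
int
sm_to_number (const char *isa, bool *arch_specific)
{
  assert (isa != NULL && arch_specific != NULL);
  *arch_specific = false;
  if (strncmp (isa, "sm_", 3) != 0)
    return -1;
  char *end;
  long v = strtol (isa + 3, &end, 10);
  if (end == isa + 3 || v <= 0)
    return -1;
  if (*end == 'a')
    {
      *arch_specific = true;
      end++;
    }
  return *end == '\0' ? (int) v : -1;
}

/* Hypothetical matcher: may a variant for REQUESTED be used when compiling
   for ACTUAL?  */
bool
sm_isa_matches (const char *requested, const char *actual)
{
  bool req_a = false, act_a = false;
  int req = sm_to_number (requested, &req_a);
  int act = sm_to_number (actual, &act_a);
  if (req < 0 || act < 0)
    return false;
  /* "a" targets are outside the onion layer model: exact match only.  */
  if (req_a)
    return act_a && req == act;
  /* Onion layer model: a later generation retains all older features.  */
  return req <= act;
}
```

For example, a variant for `sm_70` would be selected when compiling for `sm_80`, but not the other way round.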


* * *

The patch additionally updates the documentation.

Comments, suggestions, approval, disapproval?

Tobias

PS: I wonder whether it wouldn't make sense to permit all sm_ values 
with -march=, even if some produce the same binaries (at least for now) 
vs. supporting only some with -march= and using -march-map= to handle 
all values. But that's independent of this RFC patch.
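The PS refers to `-march-map=`, which selects the highest supported `-march=` value that does not exceed the requested compute capability. A minimal C sketch of that mapping follows; the list of supported values is an assumption inferred from the `declare-variant-3-smXX.c` test names in this patch (the authoritative list is in `nvptx-sm.def`), and `map_sm_version` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: -march= values at the time of this patch, inferred from the
   declare-variant-3-smXX.c test names.  Sorted in ascending order.  */
static const int supported_sm[] = { 30, 35, 53, 70, 75, 80 };

/* Return the highest supported compute capability not above REQUESTED
   (e.g. 86 -> 80), or -1 if REQUESTED is below the lowest one.  */
int
map_sm_version (int requested)
{
  assert (requested > 0);
  int best = -1;
  for (size_t i = 0; i < sizeof supported_sm / sizeof supported_sm[0]; i++)
    if (supported_sm[i] <= requested)
      best = supported_sm[i];
  return best;
}
```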
config/nvptx: Handle downward compat for OpenMP context selector

Nvptx's compute capabilities (SM_XX) are downward compatible, i.e. SM_80
supports all features of SM_30, SM_70 etc.  Additionally, GCC's -march=
currently only supports those values that actually change the generated
code - and offers -march-map=... to map higher values to the next lower
supported version.

Update libgomp.texi to document the downward compatibility and case
sensitivity of the context selectors.

gcc/ChangeLog:

	* config/nvptx/nvptx-sm.def (NVPTX_SM_COMPAT): Add compute
	capabilities supported by -march-map= lower than sm_80 (= highest
	supported -march=).
	* config/nvptx/gen-omp-device-properties.sh: Handle it.
	* config/nvptx/gen-h.sh: Ignore it.
	* config/nvptx/gen-multilib-matches.sh: Likewise.
	* config/nvptx/gen-opt.sh: Likewise.
	* config/nvptx/nvptx.cc (sm_version_to_number): New.
	(nvptx_omp_device_kind_arch_isa): Match when requested ISA (sm_XX)
	version is lower than actual ISA version.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Context Selectors): Add note about case
	sensitivity and downward compatibility.
	* testsuite/libgomp.c/declare-variant-3.h: Extend to check for
	downward compatibility.
	* testsuite/libgomp.c/declare-variant-3-sm30.c: Update.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: Likewise.
	* testsuite/libgomp.c/declare-variant-3.c: Likewise.

 gcc/config/nvptx/gen-h.sh  |  2 +-
 gcc/config/nvptx/gen-multilib-matches.sh   |  2 +-
 gcc/config/nvptx/gen-omp-device-properties.sh  |  2 +-
 gcc/config/nvptx/gen-opt.sh|  2 +-
 gcc/config/nvptx/nvptx-sm.def  | 22 +++
 gcc/config/nvptx/nvptx.cc  | 33 --
 .../testsuite/libgomp.c/declare-variant-3-sm30.c   |  3 +-
 .../testsuite/libgomp.c/declare-variant-3-sm35.c   |  3 +-
 .../testsuite/libgomp.c/declare-variant-3-sm53.c   |  3 +-
 .../testsuite/libgomp.c/declare-variant-3-sm70.c   |  3 +-
 .../testsuite/libgomp.c/declare-variant-3-sm75.c   |  3 +-
 .../testsuite/libgomp.c/declare-variant-3-sm80.c   |  1 +
 libgomp/testsuite/libgomp.c/declare-variant-3.c|  8 ++-
 libgomp/testsuite/libgomp.c/declare-variant-3.h| 75 --
 14 files changed, 140 insertions(+), 22 deletions(-)

diff --git a/gcc/config/nvptx/gen-h.sh b/gcc/config/nvptx/gen-h.sh
index ea75e127cde..592dd8bebc8 100644
--- a/gcc/config/nvptx/gen-h.sh
+++ b/gcc/config/nvptx/gen-h.sh
@@ -21,7 +21,7 @@
 nvptx_sm_def="$1/nvptx-sm.def"
 gen_copyright_sh="$1/gen-copyright.sh"
 
-sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
+sms=$(grep '^NVPTX_SM[^_]' $nvptx_sm_def | sed 's/.*(//;s/,.*//')
 
 cat <= v)
+__builtin_abort ();
+
   __built

[patch][v2] LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

2024-09-02 Thread Tobias Burnus


Hi Richard,

Am 02.09.24 um 13:58 schrieb Richard Biener:

Hmm, I can't really follow how and where it's currently decided whether to
output offload tables for the LTRANS units


Before the patch, output_offload_tables is called unconditionally, but 
guarded by the check whether there is anything to output at all. Call trees:


When outputting the .o files, the call is done via ipa_passes → 
ipa_write_summaries → ipa_write_summaries_1.


This calls ipa_write_summaries twice: once for the offload/for-device 
LTO section and once for the host LTO section – and both calls are needed.


For the LTO (lto1, ltrans) step, the call tree starts with: 
do_whole_program_analysis → lto_wpa_write_files → stream_out_partitions 
→ stream_out_partitions_1 → stream_out → ipa_write_optimization_summaries.


Here, stream_out_partitions potentially forks the 
'stream_out_partitions_1' calls, and each stream_out_partitions_1 call 
invokes 'stream_out' in a loop for each partition of its share.


With either code path, the ipa_write... function then calls: write_lto → 
lto_output → output_offload_tables.



but instead of an odd global
variable would it be possible to pass that down as a flag or,
alternatively encode that flag in the representation for the LTRANS
partition?  I suppose that's the out_decl_state?


Actually, I tried to follow your initial suggestion in the PR, but have 
now moved to the somewhat clearer out_decl_state.
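The v2 approach can be sketched as follows: the decision travels in the per-partition output state and is derived from the partition number alone, so it no longer matters which forked process writes which partition. This is a heavily simplified, hypothetical C model; only the field name `output_offload_tables_p` and the `part == 0` condition come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for GCC's lto_out_decl_state.  */
struct out_decl_state
{
  bool output_offload_tables_p;
};

static int tables_written;   /* How often the tables were streamed.  */

static void
output_offload_tables (void)
{
  tables_written++;
}

/* Like lto_output: consult the per-partition state, not a global.  */
static void
lto_output (const struct out_decl_state *state)
{
  if (state->output_offload_tables_p)
    output_offload_tables ();
}

/* Like stream_out calling ipa_write_optimization_summaries (encoder,
   part == 0): the flag depends only on the partition number.  */
void
write_partition (int part)
{
  assert (part >= 0);
  struct out_decl_state state = { .output_offload_tables_p = (part == 0) };
  lto_output (&state);
}
```

Whatever order or process the partitions are written in, only partition 0 streams the tables.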


Tobias
LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming
sufficient partitions, e.g., via -flto-partition=max), output_offload_tables
wrote the output tables once per fork.

	PR lto/116535

gcc/ChangeLog:

	* lto-cgraph.cc (output_offload_tables): Remove offload_ frees.
	* lto-streamer-out.cc (lto_output): Make call to it depend on
	lto_get_out_decl_state ()->output_offload_tables_p.
	* lto-streamer.h (struct lto_out_decl_state): Add
	output_offload_tables_p field.
	* tree-pass.h (ipa_write_optimization_summaries): Add bool argument.
	* passes.cc (ipa_write_summaries_1): Add bool
	output_offload_tables_p arg.
	(ipa_write_summaries): Update call.
	(ipa_write_optimization_summaries): Accept output_offload_tables_p.

gcc/lto/ChangeLog:

	* lto.cc (stream_out): Update call to
	ipa_write_optimization_summaries to pass true for first partition.

 gcc/lto-cgraph.cc   | 10 --
 gcc/lto-streamer-out.cc |  3 ++-
 gcc/lto-streamer.h  |  3 +++
 gcc/lto/lto.cc  |  2 +-
 gcc/passes.cc   | 11 ---
 gcc/tree-pass.h |  3 ++-
 6 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 6395033ab9d..1492409427c 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -1139,16 +1139,6 @@ output_offload_tables (void)
 
   streamer_write_uhwi_stream (ob->main_stream, 0);
   lto_destroy_simple_output_block (ob);
-
-  /* In WHOPR mode during the WPA stage the joint offload tables need to be
-     streamed to one partition only.  That's why we free offload_funcs and
-     offload_vars after the first call of output_offload_tables.  */
-  if (flag_wpa)
-    {
-      vec_free (offload_funcs);
-      vec_free (offload_vars);
-      vec_free (offload_ind_funcs);
-    }
 }
 
 /* Verify the partitioning of NODE.  */
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 523d6dad221..a4b171358d4 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -2829,7 +2829,8 @@ lto_output (void)
  statements using the statement UIDs.  */
   output_symtab ();
 
-  output_offload_tables ();
+  if (lto_get_out_decl_state ()->output_offload_tables_p)
+    output_offload_tables ();
 
   if (flag_checking)
 {
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 79c44d2cae7..4da1a3efe03 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -531,6 +531,9 @@ struct lto_out_decl_state
 
   /* True if decl state is compressed.  */
   bool compressed;
+
+  /* True if offload tables should be output. */
+  bool output_offload_tables_p;
 };
 
 typedef struct lto_out_decl_state *lto_out_decl_state_ptr;
diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
index 52dd436fd9a..1ee215d8f1d 100644
--- a/gcc/lto/lto.cc
+++ b/gcc/lto/lto.cc
@@ -178,7 +178,7 @@ stream_out (char *temp_filename, lto_symtab_encoder_t encoder, int part)
 
   gcc_assert (!dump_file);
   streamer_dump_file = dump_begin (TDI_lto_stream_out, NULL, part);
-  ipa_write_optimization_summaries (encoder);
+  ipa_write_optimization_summaries (encoder, part == 0);
 
   free (CONST_CAST (char *, file->filename));
 
diff --git a/gcc/passes.cc b/gcc/passes.cc
index d73f8ba97b6..057850f4dec 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -2829,11 +2829,13 @@ ipa_write_summaries_2 (opt_pass *pass, struct lto_out_decl_state *state)
summaries.  SET is the set of nodes to be written.  */
 
 static void
-ipa_write_summaries_1 (lto_symtab_encoder_t encoder)
+ipa_writ

[patch] LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

2024-09-02 Thread Tobias Burnus


The attached patch tries to fix the issue exposed by the PR:

The main ingredient is partitioning of the LTO work, e.g. by using 
-flto-partition=max.


With -flto=2 (or higher or when a jobserver has been detected), not only 
the LTO part is run in parallel but also the creation of the ltrans 
files itself, i.e. gcc/lto/lto.cc's stream_out_partitions forks multiple 
processes to write those files concurrently (here: -flto=2 means two 
processes, each writing about half of the partitions).


For each partition, output_offload_tables is called – which in principle 
would add the offload tables to each file. To prevent this, in flag_wpa 
mode, the tables were freed. That solves the WPA problem, but only if 
all partitions are written by a single process (e.g. -flto=1). If not, 
the data is duplicated and only the data belonging to the fork is modified.


This patch moves the logic to gcc/lto/lto.cc and sets a global variable 
to ensure that the tables are only output for the first partition, 
independently of whether one or several processes write the ltrans 
files, trying to follow what Richard proposed in the PR.
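Why freeing the vectors is not enough can be illustrated with a small simulation in which every writer process gets its own copy of the state, as `fork()` does. The C sketch below is hypothetical: it assumes the partitions are split into contiguous, equally sized chunks (the real split in `stream_out_partitions` may differ), and only the `offload_output_tables_p` flag and the `min == 0` rule mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* State a forked writer inherits from the parent.  fork() copies it, so a
   change made in one child is invisible to its siblings.  */
struct writer_state
{
  bool tables_pending;           /* Old scheme: vectors not yet freed.  */
  bool offload_output_tables_p;  /* New scheme: set for partition 0 only.  */
};

/* Return how often the offload tables are written when NPARTS partitions
   are split into contiguous chunks among NPROCS writer processes.  */
int
count_table_writes (int nparts, int nprocs, bool new_scheme)
{
  assert (nparts > 0 && nprocs > 0);
  int writes = 0;
  int chunk = (nparts + nprocs - 1) / nprocs;
  for (int min = 0; min < nparts; min += chunk)
    {
      int max = min + chunk < nparts ? min + chunk : nparts;
      /* Each child starts from a copy of the parent's untouched state.  */
      struct writer_state s = { true, min == 0 };
      for (int p = min; p < max; p++)
        {
          if (new_scheme ? s.offload_output_tables_p : s.tables_pending)
            writes++;
          s.tables_pending = false;           /* Old: free the vectors.  */
          s.offload_output_tables_p = false;  /* New: reset after first.  */
        }
    }
  return writes;
}
```

With a single writer both schemes write the tables once; with N writers the old scheme writes them N times, while the new scheme still writes them once.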


The patch has been tested on x86-64-gnu-linux with nvptx offloading, but 
I should do a full bootstrap+regtest next.


Comments, suggestions, remarks, approval?

Tobias
LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming
sufficient partitions, e.g., via -flto-partition=max), output_offload_tables
wrote the output tables once per fork.

	PR lto/116535

gcc/ChangeLog:

	* omp-offload.h (offload_output_tables_p): New extern bool var.
	* omp-offload.cc (offload_output_tables_p): Define it with value true.
	* lto-cgraph.cc (output_offload_tables): Only output tables when
	offload_output_tables_p is true.

gcc/lto/ChangeLog:

	* lto.cc (stream_out_partitions_1): Set offload_output_tables_p to false
	except for the first partition.

 gcc/lto-cgraph.cc  | 16 
 gcc/lto/lto.cc |  3 +++
 gcc/omp-offload.cc |  2 ++
 gcc/omp-offload.h  |  1 +
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 6395033ab9d..19ac252e1b4 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -1081,8 +1081,10 @@ output_offload_tables (void)
 {
   bool output_requires = (flag_openmp
 			  && (omp_requires_mask & OMP_REQUIRES_TARGET_USED) != 0);
-  if (vec_safe_is_empty (offload_funcs) && vec_safe_is_empty (offload_vars)
-      && !output_requires)
+  if (!offload_output_tables_p
+      || (vec_safe_is_empty (offload_funcs)
+	  && vec_safe_is_empty (offload_vars)
+	  && !output_requires))
     return;
 
   struct lto_simple_output_block *ob
@@ -1139,16 +1141,6 @@ output_offload_tables (void)
 
   streamer_write_uhwi_stream (ob->main_stream, 0);
   lto_destroy_simple_output_block (ob);
-
-  /* In WHOPR mode during the WPA stage the joint offload tables need to be
-     streamed to one partition only.  That's why we free offload_funcs and
-     offload_vars after the first call of output_offload_tables.  */
-  if (flag_wpa)
-    {
-      vec_free (offload_funcs);
-      vec_free (offload_vars);
-      vec_free (offload_ind_funcs);
-    }
 }
 
 /* Verify the partitioning of NODE.  */
diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
index 52dd436fd9a..69c7527d399 100644
--- a/gcc/lto/lto.cc
+++ b/gcc/lto/lto.cc
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "lto-common.h"
 #include "opts-jobserver.h"
+#include "omp-offload.h"
 
 /* Number of parallel tasks to run.  */
 static int lto_parallelism;
@@ -226,12 +227,14 @@ wait_for_child ()
 static void
 stream_out_partitions_1 (char *temp_filename, int blen, int min, int max)
 {
+   offload_output_tables_p = (min == 0);
    /* Write all the nodes in SET.  */
    for (int p = min; p < max; p ++)
      {
        sprintf (temp_filename + blen, "%u.o", p);
        stream_out (temp_filename, ltrans_partitions[p]->encoder, p);
        ltrans_partitions[p]->encoder = NULL;
+       offload_output_tables_p = false;
      }
 }
 
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 934fbd80bdd..76bfda94217 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -88,6 +88,8 @@ struct oacc_loop
 /* Holds offload tables with decls.  */
 vec *offload_funcs, *offload_vars, *offload_ind_funcs;
 
+bool offload_output_tables_p = true;
+
 /* Return level at which oacc routine may spawn a partitioned loop, or
    -1 if it is not a routine (i.e. is an offload fn).  */
 
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index d972bb7eafd..2d1d173016c 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -29,6 +29,7 @@ extern int oacc_fn_attrib_level (tree attr);
 extern GTY(()) vec *offload_funcs;
 extern GTY(()) vec *offload_vars;
 extern GTY(()) vec *offload_ind_funcs;
+extern bool offload_output_tables_p;
 
 extern void omp_finish_file (void);
 extern void omp_di

[patch] lto/lto.cc: Fix build with not HAVE_WORKING_FORK

2024-08-30 Thread Tobias Burnus


With HAVE_WORKING_FORK unset, I get an 'unused but set' compile error.

That's fixed with the attached patch.

Tobias

PS: And if someone wonders what I am doing, see https://gcc.gnu.org/PR116535
lto/lto.cc: Fix build with not HAVE_WORKING_FORK

gcc/lto/ChangeLog:

	* lto.cc: Add missing HAVE_WORKING_FORK.

diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
index 58ff0c45f57..66d9f136ae1 100644
--- a/gcc/lto/lto.cc
+++ b/gcc/lto/lto.cc
@@ -62,8 +62,10 @@ along with GCC; see the file COPYING3.  If not see
 /* Number of parallel tasks to run.  */
 static int lto_parallelism;
 
+#ifdef HAVE_WORKING_FORK
 /* Number of active WPA streaming processes.  */
 static int nruns = 0;
+#endif
 
 /* GNU make's jobserver info.  */
 static jobserver_info *jinfo = NULL;

[patch] lto-wrapper: Honor -save-temps for ltrans' makefile

2024-08-30 Thread Tobias Burnus

Noticed that -save-temps is ignored for parallel LTO. With this patch, the 
result is now:


make -f ./a.ltrans.mk -j2 all
[Leaving LTRANS ./a.ltrans.mk]

instead of

make -f /tmp/ccXgtcjJ.mk -j2 all
[Leaving LTRANS /tmp/ccXgtcjJ.mk]
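The naming rule here is: with -save-temps, the makefile name is the dump prefix followed by "ltrans.mk"; otherwise a temporary file is used. A minimal C sketch of that rule, using plain malloc/memcpy instead of libiberty's concat; the function name ltrans_makefile_name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* With -save-temps, return a malloc'ed DUMPPFX + "ltrans.mk" (e.g. "./a."
   gives "./a.ltrans.mk"); otherwise return NULL, meaning the caller should
   create a temporary file instead (GCC uses make_temp_file (".mk")).  */
char *
ltrans_makefile_name (bool save_temps, const char *dumppfx)
{
  assert (dumppfx != NULL);
  if (!save_temps)
    return NULL;
  static const char suffix[] = "ltrans.mk";
  size_t plen = strlen (dumppfx);
  char *name = malloc (plen + sizeof suffix);
  if (name != NULL)
    {
      memcpy (name, dumppfx, plen);
      memcpy (name + plen, suffix, sizeof suffix);  /* Includes the NUL.  */
    }
  return name;
}
```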

OK for mainline?

Tobias
lto-wrapper: Honor -save-temps for ltrans' makefile

gcc/ChangeLog:

	* lto-wrapper.cc (run_gcc): Honor -save-temps for
	makefile name.

diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 6bfc96590a5..c07765b37a2 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -1994,7 +1994,10 @@ cont:
 
   if (parallel)
 	{
-	  makefile = make_temp_file (".mk");
+	  if (save_temps)
+	    makefile = concat (dumppfx, "ltrans.mk", NULL);
+	  else
+	    makefile = make_temp_file (".mk");
 	  mstream = fopen (makefile, "w");
 	  qsort (ltrans_priorities, nr, sizeof (int) * 2, cmp_priority);
 	}

[patch] Fortran: Add OpenMP 'interop' directive parsing support

2024-08-29 Thread Tobias Burnus

This patch adds Fortran parsing support for OpenMP's 'interop' directive 
(which stops with a 'sorry' in trans-openmp.cc as the middle end support 
is still missing).


Tested on x86-64-gnu-linux.

Comments, suggestions, remarks?

* * *

Background:

'interop' makes it easier to call, e.g., a CUDA-BLAS function directly, 
as it permits mapping an OpenMP device number (→ "target" modifier 
required) to the "foreign runtime" device number or directly obtaining a 
stream object (→ if "targetsync" modifier specified) with dependency 
tracking.


Just calling '!$omp interop init(obj)' works but that leaves the 
decision which type of object should be returned to the run time.


Using 'prefer_type', the user can ask for a specific type. Permitted is a 
string such as "hip" or an integer constant such as omp_ifr_cuda_driver 
– and the old-style syntax is 'prefer_type(string> [ ,  ...])'.  [Note that a constant integer 
expression is permitted.]


The new syntax permits additional attributes, e.g. for 'sycl' requesting 
an 'in-order' queue (instead of the default 'out-of-order' queue) when 
obtaining a stream. The new syntax is 'prefer_type( {...} [, {...} ...] 
)' where '{ ... }' is a list of either 'attr("ompx_...")' (i.e. 
'attr(...)' with a literal string argument that starts with ompx_ and does not 
contain a ',') or 'fr()' where the identifier 
is an integer constant. 'fr' can be present or not, but only once per 
{...}, while multiple 'attr' may be used. [Note that as non-string only 
an identifier is permitted (i.e. an integer parameter).]


I decided on the way the string is encoded here – but I am open to other 
representations as well. In my WIP/RFC patch, it is used as shown in 
plugin-*.c in the patch 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html


The available foreign runtimes and the values that can be returned are 
hidden in that patch; they are more readable in the documentation patch at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661365.html


If someone wants to delve into the details of the 'interop' feature: 
Have a look at OpenMP 5.1 (5.2) *and* TR13 and the additional definition 
document at https://www.openmp.org/specifications/ ('hsa': publishing 
pending).


* * *

Tobias

PS: In the dump, I am a bit lazy and add a spurious trailing ','. As it is 
only a dump, I decided that adding a bunch of checks to ensure that a ',' 
only gets printed if needed is not really required. If you think 
otherwise, I can surely add a bunch of 'if's and only print it conditionally.


PPS: In order to use 'interop', mainly the part in the middle is missing, 
i.e. some middle-end gimplification with a call into libgomp – and the 
libgomp function. A stub version of the latter and some (loosely) tested 
plugin handling do exist as a WIP/RFC patch; see the patch link above. 
Besides gimplify and the libgomp function, a bunch of tests and, 
obviously, the C and C++ FE counterparts to this patch have to be 
implemented.
Fortran: Add OpenMP 'interop' directive parsing support

Parse OpenMP's 'interop' directive but stop with a 'sorry, unimplemented'
after resolving.

Additionally, it moves some clause dumping away from the end directive as
that led to 'nowait' not being printed when it should have been, as some
cases were missed.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_namelist): Handle OMP_LIST_INIT.
	(show_omp_clauses): Handle OMP_LIST_{INIT,USE,DESTROY}; move 'nowait'
	from end-directive to the directive dump.
	(show_omp_node, show_code_node): Handle EXEC_OMP_INTEROP.
	* gfortran.h (enum gfc_statement): Add ST_OMP_INTEROP.
	(OMP_LIST_INIT, OMP_LIST_USE, OMP_LIST_DESTROY): Add.
	(enum gfc_exec_op): Add EXEC_OMP_INTEROP.
	(struct gfc_omp_namelist): Add interop items to union.
	(gfc_free_omp_namelist): Add boolean arg.
	* match.cc (gfc_free_omp_namelist): Update to free
	interop union members.
	* match.h (gfc_match_omp_interop): New.
	* openmp.cc (gfc_omp_directives): Uncomment 'interop' entry.
	(gfc_free_omp_clauses, gfc_match_omp_allocate,
	gfc_match_omp_flush, gfc_match_omp_clause_reduction): Update
	call.
	(enum omp_mask2): Add OMP_CLAUSE_{INIT,USE,DESTROY}.
	(OMP_INTEROP_CLAUSES): Use it.
	(gfc_match_omp_clauses): Match those clauses.
	(gfc_match_omp_prefer_type, gfc_match_omp_init,
	gfc_match_omp_interop): New.
	(resolve_omp_clauses): Handle interop clauses.
	(omp_code_to_statement): Add ST_OMP_INTEROP.
	(gfc_resolve_omp_directive): Add EXEC_OMP_INTEROP.
	* parse.cc (decode_omp_directive): Parse 'interop' directive.
	(next_statement, gfc_ascii_statement): Handle ST_OMP_INTEROP.
	* st.cc (gfc_free_statement): Likewise.
	* resolve.cc (gfc_resolve_code): Handle EXEC_OMP_INTEROP.
	* trans.cc (trans_code): Likewise.
	* trans-openmp.cc (gfc_trans_omp_directive): Print 'sorry'
	for EXEC_OMP_INTEROP.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/interop-1.f90: New test.
	* gfortran.dg/gomp/interop-2.f90: New test.
	* gfortran.dg/gomp/interop-3.f90: New test.

 gcc/fortran/dump-parse-tree.cc   |  61 +

Re: [patch][v2] libgomp.texi: Document OpenMP's Interoperability Routines

2024-08-28 Thread Tobias Burnus


Hi Sandra,

thanks for your comments.

Sandra Loosemore wrote:
Stepping back to consider this from a higher-level perspective, 
shouldn't the interface documented in the GCC manual reflect what GCC 
implements, rather than what the spec says that is explicitly *not* 
what is implemented?  Or is the way you have documented this 
consistent with the way other libgomp features that don't strictly 
conform to the spec have already been documented?


The idea of the implementation is to be 100% compatible with the OpenMP 
specification in terms of usage – but to deviate in terms of the implied 
ABI.


The issue is really that the specification is more explicit than it 
should be, but it is clear why it is so:


It is much easier and more readable to write: 'subroutine f(x); integer 
:: x' — instead of stating that "subroutine f" exists and takes "x" as 
argument which accepts default-kind integers.


But the first version automatically implies that it is not "subroutine 
f(x) BIND(C)" and not "integer, VALUE :: x".


However, I want to use bind(C) and value in GCC. For a user that 
includes the omp_lib module (or omp_lib.h header), the difference is not 
visible.


* * *

I personally dislike it a lot if vendor documentation of a specific 
standard function deviates in declaration, semantic or accepted 
arguments without telling me that it is modified.


That's the reason I mentioned it in the previously attached patch. 
However, as it only affects the ABI – and does not affect users (unless 
they really care about the internal decl), maybe just not mentioning the 
differences is better?


[RFC] Thus, the question is whether it should be stated in the manual or 
not? (Removed completely or kept commented out?)


As I wrote in the original email: "PS: I am not 100% sure whether adding 
the implementation detail makes sense or not."


* * *

In the attached patch, I commented it out in the .texi but left it there.

* * *
+the name matches the name of the named constant with the @code{omp_ipr_}
+prefix removed.


That should be @samp{omp_ipr_}, not @code markup.


Hmm, I thought that non-whitespace strings, in particular 
[A-Za-z0-9_]+, would be permissible for @code and that only going beyond 
those would require something else.


+@samp{N/A} if this property is not available for the given foreign runtime.


@code{"N/A"}, I think.  (It's a string literal, right?)


Well, the result of the function call is a pointer to the string N, /, 
A, \0 – and not to ", N, /, A, ", \0. And while the code indeed uses 
"N/A" it could also do res[0] = 'N'; res[1] = '/', …


Thus, I think @code{"N/A"} (with quotation marks) is slightly 
misleading. — I am happy to use @code{N/A} instead of @samp{N/A}, if 
that seems to be more appropriate, but I am not so happy about @code{"N/A"}.


* * *

I know the libgomp manual uses different formatting conventions than 
the GCC manual or other Texinfo manuals.  Have you inspected the 
formatted output to make sure it's what you expect and consistent with 
the rest of the document?


It looked okay when glancing over the result in info, PDF and HTML 
format, both now and when I posted the previous patch.


New is that I no longer insert explicit line breaks in the interface. 
Doing so previously led to odd, very short lines in the HTML version and 
possibly double breaks in the 'info' file if the 'info' line was a bit 
shorter than what was anticipated in the .texi file. As the result of the 
automatic line breaks looks reasonable, I used it as such.


Remarks:

* '@code{  abc}' leads to an indentation in some of the output formats 
but not in all; thus, I have not used it. Some but not all existing code 
uses it. — Using '@ @ @code' would work, but is ugly and not really 
needed, either.


* I think we could consider eventually updating the style to be 
consistent with GCC's style (and move there slowly and stepwise).


* Regarding 'abc -- def' vs. 'abc---def', that's a Europeanism. To quote 
the "Oxford Guide to Style": "OUP [Oxford University Press] and most US 
publishers use the unspaced (non-touching) em rule as a parenthetical 
dash; other British publishers use the en rule with space either side."


Tobias

PS: I have partially updated the patch + attached it, but it is not yet 
fully updated; also because we have not yet settled on the items above.
libgomp.texi: Document OpenMP's Interoperability Routines

libgomp/ChangeLog:

	* libgomp.texi (Interoperability Routines): Add.
	(omp_target_memcpy_async, omp_target_memcpy_rect_async):
	Document that depobj_list may be omitted in C++ and Fortran.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index fe25d879788..5605d522216 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -656,7 +656,7 @@ specification in version 5.2.
 * Lock Routines::
 * Timing Routines::
 * Event Routine::
-@c * Interoperability Routines::
+* Interoperability Routines::
 * Memory Management Routines::
 @c * Tool Control Routine::
 * Envi

Re: [PATCH] Libquadmath: update doc for some constants

2024-08-27 Thread Tobias Burnus


Hi FX,

FX Coudert wrote:

Given it’s a doc patch, I think it might fall under the obvious rule, and will 
commit in a week if there is no objection.


The patch clearly fixes a bug in the current specification and is fine, 
I just wonder …



* libquadmath.texi (M_LOG2Eq, M_LOG10Eq, M_2_PIq): Fix
description of these constants.



diff --git a/libquadmath/libquadmath.texi b/libquadmath/libquadmath.texi
index dc2a9ff374b..ce4accf6421 100644
--- a/libquadmath/libquadmath.texi
+++ b/libquadmath/libquadmath.texi

…

  @item @code{M_PI_2q}: pi divided by two
  @item @code{M_PI_4q}: pi divided by four
  @item @code{M_1_PIq}: one over pi
-@item @code{M_2_PIq}: one over two pi
+@item @code{M_2_PIq}: two over pi
  @item @code{M_2_SQRTPIq}: two over square root of pi
  @item @code{M_SQRT2q}: square root of 2
  @item @code{M_SQRT1_2q}: one over square root of 2


... whether we should change the "over" which somehow sounds odd. "two 
divided by pi" sounds better to me than "two over pi".


I do note, however, that the following documentation uses a slightly 
different wording:


"M_2_PI - Two times the reciprocal of pi."

https://www.gnu.org/software/libc/manual/html_node/Mathematical-Constants.html

Hence, while I am fine with the change, I think we should replace the 
"over" wording (multiple times) and move either to "divided by" or 
to "(… times) the reciprocal of".
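To make the numeric meaning of the corrected descriptions concrete, here is a small double-precision check in C. It assumes that the __float128 constants M_2_PIq, M_LOG2Eq and M_LOG10Eq denote the same quantities as glibc's M_2_PI, M_LOG2E and M_LOG10E (2/pi, log2(e) and log10(e)); pi is computed as acos (-1.0) because M_PI is not part of ISO C. Link with -lm.

```c
#include <assert.h>
#include <math.h>

/* Double-precision values of what the libquadmath names denote.  */
double two_over_pi (void) { return 2.0 / acos (-1.0); }   /* M_2_PIq   */
double log2_of_e (void)   { return 1.0 / log (2.0); }     /* M_LOG2Eq  */
double log10_of_e (void)  { return 1.0 / log (10.0); }    /* M_LOG10Eq */

/* "Two over pi" is about 0.6366, clearly not "one over two pi" (~0.1592).  */
void
check_descriptions (void)
{
  double pi = acos (-1.0);
  assert (fabs (two_over_pi () - 0.6366197723675814) < 1e-12);
  assert (fabs (two_over_pi () - 1.0 / (2.0 * pi)) > 0.4);
  assert (fabs (log2_of_e () - 1.4426950408889634) < 1e-12);
  assert (fabs (log10_of_e () - 0.4342944819032518) < 1e-12);
}
```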


Tobias

Re: [patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin

2024-08-27 Thread Tobias Burnus


Hi Andrew,

Andrew Stubbs:

On 22/08/2024 19:26, Tobias Burnus wrote:
(A) Any comments, suggestions regarding the patch in general and in 
particular the plugin/ related parts?


The code all looks pretty reasonable to me.

The header file conditional includes worry me though: it is adding 
complexity in a way that hurts maintainability, and looks like it 
might break somebody's hypothetical out-of-tree plugin. Is it not 
better for a plugin that supports interop to include omp.h itself?
I do note that libgomp.h explicitly includes 'omp.h.in' – and later 
includes 'libgomp-plugin.h' and not omp.h.


But I don't know why. It could be some build-related issue, or it could 
be because it already replaces the locking definitions with its own. 
(Albeit it could still use 'omp.h' together with the current '#ifdef' 
protection.)


Assuming that omp.h.in is only included because of the locking-type 
dance – and not because of an actual build issue: I will try whether just 
including 'omp.h' in plugin/plugin-*.c and libgomp-plugin.c before 
libgomp-plugin.h works. For libgomp.h, it is already included (and then 
used by target.c).


* * *

(B) RFC: The *stream* *creation* (hsa_queue_t, 
cudaStream_t/hipStream_t) functions have tons of options. Thus:

...

(ii) Should the user be able to tweak the values?

I mean, the user could say: 'prefer_type({fr("cuda"), 
attr("ompx_priority:-2,ompx_non_blocking")},{fr("hsa"),attr("ompx_queue_size:64")})'.


Do we want to permit this? If yes, which of the values should be 
changeable?


Is there any prior art for this? It looks like it could be added in 
future, without breaking backward compatibility, so I say "no" (at 
least for now).


There is no real prior art as 'attr' is a very new feature (voted in 
about two months ago); I think it was mainly proposed for 'sycl' 
to specify an 'in-order' queue, which is what is commonly needed, but the 
default in sycl is an 'out-of-order' queue. In any case, it seems as if 
they intend to provide either type of queue.


Still, if there is a sensible attribute to set, I think it makes sense 
to actually add it – and 'ompx_gnu_' should avoid interoperability issues.


But as the feature is supported code-wise, adding an attribute only 
requires changing two files: the plugin-.c and libgomp.texi, i.e. 
that's simple and quick.


Tobias

[patch] libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn

2024-08-24 Thread Tobias Burnus

This patch comes on top of "[patch][v2] libgomp.texi: Document OpenMP's 
Interoperability Routines", 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661314.html


But it documents the code added at "[patch][rfc] libgomp: Add OpenMP 
interop support to nvptx + gcn plugin", 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html


As remarked there: While the code in the plugin should handle the 
advertised foreign runtimes (cuda, cuda_driver, hip, hsa) correctly, it 
has not been extensively tested, and it only becomes really available 
once the 'interop' directive has been implemented in the compiler itself.


Tobias
libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn

libgomp/ChangeLog:

	* libgomp.texi (omp_get_interop_int, omp_get_interop_str,
	omp_get_interop_ptr, omp_get_interop_type_desc): Add @ref to
	Offload-Target Specifics.
	(Offload-Target Specifics): Document the supported OpenMP
	interop types.

 libgomp/libgomp.texi | 118 +--
 1 file changed, 114 insertions(+), 4 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index b36b58b6d10..9d76948812a 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -2980,7 +2980,7 @@ not affect the usage of the function when GCC's @code{omp_lib} module or
 
 @item @emph{See also}:
 @ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}
-@c @ref{Offload-Target Specifics}
+@ref{Offload-Target Specifics}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.2,
@@ -3026,7 +3026,7 @@ not affect the usage of the function when GCC's @code{omp_lib} module or
 
 @item @emph{See also}:
 @ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}
-@c @ref{Offload-Target Specifics}
+@ref{Offload-Target Specifics}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.3,
@@ -3071,7 +3071,7 @@ affect the usage of the function when GCC's @code{omp_lib} module or
 
 @item @emph{See also}:
 @ref{omp_get_interop_int}, @ref{omp_get_interop_ptr}, @ref{omp_get_interop_rc_desc}
-@c @ref{Offload-Target Specifics}
+@ref{Offload-Target Specifics}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.4,
@@ -3155,7 +3155,7 @@ affect the usage of the function when GCC's @code{omp_lib} module or
 
 @item @emph{See also}:
 @ref{omp_get_num_interop_properties}, @ref{omp_get_interop_name}
-@c @ref{Offload-Target Specifics}
+@ref{Offload-Target Specifics}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.6,
@@ -6747,6 +6747,10 @@ The following sections present notes on the offload-target specifics
 @node AMD Radeon
 @section AMD Radeon (GCN)
 
+@menu
+* Foreign-runtime support for AMD GPUs::
+@end menu
+
 On the hardware side, there is the hierarchy (fine to coarse):
 @itemize
 @item work item (thread)
@@ -6816,11 +6820,58 @@ The implementation remark:
   pool is exhausted.
 @end itemize
 
+@node Foreign-runtime support for AMD GPUs
+@subsection OpenMP @code{interop} -- Foreign-Runtime Support for AMD GPUs
+
+An interoperability object of OpenMP @code{interop} type can be obtained using
+the @code{interop} directive; supported as foreign runtimes are HIP
+(C++ Heterogeneous-Compute Interface for Portability) and HSA (Heterogeneous
+System Architecture).  If no @code{prefer_type} argument has been specified,
+HIP is used.
+
+The following properties can then be extracted using the @ref{Interoperability
+Routines}.  Each listed property name has an associated named constant,
+consisting of @code{omp_ipr_} followed by the property name.  The following
+table uses ``@emph{int}'', ``@emph{str}'' and ``@emph{ptr}'' to denote the
+routine to be used to obtain the property value.
+
+Available properties for an HIP interop object:
+@multitable @columnfractions .30 .30 .30
+@headitem Property  @tab data type@tab value (if constant)
+@item @code{fr_id}  @tab @samp{omp_interop_fr_t} @emph{(int)} @tab @samp{omp_fr_hip}
+@item @code{fr_name}@tab @samp{const char *} @emph{(str)} @tab @samp{hip}
+@item @code{vendor} @tab @samp{int}  @emph{(int)} @tab @samp{1}
+@item @code{vendor_name}@tab @samp{const char *} @emph{(str)} @tab @samp{amd}
+@item @code{device_num} @tab @samp{int}  @emph{(int)} @tab
+@item @code{platform}   @tab N/A  @tab
+@item @code{device} @tab @samp{hipDevice_t}  @emph{(int)} @tab
+@item @code{device_context} @tab @samp{hipCtx_t} @emph{(ptr)} @tab
+@item @code{targetsync} @tab @samp{hipStream_t}  @emph{(ptr)} @tab
+@end multitable
+
+Available properties for an HSA interop object:
+@multitable @columnfractions .30 .30 .30
+@headitem Property  @tab data type

[patch][v2] libgomp.texi: Document OpenMP's Interoperability Routines

2024-08-23 Thread Tobias Burnus

Minor update, mainly because of the 'optional' changes in v3 of the 
patch https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661313.html


The 'optional' change affects omp_get_interop_{int,ptr,str}, but 
omp_target_memcpy_async and omp_target_memcpy_rect_async also got a few words.


Additionally, the returned string of omp_get_interop_type_desc is now 
better described (in GCC, it is the C/C++ type declaration as a string, 
"N/A", or NULL). And a couple of notes about calling the routines from 
inside a non-host target region were added.


Tobias Burnus:

Add documentation for OpenMP's interoperability routines.

This obviously depends on the actual implementation patch, posted at: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661035.html 
(albeit I will post a v2 in a moment).


I am sure there will be comments, suggestions and remarks :-)

Tobias

PS: I am not 100% sure whether adding the implementation detail makes 
sense or not.

Tobias

libgomp.texi: Document OpenMP's Interoperability Routines

libgomp/ChangeLog:

	* libgomp.texi (Interoperability Routines): Add.
	(omp_target_memcpy_async, omp_target_memcpy_rect_async):
	Document that depobj_list may be omitted in C++ and Fortran.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index fe25d879788..b36b58b6d10 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -656,7 +656,7 @@ specification in version 5.2.
 * Lock Routines::
 * Timing Routines::
 * Event Routine::
-@c * Interoperability Routines::
+* Interoperability Routines::
 * Memory Management Routines::
 @c * Tool Control Routine::
 * Environment Display Routine::
@@ -2134,8 +2134,9 @@ to the destination device's @var{dst} address shifted by @var{dst_offset}.
 Task dependence is expressed by passing an array of depend objects to
 @var{depobj_list}, where the number of array elements is passed as
 @var{depobj_count}; if the count is zero, the @var{depobj_list} argument is
-ignored.  The routine returns zero if the copying process has successfully
-been started and non-zero otherwise.
+ignored.  In C++ and Fortran, the @var{depobj_list} argument can also be
+omitted in that case.   The routine returns zero if the copying process has
+successfully been started and non-zero otherwise.
 
 Running this routine in a @code{target} region except on the initial device
 is not supported.
@@ -2255,7 +2256,8 @@ respectively.  The offset per dimension to the first element to be copied is
 given by the @var{dst_offset} and @var{src_offset} arguments.  Task dependence
 is expressed by passing an array of depend objects to @var{depobj_list}, where
 the number of array elements is passed as @var{depobj_count}; if the count is
-zero, the @var{depobj_list} argument is ignored.  The routine
+zero, the @var{depobj_list} argument is ignored.  In C++ and Fortran, the
+@var{depobj_list} argument can also be omitted in that case.  The routine
 returns zero on success and non-zero otherwise.
 
 The OpenMP specification only requires that @var{num_dims} up to three is
@@ -2884,21 +2886,315 @@ event handle that has already been fulfilled is also undefined.
 
 
 
-@c @node Interoperability Routines
-@c @section Interoperability Routines
-@c
-@c Routines to obtain properties from an @code{omp_interop_t} object.
-@c They have C linkage and do not throw exceptions.
-@c
-@c @menu
-@c * omp_get_num_interop_properties:: 
-@c * omp_get_interop_int:: 
-@c * omp_get_interop_ptr:: 
-@c * omp_get_interop_str:: 
-@c * omp_get_interop_name:: 
-@c * omp_get_interop_type_desc:: 
-@c * omp_get_interop_rc_desc:: 
-@c @end menu
+@node Interoperability Routines
+@section Interoperability Routines
+
+Routines to obtain properties from an object of OpenMP interop type.
+They have C linkage and do not throw exceptions.
+
+@menu
+* omp_get_num_interop_properties:: Get the number of implementation-specific properties
+* omp_get_interop_int:: Obtain integer-valued interoperability property
+* omp_get_interop_ptr:: Obtain pointer-valued interoperability property
+* omp_get_interop_str:: Obtain string-valued interoperability property
+* omp_get_interop_name:: Obtain the name of an interop_property value as string
+* omp_get_interop_type_desc:: Obtain type and description to an interop_property
+* omp_get_interop_rc_desc:: Obtain error string to an interop_rc error code
+@end menu
+
+
+
+@node omp_get_num_interop_properties
+@subsection @code{omp_get_num_interop_properties} -- Get the number of implementation-specific properties
+@table @asis
+@item @emph{Description}:
+The @code{omp_get_num_interop_properties} function returns the number of
+implementation-defined interoperability properties available for the passed
+@var{interop}, extending the OpenMP-defined properties.  The available OpenMP
+interop_property-type values range from @code{omp_ipr_first} to the value
+returned by @code{omp_get_num_interop_properties} minus one.
+
+No implementation-defined properties are currently def

Re: [patch][v3] libgomp: Add interop types and routines to OpenMP's headers and module

2024-08-23 Thread Tobias Burnus


v3:

Changes:

(A) The 'ret_code' arguments of omp_get_interop_{int,ptr,str} are 
actually 'optional'.


That's something that got lost at some point between OpenMP 5.2 and 
TR13 (I filed OpenMP spec Issue #4165 for it). When adding it, I noticed 
that two '…_async' functions lacked the '= NULL' for C++, which permits 
omitting the argument. For my C and Fortran testcases, I added a test with 
NULL for C and omitted the argument for Fortran. I also changed the C 
code such that it also compiles with C++ and added a check that the 
omitted argument is handled correctly.
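A minimal sketch of how a trailing argument can be made omittable in C++ while remaining mandatory in C, assuming a macro that expands to '= NULL' only when __cplusplus is defined. The names SKETCH_DEFAULT_NULL and get_value_sketch are hypothetical; the actual __GOMP_DEFAULT_NULL in omp.h.in (see the ChangeLog below) may be defined differently.

```c
#include <stddef.h>

/* Hypothetical macro: in C++ it adds a default argument, in C it
   expands to nothing, so C callers must still pass the argument.  */
#ifdef __cplusplus
# define SKETCH_DEFAULT_NULL = NULL
#else
# define SKETCH_DEFAULT_NULL
#endif

/* Hypothetical routine with an optional trailing out-argument, in the
   spirit of an optional ret_code pointer.  */
int get_value_sketch (int id, int *ret_code SKETCH_DEFAULT_NULL);

int
get_value_sketch (int id, int *ret_code)
{
  /* The out-argument is only written if the caller passed one.  */
  if (ret_code != NULL)
    *ret_code = 0;  /* Stand-in for a success code.  */
  return id * 2;
}
```

In C, callers still have to write get_value_sketch (5, NULL); compiled as C++, get_value_sketch (5) is accepted as well, and both end up in the same function with ret_code == NULL.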


(B) Fixed a few libgomp/target.c issues, which sneaked in from the WIP 
libgomp plugin patch posted at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html (among 
other things, it also contained some spurious spaces).


Built and regtested on x86-64-gnu-linux (w/o offloading configured).

Any additional comments, suggestions, remarks?

Andre Vehreschild wrote:
[…]
First, thanks for your comments. However, regarding:

+omp_intptr_t

Do I get this correct, that omp_intptr_t is a pointer to an integer?


No, 'intptr_t' is a (signed) integer type which has (at least) the 
size of a pointer; in Fortran, that's 'integer(c_intptr_t)'. And 
'omp_intptr_t' is just a typedef for 'intptr_t'. [BTW: I don't know why 
'intptr_t' was used and not, e.g., int64_t or just 'int'.]
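To illustrate the distinction with a small C sketch: an intptr_t holds an integer value, which may be a converted pointer or simply a plain number such as a device number. The helper names below are hypothetical; note that intptr_t is optional in ISO C, but it is provided by common hosted implementations.

```c
#include <stdint.h>

/* Converting a valid void * to intptr_t and back yields a pointer that
   compares equal to the original (ISO C, <stdint.h>).  */
static int
roundtrip_ok (void *p)
{
  intptr_t as_int = (intptr_t) p;  /* pointer -> integer */
  void *back = (void *) as_int;    /* integer -> pointer */
  return back == p;
}

/* An integer-valued property can equally hold a small number, e.g. a
   device number, without any pointer being involved.  */
static intptr_t
as_property_value (int device_num)
{
  return (intptr_t) device_num;
}
```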


Tobias

libgomp: Add interop types and routines to OpenMP's headers and module

This commit adds OpenMP 5.1+'s interop enumeration, type and routine
declarations to the C/C++ header file and, new in OpenMP TR13, also to
the Fortran module and omp_lib.h header file.

While a stub implementation is provided, this only becomes really
useful with foreign-runtime support in the libgomp GPU plugins and
with the 'interop' directive.

libgomp/ChangeLog:

	* fortran.c (omp_get_interop_str_, omp_get_interop_name_,
	omp_get_interop_type_desc_, omp_get_interop_rc_desc_): Add.
	* libgomp.map (GOMP_5.1.3): New; add interop routines.
	* omp.h.in: Add interop typedefs, enum and prototypes.
	(__GOMP_DEFAULT_NULL): Define.
	(omp_target_memcpy_async, omp_target_memcpy_rect_async):
	Use it for the optional depend argument.
	* omp_lib.f90.in: Add parameters and interfaces for interop.
	* omp_lib.h.in: Likewise; move F90 '&' to column 81 for
	-ffree-length-80.
	* target.c (omp_get_num_interop_properties, omp_get_interop_int,
	omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name,
	omp_get_interop_type_desc, omp_get_interop_rc_desc): Add.
	* config/gcn/target.c (omp_get_num_interop_properties,
	omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
	omp_get_interop_name, omp_get_interop_type_desc,
	omp_get_interop_rc_desc): Add.
	* config/nvptx/target.c (omp_get_num_interop_properties,
	omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
	omp_get_interop_name, omp_get_interop_type_desc,
	omp_get_interop_rc_desc): Add.
	* testsuite/libgomp.c-c++-common/interop-routines-1.c: New test.
	* testsuite/libgomp.c-c++-common/interop-routines-2.c: New test.
	* testsuite/libgomp.fortran/interop-routines-1.F90: New test.
	* testsuite/libgomp.fortran/interop-routines-2.F90: New test.
	* testsuite/libgomp.fortran/interop-routines-3.F: New test.
	* testsuite/libgomp.fortran/interop-routines-4.F: New test.
	* testsuite/libgomp.fortran/interop-routines-5.F: New test.
	* testsuite/libgomp.fortran/interop-routines-6.F: New test.
	* testsuite/libgomp.fortran/interop-routines-7.F90: New test.

 libgomp/config/gcn/target.c| 105 ++
 libgomp/config/nvptx/target.c  | 105 ++
 libgomp/fortran.c  |  41 +++
 libgomp/libgomp.map|  15 +
 libgomp/omp.h.in   |  78 -
 libgomp/omp_lib.f90.in |  99 ++
 libgomp/omp_lib.h.in   | 170 --
 libgomp/target.c   | 110 +++
 .../libgomp.c-c++-common/interop-routines-1.c  | 287 +
 .../libgomp.c-c++-common/interop-routines-2.c  | 354 +
 .../libgomp.fortran/interop-routines-1.F90 | 236 ++
 .../libgomp.fortran/interop-routines-2.F90 |   3 +
 .../testsuite/libgomp.fortran/interop-routines-3.F |   2 +
 .../testsuite/libgomp.fortran/interop-routines-4.F |   4 +
 .../testsuite/libgomp.fortran/interop-routines-5.F |   4 +
 .../testsuite/libgomp.fortran/interop-routines-6.F |   4 +
 .../libgomp.fortran/interop-routines-7.F90 | 290 +
 17 files changed, 1883 insertions(+), 24 deletions(-)

diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index 9cafea4e2cc..f7fa6aa6396 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -185,3 +185,108 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
   (void) depend;
   __builtin_unreachable ();
 }
+
+

[patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin

2024-08-22 Thread Tobias Burnus

This patch adds OpenMP's interop support to the libgomp plugins (nvptx: 
cuda, cuda_driver, hip; gcn: hip, hsa).*


[The idea is that the user can ask OpenMP to return a foreign-runtime 
handle (CUdevice, hipCtx_t, …) for a specified OpenMP device number, 
and to create a stream (CUstream, hipStream_t, cudaStream_t, 
hsa_queue_t) for which OpenMP can take care of dependencies, e.g., via 
the 'depend' clause.]


The attached patch comes on top of the interop routine patch, 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661118.html (and 
the associated .texi patch, 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661072.html ).


The patch is more a WIP/RFC patch than a final patch as it is currently 
not wired up: while 'GOMP_interop' can be called manually, the proper 
way will be OpenMP's 'interop' directive, currently unimplemented. 
Hence, this patch is not extensively tested, does not include testcases, 
and target.c's GOMP_interop will surely change to handle all clauses.


But except that target.c's GOMP_interop will change, the rest of the 
patch should be rather solid – and could in principle be applied.


Therefore:

(A) Any comments, suggestions regarding the patch in general and in 
particular the plugin/ related parts?


(B) RFC: The *stream* *creation* (hsa_queue_t, cudaStream_t/hipStream_t) 
functions have tons of options. Thus:


(i) Does the chosen size/flags argument for the stream/queue generation 
for GCN/HIP/CUDA make sense? Or are there other values that are more sensible?


(ii) Should the user be able to tweak the values?

I mean, the user could say:** 'prefer_type({fr("cuda"), 
attr("ompx_priority:-2,ompx_non_blocking")},{fr("hsa"),attr("ompx_queue_size:64")})'.


Do we want to permit this? If yes, which of the values should be changeable?

Tobias

(*) For Nvidia, HIP is just a thin wrapper of defines, typedefs and 
inline functions around CUDA. Thus, hip, cuda and cuda_driver are 
effectively all the same. / HSA is a new proposal that is currently being 
added to the additional-definitions document. (OpenMP spec Issue #4023.)


(**) The syntax used, and in particular 'attr', is new in OpenMP 6.0 (new 
in TR13). Note that attr only takes string literals [while 'fr' takes 
strings and (6.0) identifiers ["omp_ifr_cuda"] or constant integer 
expressions (5.1)].
libgomp: Add OpenMP interop support to nvptx + gcn plugin

FIXME/NOTE: target.c's GOMP_interop is a stub, sufficient for some initial
testing, but not sufficient to implement 'omp interop'. However, the
plugin side should be feature complete, except for possible extensions.

This adds interop support to the libgomp plugins; to the gcn one, it adds
HSA and HIP and, to the nvptx one, it adds CUDA, CUDA_DRIVER and HIP.

libgomp/ChangeLog:

	* libgomp-plugin.h: Include 'omp.h.in' if _LIBGOMP_PLUGIN_INCLUDE
	is set; define the following only if _LIBGOMP_OMP_LOCK_DEFINED is
	set (either via libgomp.h or when _LIBGOMP_PLUGIN_INCLUDE is set).
	(struct interop_obj_t): New.
	(GOMP_OFFLOAD_get_interop, GOMP_OFFLOAD_get_interop_int,
	GOMP_OFFLOAD_get_interop_ptr, GOMP_OFFLOAD_get_interop_str,
	GOMP_OFFLOAD_get_interop_type_desc): Add prototype.
	* libgomp.h: Move 'omp.h.in' inclusion to the top. 
	(struct gomp_device_descr): Add function pointers for interop.
	* libgomp.map (GOMP_5.1.3): Add GOMP_interop.
	* libgomp_g.h (GOMP_interop): Add prototype.
	* target.c (GOMP_get_interop): New.
	(omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
	omp_get_interop_type_desc): Add calls into the plugin.
	(gomp_load_plugin_for_device): DLSYM_OPT the new plugin functions.
	* plugin/plugin-gcn.c (_LIBGOMP_PLUGIN_INCLUDE):
	(hipError_t, hipCtx_t, hipStream): Add stub typedefs.
	(struct hip_runtime_fn_info): New.
	(struct agent_info): Add hsa_device_num.
	(hip_fns, hip_runtime_lib): New global vars.
	(init_environment_variables): Init hip_runtime_lib.
	(struct agent_id_data_t): New.
	(assign_agent_ids): Use it to set hsa_device_num.
	(init_hsa_context): Update call.
	(init_hip_runtime_functions, GOMP_OFFLOAD_interop,
	GOMP_OFFLOAD_get_interop_int, GOMP_OFFLOAD_get_interop_ptr,
	GOMP_OFFLOAD_get_interop_str, GOMP_OFFLOAD_get_interop_type_desc): New.
	* plugin/plugin-nvptx.c: Define _LIBGOMP_PLUGIN_INCLUDE before
	including libgomp-plugin.h.
	(GOMP_OFFLOAD_interop, GOMP_OFFLOAD_get_interop_int,
	GOMP_OFFLOAD_get_interop_ptr, GOMP_OFFLOAD_get_interop_str,
	GOMP_OFFLOAD_get_interop_type_desc): New.

 libgomp/libgomp-plugin.h  |  37 
 libgomp/libgomp.h |  17 +-
 libgomp/libgomp.map   |   1 +
 libgomp/libgomp_g.h   |   2 +
 libgomp/plugin/plugin-gcn.c   | 415 +-
 libgomp/plugin/plugin-nvptx.c | 282 
 libgomp/target.c  | 134 +++---
 7 files changed, 848 insertions(+), 40 deletions(-)

diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 0c9c28c65cf..ce1a83bc51e 100644
--- a/libgomp/libgomp-plugin.h
+++ b/lib

[patch][v2a] libgomp: Add interop types and routines to OpenMP's headers and module

2024-08-22 Thread Tobias Burnus

This is nearly identical to v2, except that I presumably used 'git add 
testsuite' when intending to use 'git add -u testsuite' in a last-minute 
change, as v2 contained a bunch of unrelated test files …


The only other change besides removing unrelated files is that for the 
generic part of omp_get_interop_type_desc, the data types ('int' for 
fr_id, vendor, device_num; 'const char *' for fr_name, vendor_name) are 
now returned in target.c, while the specific types (for device, 
device_context, targetsync, platform) will eventually be handled by the 
plugin function.
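The split described above might be pictured as follows: the generic properties are answered directly with fixed type strings, while the device-specific ones are left to the plugin. This is a hypothetical C sketch; the enum, its names and generic_type_desc are invented for illustration and reproduce neither libgomp's omp_interop_property_t values nor its actual code.

```c
#include <stddef.h>

/* Hypothetical property enumeration; it mirrors only the names of the
   interop properties discussed above, not their real values.  */
enum sketch_prop
{
  SKETCH_FR_ID, SKETCH_FR_NAME, SKETCH_VENDOR, SKETCH_VENDOR_NAME,
  SKETCH_DEVICE_NUM, SKETCH_PLATFORM, SKETCH_DEVICE,
  SKETCH_DEVICE_CONTEXT, SKETCH_TARGETSYNC
};

/* Return the C type as a string for the generic properties; return NULL
   for the device-specific ones, which a plugin would describe.  */
static const char *
generic_type_desc (enum sketch_prop p)
{
  switch (p)
    {
    case SKETCH_FR_ID:
    case SKETCH_VENDOR:
    case SKETCH_DEVICE_NUM:
      return "int";
    case SKETCH_FR_NAME:
    case SKETCH_VENDOR_NAME:
      return "const char *";
    default:
      return NULL;  /* platform, device, device_context, targetsync */
    }
}
```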


Tobias

On 21.08.24 at 20:27, Tobias Burnus wrote:
Nearly identical to v1, except that I realized that OpenMP permits 
calling those functions also from target regions.


Hence, the nonhost (GCN and nvptx) libgomp also got those functions, 
including a use of omp_irc_other to make clear why it will fail …


In addition, two (nonhost) target-region test files were added.

Comments, remarks, suggestions before I commit it?

Otherwise, the following still applies:
This patch adds 'interop' to C/C++'s omp.h and Fortran's omp_lib.h 
and omp_lib module.


The implementation should match OpenMP 5.1 (which added interop) and 
also TR13; the Fortran routine support is new in TR13. It also adds 
'hsa' as foreign object enum/parameter, which is currently being added 
to the additional-definitions document.


* * *

The routine interface does not exactly match the OpenMP spec, as some 
VALUE and BIND(C) attributes and one c_int have been used to reduce 
pointless differences between the Fortran and the C/C++ interface.


This shouldn't affect the usage, as almost no user worries about 
the API used for a procedure reference. But if a user defines the 
routine interface him-/herself, this will fail. (But why should 
(s)he? There is 'omp_lib.h' and the 'omp_lib' module, after all – and 
several items in those files are implementation defined.)


On the C/C++ side, there are also some differences (at least with 
TR13) with regard to unsigned vs. signed and to enum (of size 
__UINTPTR_T__) vs. 'typedef (u)intptr_t', but they shouldn't matter 
either (effectively the same API) – and, again, there is omp.h, which 
any sensible user should use.


* * *

While there is a stub implementation for the routines, to make them 
really useful, two things are missing: Support for the 'interop' 
directive in the compiler itself (+ a libgomp function for it) and 
support for some foreign runtime types in the libgomp plugins. Also 
missing is the documentation of the added routines in libgomp.texi. 
All of which will be added in later patches.


Built + tested on x86-64-gnu-linux (with offloading enabled, but 
that's not yet relevant).


Cheers,

Tobias

libgomp: Add interop types and routines to OpenMP's headers and module

This commit adds OpenMP 5.1+'s interop enumeration, type and routine
declarations to the C/C++ header file and, new in OpenMP TR13, also to
the Fortran module and omp_lib.h header file.

While a stub implementation is provided, this only becomes really
useful with foreign-runtime support in the libgomp GPU plugins and
with the 'interop' directive.

libgomp/ChangeLog:

	* fortran.c (omp_get_interop_str_, omp_get_interop_name_,
	omp_get_interop_type_desc_, omp_get_interop_rc_desc_): Add.
	* libgomp.map (GOMP_5.1.3): New; add interop routines.
	* omp.h.in: Add interop typedefs, enum and prototypes.
	* omp_lib.f90.in: Add parameters and interfaces for interop.
	* omp_lib.h.in: Likewise; move F90 '&' to column 81 for
	-ffree-length-80.
	* target.c (omp_get_num_interop_properties, omp_get_interop_int,
	omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name,
	omp_get_interop_type_desc, omp_get_interop_rc_desc): Add.
	* config/gcn/target.c (omp_get_num_interop_properties,
	omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
	omp_get_interop_name, omp_get_interop_type_desc,
	omp_get_interop_rc_desc): Add.
	* config/nvptx/target.c (omp_get_num_interop_properties,
	omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
	omp_get_interop_name, omp_get_interop_type_desc,
	omp_get_interop_rc_desc): Add.
	* testsuite/libgomp.c/interop-routines-1.c: New test.
	* testsuite/libgomp.c/interop-routines-2.c: New test.
	* testsuite/libgomp.fortran/interop-routines-1.F90: New test.
	* testsuite/libgomp.fortran/interop-routines-2.F90: New test.
	* testsuite/libgomp.fortran/interop-routines-3.F: New test.
	* testsuite/libgomp.fortran/interop-routines-4.F: New test.
	* testsuite/libgomp.fortran/interop-routines-5.F: New test.
	* testsuite/libgomp.fortran/interop-routines-6.F: New test.
	* testsuite/libgomp.fortran/interop-routines-7.F90: New test.

 libgomp/config/gcn/target.c|  99 +++
 libgomp/config/nvptx/target.c  |  99 +++
 libgomp/fortran.c  |

[patch] libgomp.texi: Document OpenMP's Interoperability Routines

2024-08-21 Thread Tobias Burnus


Add documentation for OpenMP's interoperability routines.

This obviously depends on the actual implementation patch, posted at: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661035.html 
(albeit I will post a v2 in a moment).


I am sure there will be comments, suggestions and remarks :-)

Tobias

PS: I am not 100% sure whether adding the implementation detail makes 
sense or not.
libgomp.texi: Document OpenMP's Interoperability Routines

libgomp/ChangeLog:

	* libgomp.texi (Interoperability Routines): Add.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index fe25d879788..ecc60882d72 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -656,7 +656,7 @@ specification in version 5.2.
 * Lock Routines::
 * Timing Routines::
 * Event Routine::
-@c * Interoperability Routines::
+* Interoperability Routines::
 * Memory Management Routines::
 @c * Tool Control Routine::
 * Environment Display Routine::
@@ -2884,21 +2884,294 @@ event handle that has already been fulfilled is also undefined.
 
 
 
-@c @node Interoperability Routines
-@c @section Interoperability Routines
-@c
-@c Routines to obtain properties from an @code{omp_interop_t} object.
-@c They have C linkage and do not throw exceptions.
-@c
-@c @menu
-@c * omp_get_num_interop_properties:: 
-@c * omp_get_interop_int:: 
-@c * omp_get_interop_ptr:: 
-@c * omp_get_interop_str:: 
-@c * omp_get_interop_name:: 
-@c * omp_get_interop_type_desc:: 
-@c * omp_get_interop_rc_desc:: 
-@c @end menu
+@node Interoperability Routines
+@section Interoperability Routines
+
+Routines to obtain properties from an object of OpenMP interop type.
+They have C linkage and do not throw exceptions.
+
+@menu
+* omp_get_num_interop_properties:: Get the number of implementation-specific properties
+* omp_get_interop_int:: Obtain integer-valued interoperability property
+* omp_get_interop_ptr:: Obtain pointer-valued interoperability property
+* omp_get_interop_str:: Obtain string-valued interoperability property
+* omp_get_interop_name:: Obtain the name of an interop_property value as string
+* omp_get_interop_type_desc:: Obtain type and description to an interop_property
+* omp_get_interop_rc_desc:: Obtain error string to an interop_rc error code
+@end menu
+
+
+
+@node omp_get_num_interop_properties
+@subsection @code{omp_get_num_interop_properties} -- Get the number of implementation-specific properties
+@table @asis
+@item @emph{Description}:
+The @code{omp_get_num_interop_properties} function returns the number of
+implementation-defined interoperability properties available for the passed
+@var{interop}, extending the OpenMP-defined properties.  The available OpenMP
+interop_property-type values range from @code{omp_ipr_first} to the value
+returned by @code{omp_get_num_interop_properties} minus one.
+
+No implementation-defined properties are currently defined in GCC.
+
+Implementation remark: In GCC, the Fortran interface differs from the one shown
+below: the function has C binding, @var{interop} is passed by value and an
+integer of @code{c_int} kind is returned, permitting to have the same ABI as the
+C function.  This does not affect the usage of the function when GCC's
+@code{omp_lib} module or @code{omp_lib.h} header is used.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_num_interop_properties(const omp_interop_t interop)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_num_interop_properties(interop)}
+@item   @tab @code{integer(omp_interop_kind), intent(in) :: interop}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_interop_name}, @ref{omp_get_interop_type_desc}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.1,
+@uref{https://www.openmp.org, OpenMP specification TR13}, Section 26.1
+@end table
+
+
+
+@node omp_get_interop_int
+@subsection @code{omp_get_interop_int} -- Obtain integer-valued interoperability property
+@table @asis
+@item @emph{Description}:
+The @code{omp_get_interop_int} function returns the integer value associated
+with the @var{property_id} interoperability property of the passed @var{interop}
+object.  If successful, @var{ret_code} is set to @code{omp_irc_success}.
+
+Implementation remark: In GCC, the Fortran interface differs from the one shown
+below: the function has C binding and @var{interop} and @var{property_id} are
+passed by value, permitting to have the same ABI as the C function.  This does
+not affect the usage of the function when GCC's @code{omp_lib} module or
+@code{omp_lib.h} header is used.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{omp_intptr_t omp_get_interop_int(const omp_interop_t interop,
+   omp_interop_property_t property_id, int *ret_code)}
+@end multitable
+
+@item @emph{Fortran

[patch] libgomp: Add interop types and routines to OpenMP's headers and module

2024-08-21 Thread Tobias Burnus

This patch adds 'interop' to C/C++'s omp.h and Fortran's omp_lib.h and 
omp_lib module.


The implementation should match OpenMP 5.1 (which added interop) and 
also TR13; the Fortran routine support is new in TR13. It also adds 
'hsa' as foreign object enum/parameter, which is currently being added to 
the additional-definitions document.


* * *

The routine interface does not exactly match the OpenMP spec, as some 
VALUE and BIND(C) attributes and one c_int have been used to reduce 
pointless differences between the Fortran and the C/C++ interface.


This shouldn't affect the usage, as almost no user worries about the 
API used for a procedure reference. But if a user defines the routine 
interface him-/herself, this will fail. (But why should (s)he? There is 
'omp_lib.h' and the 'omp_lib' module, after all – and several items in 
those files are implementation defined.)


On the C/C++ side, there are also some differences (at least with TR13) 
with regard to unsigned vs. signed and to enum (of size __UINTPTR_T__) 
vs. 'typedef (u)intptr_t', but they shouldn't matter either (effectively 
the same API) – and, again, there is omp.h, which any sensible user should 
use.


* * *

While there is a stub implementation for the routines, to make them 
really useful, two things are missing: Support for the 'interop' 
directive in the compiler itself (+ a libgomp function for it) and 
support for some foreign runtime types in the libgomp plugins. Also 
missing is the documentation of the added routines in libgomp.texi. All 
of which will be added in later patches.


Built + tested on x86-64-gnu-linux (with offloading enabled, but that's 
not yet relevant).


Comments, remarks, suggestions before I commit it?

Tobias
libgomp: Add interop types and routines to OpenMP's headers and module

This commit adds OpenMP 5.1+'s interop enumeration, type and routine
declarations to the C/C++ header file and, new in OpenMP TR13, also to
the Fortran module and omp_lib.h header file.

While a stub implementation is provided, this only becomes really
useful with foreign-runtime support in the libgomp GPU plugins and
with the 'interop' directive.

libgomp/ChangeLog:

	* fortran.c (omp_get_interop_str_, omp_get_interop_name_,
	omp_get_interop_type_desc_, omp_get_interop_rc_desc_): Add.
	* libgomp.map (GOMP_5.1.3): New; add interop routines.
	* omp.h.in: Add interop typedefs, enum and prototypes.
	* omp_lib.f90.in: Add parameters and interfaces for interop.
	* omp_lib.h.in: Likewise; move F90 '&' to column 81 for
	-ffree-length-80.
	* target.c (omp_get_num_interop_properties, omp_get_interop_int,
	omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name,
	omp_get_interop_type_desc, omp_get_interop_rc_desc): Add.
	* testsuite/libgomp.c/interop-routines-1.c: New test.
	* testsuite/libgomp.fortran/interop-routines-1.F90: New test.
	* testsuite/libgomp.fortran/interop-routines-2.F90: New test.
	* testsuite/libgomp.fortran/interop-routines-3.F: New test.
	* testsuite/libgomp.fortran/interop-routines-4.F: New test.
	* testsuite/libgomp.fortran/interop-routines-5.F: New test.
	* testsuite/libgomp.fortran/interop-routines-6.F: New test.

 libgomp/fortran.c  |  41 
 libgomp/libgomp.map|  15 ++
 libgomp/omp.h.in   |  69 ++
 libgomp/omp_lib.f90.in |  99 +
 libgomp/omp_lib.h.in   | 167 --
 libgomp/target.c   |  91 
 libgomp/testsuite/libgomp.c/interop-routines-1.c   | 246 +
 .../libgomp.fortran/interop-routines-1.F90 | 222 +++
 .../libgomp.fortran/interop-routines-2.F90 |   3 +
 .../testsuite/libgomp.fortran/interop-routines-3.F |   2 +
 .../testsuite/libgomp.fortran/interop-routines-4.F |   4 +
 .../testsuite/libgomp.fortran/interop-routines-5.F |   4 +
 .../testsuite/libgomp.fortran/interop-routines-6.F |   4 +
 13 files changed, 945 insertions(+), 22 deletions(-)

diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index cfbea32b022..b62a3f29916 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -102,6 +102,10 @@ ialias_redirect (omp_set_default_allocator)
 ialias_redirect (omp_get_default_allocator)
 ialias_redirect (omp_display_env)
 ialias_redirect (omp_fulfill_event)
+ialias_redirect (omp_get_interop_str)
+ialias_redirect (omp_get_interop_name)
+ialias_redirect (omp_get_interop_type_desc)
+ialias_redirect (omp_get_interop_rc_desc)
 #endif
 
 #ifndef LIBGOMP_GNU_SYMBOL_VERSIONING
@@ -807,4 +811,41 @@ omp_display_env_8_ (const int64_t *verbose)
   omp_display_env (!!*verbose);
 }
 
+void
+omp_get_interop_str_ (const char **res, size_t *res_len,
+		  const omp_interop_t interop,
+		  omp_interop_property_t property_id,
+		  omp_interop_rc_t *ret_code)
+{
+  *res = omp_get_interop_str (interop, property_id, ret_code);
+  *res_len = *res ? strlen (*res) : 0;
+}
+
+void
+omp_get_inter

Re: [PATCH v3 2/7] OpenMP: middle-end support for dispatch + adjust_args

2024-08-09 Thread Tobias Burnus


Paul-Antoine Arras wrote:

This patch adds middle-end support for the `dispatch` construct and the
`adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and
`gimplify_call_expr` respectively. For `adjust_args`, this mostly consists in
emitting a call to `gomp_get_mapped_ptr` for the adequate device.


...


* gimplify.h (omp_has_novariants): Declare.
(omp_has_nocontext): Declare.


As those two functions are only used in gimplify.cc,
please make them 'static' and remove them from gimplify.h.

* * *

I have a testcase which is rejected with the bogus:

   17 |   !$omp end dispatch
  | 1
Error: Unclassifiable OpenMP directive at (1)

That is valid at least in the OpenMP 6.0 previews, as those have:
  "For a dispatch directive, the paired 'end' directive is optional."

In 5.2, it is implied via "3.1 Directive Format" and by 'dispatch'
having "Association: block (function dispatch structured block)".

Note that 'nowait' is an end-clause and may also appear as
'!$omp end dispatch nowait'
(but only on one of 'dispatch' and 'end dispatch'; the current code should 
be able to handle this.)


* * *

But the main reason that I created the testcase was a comment which 
looked wrong in gimplify_omp_dispatch – and indeed, the attached
testcase gives an ICE:

internal compiler error: in gimplify_omp_dispatch, at gimplify.cc:18064

See attached Fortran testcase + comment below at gimplify_omp_dispatch.

* * *


--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc


...


@@ -4052,6 +4053,7 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)
/* Gimplify the function arguments.  */
if (nargs > 0)
  {
+tree device_num = NULL_TREE;


Indentation issue: Indented by 4 instead of 6 spaces.


@@ -4062,8 +4064,111 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)


...


+ if (flag_openmp && EXPR_P (CALL_EXPR_FN (*expr_p))
+ && DECL_P (TREE_OPERAND (CALL_EXPR_FN (*expr_p), 0))
+ && (adjust_args_list = lookup_attribute (
+   "omp declare variant variant adjust_args",
+   DECL_ATTRIBUTES (
+ TREE_OPERAND (CALL_EXPR_FN (*expr_p), 0
+  != NULL_TREE)
+   {

...

+ if (gimplify_omp_ctxp != NULL
+ && gimplify_omp_ctxp->code == OMP_DISPATCH)
+   {


The OpenMP spec only supports append_args/adjust_args "when a specified
function variant is selected for replacement in the context of a
function *dispatch* structured block."

Thus, IMHO, you can merge the two if conditions.



+ for (tree c = gimplify_omp_ctxp->clauses; c;
+  c = TREE_CHAIN (c))
+   {
+ if (OMP_CLAUSE_CODE (c)
+ == OMP_CLAUSE_IS_DEVICE_PTR)
+   {
+ tree decl1 = DECL_NAME (OMP_CLAUSE_DECL (c));
+ tree decl2
+   = tree_strip_nop_conversions (*arg_p);
+ if (TREE_CODE (decl2) == ADDR_EXPR)
+   decl2 = TREE_OPERAND (decl2, 0);
+ gcc_assert (TREE_CODE (decl2) == VAR_DECL
+ || TREE_CODE (decl2)
+  == PARM_DECL);
+ decl2 = DECL_NAME (decl2);
+ if (decl1 == decl2)
+   {
+ is_device_ptr = true;
+ break;
+   }
+   }
+ else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE)
+   device_num = OMP_CLAUSE_OPERAND (c, 0);
+   }


Assume(*) you have:

#pragma omp dispatch is_device_ptr(p) device(6)
  foo(p);

If I read the code correctly, this will use the default device as the 
"break" will prevent finding the device clause.


(* Or the other way round, if new clauses are internally added at the 
beginning of the list.)
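The effect of the 'break' can be reproduced with a simplified model. This is a hypothetical sketch, not GCC's tree API: the demo_* clause list stands in for gimplify_omp_ctxp->clauses, and -1 stands for "use the default device". With is_device_ptr(p) ahead of device(6) in the list, the loop with 'break' never reaches the device clause; continuing the scan after recording the match is one possible fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified clause list; this is not GCC's tree API.  */
enum demo_clause_code { DEMO_CLAUSE_IS_DEVICE_PTR, DEMO_CLAUSE_DEVICE };

struct demo_clause
{
  enum demo_clause_code code;
  const char *var;                /* Used by DEMO_CLAUSE_IS_DEVICE_PTR.  */
  int device;                     /* Used by DEMO_CLAUSE_DEVICE.  */
  const struct demo_clause *next;
};

#define DEMO_DEFAULT_DEVICE (-1)

/* Shape of the loop under review: the 'break' on the first is_device_ptr
   match also ends the search for a later device clause.  */
static int
demo_scan_with_break (const struct demo_clause *c, const char *arg,
                      int *is_device_ptr)
{
  int device = DEMO_DEFAULT_DEVICE;
  *is_device_ptr = 0;
  for (; c != NULL; c = c->next)
    {
      if (c->code == DEMO_CLAUSE_IS_DEVICE_PTR)
        {
          if (strcmp (c->var, arg) == 0)
            {
              *is_device_ptr = 1;
              break;
            }
        }
      else if (c->code == DEMO_CLAUSE_DEVICE)
        device = c->device;
    }
  return device;
}

/* One possible fix: record the match but keep scanning the clauses.  */
static int
demo_scan_all (const struct demo_clause *c, const char *arg,
               int *is_device_ptr)
{
  int device = DEMO_DEFAULT_DEVICE;
  *is_device_ptr = 0;
  for (; c != NULL; c = c->next)
    {
      if (c->code == DEMO_CLAUSE_IS_DEVICE_PTR)
        {
          if (strcmp (c->var, arg) == 0)
            *is_device_ptr = 1;
        }
      else if (c->code == DEMO_CLAUSE_DEVICE)
        device = c->device;
    }
  return device;
}
```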




+ if (build_int_cst (integer_type_node, i)
+ == TREE_VALUE (arg))


I think

if (wi::eq_p (i, tree_strip_any_location_wrapper (
TREE_VALUE (arg))))

is better and avoids creating new tree values that might end up being 
unused. (I am assuming that TREE_CODE (TREE_VALUE (arg)) == INTEGER_CST; 
if not, some additional checks might be needed.)


(The tree_strip_any_location_wrapper call, I have taken from
integer_nonzerop (etc.) and it might not be needed.)
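The difference between the two comparisons can be modeled as follows. This is a hypothetical simplification (demo_* names, a small cache of shared constant nodes) that only assumes, as the pointer comparison under review already does, that equal integer constants of the same type share one node: comparing pointers works because of that sharing but may create a node that is never used again, while comparing the stored value creates nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for shared integer-constant nodes; this is a
   simplified model, not GCC's actual INTEGER_CST cache.  */
struct demo_int_cst
{
  long value;
};

#define DEMO_CACHE_SIZE 64
static struct demo_int_cst *demo_cache[DEMO_CACHE_SIZE];
static int demo_nodes_allocated;

/* Return the shared node for VALUE, allocating it on first use.  */
static struct demo_int_cst *
demo_build_int_cst (long value)
{
  assert (value >= 0 && value < DEMO_CACHE_SIZE);
  if (demo_cache[value] == NULL)
    {
      demo_cache[value] = malloc (sizeof *demo_cache[value]);
      assert (demo_cache[value] != NULL);
      demo_cache[value]->value = value;
      demo_nodes_allocated++;
    }
  return demo_cache[value];
}

/* Style under review: build a node and compare pointers.  Correct only
   because nodes are shared, and it may allocate a node as a side effect.  */
static int
demo_eq_by_pointer (const struct demo_int_cst *arg, long i)
{
  return demo_build_int_cst (i) == arg;
}

/* Suggested style: compare the stored value; nothing is built.  */
static int
demo_eq_by_value (const struct demo_int_cst *arg, long i)
{
  return arg->value == i;
}
```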


+ if (need_device_ptr && !is_device_ptr)
+   {
+ if (device_nu

Re: [PATCH v3 1/7] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces

2024-08-08 Thread Tobias Burnus


Paul-Antoine Arras wrote:

This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.


LGTM - thanks!

Tobias


gcc/ChangeLog:

* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.

gcc/fortran/ChangeLog:

* types.def (BT_FN_PTR_CONST_PTR_INT): Declare.
---
  gcc/builtin-types.def|  1 +
  gcc/fortran/types.def|  1 +
  gcc/omp-selectors.h  |  1 +
  gcc/tree-core.h  |  7 +++
  gcc/tree-pretty-print.cc | 21 +
  gcc/tree.cc  |  4 
  gcc/tree.def |  5 +
  gcc/tree.h   |  7 +++
  8 files changed, 47 insertions(+)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c97d6bad1de..ef7aaf67d13 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -677,6 +677,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_INT_FEXCEPT_T_PTR_INT, BT_INT, BT_FEXCEPT_T_PTR,
  DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_FEXCEPT_T_PTR_INT, BT_INT,
 BT_CONST_FEXCEPT_T_PTR, BT_INT)
  DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_UINT8, BT_PTR, BT_CONST_PTR, BT_UINT8)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)

  DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)

diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 390cc9542f7..5047c8f816a 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -120,6 +120,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_INT_BOOL, BT_BOOL, BT_INT, BT_BOOL)
  DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_PTRMODE,
 BT_VOID, BT_PTR, BT_PTRMODE)
  DEF_FUNCTION_TYPE_2 (BT_FN_VOID_CONST_PTR_SIZE, BT_VOID, BT_CONST_PTR, BT_SIZE)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)

  DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)

diff --git a/gcc/omp-selectors.h b/gcc/omp-selectors.h
index c61808ec0ad..ef3ce9a449a 100644
--- a/gcc/omp-selectors.h
+++ b/gcc/omp-selectors.h
@@ -55,6 +55,7 @@ enum omp_ts_code {
OMP_TRAIT_CONSTRUCT_PARALLEL,
OMP_TRAIT_CONSTRUCT_FOR,
OMP_TRAIT_CONSTRUCT_SIMD,
+  OMP_TRAIT_CONSTRUCT_DISPATCH,
OMP_TRAIT_LAST,
OMP_TRAIT_INVALID = -1
  };
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 27c569c7702..508f5c580d4 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -542,6 +542,13 @@ enum omp_clause_code {

/* OpenACC clause: nohost.  */
OMP_CLAUSE_NOHOST,
+
+  /* OpenMP clause: novariants (scalar-expression).  */
+  OMP_CLAUSE_NOVARIANTS,
+
+  /* OpenMP clause: nocontext (scalar-expression).  */
+  OMP_CLAUSE_NOCONTEXT,
+
  };

  #undef DEFTREESTRUCT
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 4bb946bb0e8..752a402e0d0 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -506,6 +506,22 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
  case OMP_CLAUSE_EXCLUSIVE:
name = "exclusive";
goto print_remap;
+case OMP_CLAUSE_NOVARIANTS:
+  pp_string (pp, "novariants");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOVARIANTS_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOVARIANTS_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
+case OMP_CLAUSE_NOCONTEXT:
+  pp_string (pp, "nocontext");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOCONTEXT_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOCONTEXT_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
  case OMP_CLAUSE__LOOPTEMP_:
name = "_looptemp_";
goto print_remap;
@@ -3947,6 +3963,11 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
dump_omp_clauses (pp, OMP_SECTIONS_CLAUSES (node), spc, flags);
goto dump_omp_body;

+case OMP_DISPATCH:
+  pp_string (pp, "#pragma omp dispatch");
+  dump_omp_clauses (pp, OMP_DISPATCH_CLAUSES (node), spc, flags);
+  goto dump_omp_body;
+
  case OMP_SECTION:
pp_string (pp, "#pra

[Patch] libgomp.texi: Update implementation status table for OpenMP TR13

2024-08-08 Thread Tobias Burnus

Update for the very recently released TR13. Unsurprisingly, most items 
are still unimplemented.


→ https://www.openmp.org/specifications/ → Technical Report 13

Comments, suggestions, typo fixes? — If not, I will commit it later today.

Tobias
libgomp.texi: Update implementation status table for OpenMP TR13

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Technical Report 13): Renamed from
	'OpenMP Technical Report 12'; updated for TR13 changes.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index c6759dd03bc..96cc0e4baa8 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -169,7 +169,7 @@ See also @ref{OpenMP Implementation Status}.
 * OpenMP 5.0:: Feature completion status to 5.0 specification
 * OpenMP 5.1:: Feature completion status to 5.1 specification
 * OpenMP 5.2:: Feature completion status to 5.2 specification
-* OpenMP Technical Report 12:: Feature completion status to second 6.0 preview
+* OpenMP Technical Report 13:: Feature completion status to third 6.0 preview
 @end menu
 
 The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
@@ -391,7 +391,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @item @code{destroy} clause with destroy-var argument on @code{depobj}
   @tab Y @tab
 @item Deprecation of no-argument @code{destroy} clause on @code{depobj}
-  @tab N @tab
+  @tab N/A @tab undeprecated in OpenMP 6
 @item @code{linear} clause syntax changes and @code{step} modifier @tab Y @tab
 @item Deprecation of minus operator for reductions @tab N @tab
 @item Deprecation of separating @code{map} modifiers without comma @tab N @tab
@@ -448,20 +448,24 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @end multitable
 
 
-@node OpenMP Technical Report 12
-@section OpenMP Technical Report 12
+@node OpenMP Technical Report 13
+@section OpenMP Technical Report 13
 
-Technical Report (TR) 12 is the second preview for OpenMP 6.0.
+Technical Report (TR) 13 is the third preview for OpenMP 6.0.
 
 @unnumberedsubsec New features listed in Appendix B of the OpenMP specification
 @multitable @columnfractions .60 .10 .25
-@item Features deprecated in versions 5.2, 5.1 and 5.0 were removed
+@item Features deprecated in versions 5.0, 5.1 and 5.2 were removed
   @tab N/A @tab Backward compatibility
 @item Full support for C23 was added @tab P @tab
 @item Full support for C++23 was added @tab P @tab
+@item Full support for Fortran 2023 was added @tab P @tab
 @item @code{_ALL} suffix to the device-scope environment variables
   @tab P @tab Host device number wrongly accepted
 @item @code{num_threads} now accepts a list @tab N @tab
+@item Abstract names added for @code{OMP_NUM_THREADS},
+  @code{OMP_THREAD_LIMIT} and @code{OMP_TEAMS_THREAD_LIMIT}
+  @tab N @tab
 @item Supporting increments with abstract names in @code{OMP_PLACES} @tab N @tab
 @item Extension of @code{OMP_DEFAULT_DEVICE} and new
   @code{OMP_AVAILABLE_DEVICES} environment vars @tab N @tab
@@ -470,28 +474,51 @@ Technical Report (TR) 12 is the second preview for OpenMP 6.0.
   @tab Y @tab
 @item The OpenMP directive syntax was extended to include C 23 attribute
   specifiers @tab Y @tab
+@item Support for pure directives in Fortran's @code{do concurrent} @tab N @tab
 @item All inarguable clauses take now an optional Boolean argument @tab N @tab
 @item For Fortran, @emph{locator list} can be also function reference with
   data pointer result @tab N @tab
 @item Concept of @emph{assumed-size arrays} in C and C++
   @tab N @tab
 @item @emph{directive-name-modifier} accepted in all clauses @tab N @tab
+@item Argument-free version of @code{depobj} including added @code{init} clause
+  @tab N @tab
+@item Undeprecate omitting the argument to the @code{depend} clause of
+  the argument version of the @code{depend} construct @tab Y @tab
 @item For Fortran, atomic with BLOCK construct and, for C/C++, with
   unlimited curly braces supported @tab N @tab
+@item For Fortran, atomic with pointer comparison @tab N @tab
+@item For Fortran, atomic with enum and enumeration types @tab N @tab
 @item For Fortran, atomic compare with storing the comparison result
   @tab N @tab
 @item New @code{looprange} clause @tab N @tab
-@item Ref-count change for @code{use_device_ptr}/@code{use_device_addr}
+@item For Fortran, handling polymorphic types in data-sharing-attribute
+  clauses @tab P @tab @code{private} not supported
+@item For Fortran, rejecting polymorphic types in data-mapping clauses
+  @tab N @tab not diagnosed (and mostly unsupported)
+@item New @code{taskgraph} construct including @emph{saved} modifier and
+  @code{replayable} clause @tab N @tab
+@item @code{default} clause on the @code{target} directive @tab N @tab
+@item Ref-count change for @code{use_device_ptr} and @code{use_device_addr}
   @tab N @tab
 @item Support for inductions @tab N @tab
+@item Deprecation of the combiner expressio

[Patch] libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device (was: Re: [PATCH, v3] OpenMP: Constructors and destructors for "declare target" static aggregates)

2024-08-08 Thread Tobias Burnus


Document -fno-builtin-omp_is_initial_device as discussed:

Jakub Jelinek wrote:

RFC: Should we document this new built-in somewhere? If so, where? As part
of the routine description in libgomp.texi? Or in extend.texi (or even
invoke.texi)?

I think libgomp.texi in the omp_is_initial_device description, mention
that the compiler folds it by default and that if that is undesirable,
there is this option to use.


Unless there are wording suggestions, I will commit it later today.

Tobias
libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device

libgomp/ChangeLog:

	* libgomp.texi (omp_is_initial_device): Mention
	-fno-builtin-omp_is_initial_device and folding by default.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index c6759dd03bc..96cc0e4baa8 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1754,6 +1754,10 @@ This function returns @code{true} if currently running on the host device,
 @code{false} otherwise.  Here, @code{true} and @code{false} represent
 their language-specific counterparts.
 
+Note that in GCC this value is already folded to a constant in the compiler;
+compile with @option{-fno-builtin-omp_is_initial_device} if a run-time function
+is desired.
+
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}

[committed] libgomp.c++/static-aggr-constructor-destructor-{1,2}.C: Fix scan-tree-dump (was: [r15-2799 Regression] FAIL: libgomp.c++/static-aggr-constructor-destructor-2.C scan-tree-dump-times optimiz

2024-08-08 Thread Tobias Burnus


haochen.jiang wrote:

FAIL: libgomp.c++/static-aggr-constructor-destructor-1.C scan-tree-dump-times optimized "__attribute__\\(\\([^\n\r]*omp declare target nohost" 1
FAIL: libgomp.c++/static-aggr-constructor-destructor-1.C scan-tree-dump-times optimized "void _GLOBAL__off_I_v1" 1


Those symbols are generated even with ENABLE_OFFLOADING == false, but in 
that case they are optimized away (as they should be).


With offloading, the pass removing them comes too late, but we should 
handle 'nohost' explicitly. Once done, the dump will be the same (no 
symbol). Until this is implemented, we now use 'target (!) offload_target_any' 
to separate the cases so that this test passes, even though 
offload_target_any does not completely match ENABLE_OFFLOADING.*


Committed as r15-2814-ge3a6dec326a127

Tobias

(* If you configured with --enable-offload-defaulted and have no offload 
binaries available, or when you smuggle '-foffload=disable' onto the 
command line, ENABLE_OFFLOADING is true while offload_target_any is false.)
commit e3a6dec326a127ad549246435b9d3835e9a32407
Author: Tobias Burnus 
Date:   Thu Aug 8 10:42:25 2024 +0200

libgomp.c++/static-aggr-constructor-destructor-{1,2}.C: Fix scan-tree-dump

In principle, the optimized dump should be the same on the host, but as
'nohost' is not handled, it is present. However, when ENABLE_OFFLOADING is
false, it is handled early enough to remove the function.

libgomp/ChangeLog:

* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: Split
scan-tree-dump into with and without target offload_target_any.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C:
Likewise.
---
 .../libgomp.c++/static-aggr-constructor-destructor-1.C   | 15 ---
 .../libgomp.c++/static-aggr-constructor-destructor-2.C   | 16 +---
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
index 403a071c0c0..b5aafc8cabc 100644
--- a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
+++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
@@ -9,9 +9,18 @@
 
 // { dg-final { scan-tree-dump-not "omp_is_initial_device" "optimized" } }
 // { dg-final { scan-tree-dump-not "__omp_target_static_init_and_destruction" "optimized" } }
-// FIXME: should be '-not' not '-times' 1:
-// { dg-final { scan-tree-dump-times "void _GLOBAL__off_I_v1" 1 "optimized" } }
-// { dg-final { scan-tree-dump-times "__attribute__\\(\\(\[^\n\r]*omp declare target nohost" 1 "optimized" } }
+
+// (A) No offloading configured: The symbols aren't present
+// Caveat: They are present with -foffload=disable - or offloading
+// configured but none of the optional offload packages/binaries installed.
+// But the 'offload_target_any' check cannot distinguish those
+// { dg-final { scan-tree-dump-not "void _GLOBAL__off_I_v1" "optimized" { target { ! offload_target_any } } } }
+// { dg-final { scan-tree-dump-not "__attribute__\\(\\(\[^\n\r]*omp declare target nohost" "optimized" { target { ! offload_target_any } } } }
+
+// (B) With offload configured (and compiling for an offload target)
+// the symbols are present (missed optimization). Hence: FIXME.
+// { dg-final { scan-tree-dump-times "void _GLOBAL__off_I_v1" 1 "optimized" { target offload_target_any } } }
+// { dg-final { scan-tree-dump-times "__attribute__\\(\\(\[^\n\r]*omp declare target nohost" 1 "optimized" { target offload_target_any } } }
 
 // { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump-not "omp_initial_device;" "optimized" { target offload_target_amdgcn } } }
 // { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "v1\\._x = 5;" "optimized" { target offload_target_amdgcn } } }
diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
index 6dd4260a522..9652a721bbe 100644
--- a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
+++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
@@ -9,9 +9,19 @@
 
 // { dg-final { scan-tree-dump-not "omp_is_initial_device" "optimized" } }
 // { dg-final { scan-tree-dump-not "__omp_target_static_init_and_destruction" "optimized" } }
-// FIXME: should be '-not' not '-times' 1:
-// { dg-final { scan-tree-dump-times "void _GLOBAL__off_I_" 1 "optimized" } }
-// { dg-final { sca

[committed] libgomp.c-c++-common/target-link-2.c: Fix test on multi-device systems (was: Re: [Patch] libgomp: Fix declare target link with offset array-section mapping [PR116107])

2024-08-07 Thread Tobias Burnus


Hi Thomas,

Thomas Schwinge wrote:

The new test case 'libgomp.c-c++-common/target-link-2.c' generally PASSes
on one-GPU systems, but on a multi-GPU system (tested nvidia5):


After having debugged it, it became glaringly obvious, but could 
otherwise be missed …


The testcase checks that mapping an array and then remapping a 
different stride works; to see that it was really remapped, it 
changes the host value beforehand.


The issue was that the value has to be changed back to the original value 
for the next device, as the value checks always expect the same value.
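A host-only model shows why the reset is needed once the loop covers more than one device. It is a hypothetical sketch: the initial values are chosen only so that the check matches the one in the diff ((4 + i)*10 for the ten elements starting at index 3), and no offloading is involved.

```c
#include <assert.h>

#define DEMO_N 15

/* Hypothetical host-only model of the per-device loop: each iteration
   scales the host array by 10 (to make a remap observable) and then
   checks values that assume the original contents.  Returns 1 if every
   device iteration passes its check, 0 otherwise.  */
static int
demo_run_devices (int ndev, int reset)
{
  int arr[DEMO_N];
  for (int i = 0; i < DEMO_N; i++)
    arr[i] = i + 1;

  for (int dev = 0; dev < ndev; dev++)
    {
      for (int i = 0; i < DEMO_N; i++)
        arr[i] *= 10;                 /* Modify the host value.  */
      for (int i = 0; i < 10; i++)
        if (arr[3 + i] != (4 + i) * 10)
          return 0;                   /* Check fails on this device.  */
      if (reset)
        for (int i = 0; i < DEMO_N; i++)
          arr[i] /= 10;               /* Restore for the next device.  */
    }
  return 1;
}
```

With one device the missing reset goes unnoticed; with two, the second iteration sees values scaled by 100.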


Committed as r15-2796-gaa689684d2bf58.

Thanks for the report!

Tobias

PS:


I first thought that maybe just adding the following would suffice:

 +  #pragma omp target exit data map(release:arr[3:10]) device(dev)

I was (and still am) torn between adding it (cleaner) and leaving it out 
(not cleaning up after the remapping), as both have some merits for 
testing. In any case, either testcase is fine (and should work).
commit aa689684d2bf58d1b7e7938a1392e7a260276d14
Author: Tobias Burnus 
Date:   Wed Aug 7 17:59:21 2024 +0200

libgomp.c-c++-common/target-link-2.c: Fix test on multi-device systems

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/target-link-2.c: Reset variable
value to handle multi-device tests.
---
 libgomp/testsuite/libgomp.c-c++-common/target-link-2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
index 15da1656ebf..b64fbde70e3 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
@@ -54,6 +54,9 @@ int main()
   for (int i = 0; i < 10; i++)
 	if (res[i] != (4 + i)*10)
 	  __builtin_abort ();
+
+  for (int i = 0; i < 15; i++) /* Reset. */
+	arr[i] /= 10;
 }
   return 0;
 }

Re: [PATCH, v3] OpenMP: Constructors and destructors for "declare target" static aggregates

2024-08-07 Thread Tobias Burnus


Hi Jakub,

for C/C++, -fno-builtin-omp_is_initial_device already disabled the 
expansion.


I added it also for Fortran. Plus added a C and a Fortran testcase for 
the disable flag.


* * *

Wording wise, it failed before for Fortran with:

f951: Warning: command-line option ‘-fno-builtin-omp_is_initial_device’ is valid for C/C++/ObjC/ObjC++ but not for Fortran


f951: Warning: command-line option ‘-fbuiltin-omp_is_initial_device’ is valid for C/C++/ObjC/ObjC++ but not for Fortran


(The latter is not quite true as all non "no-" ones are rejected for 
C/C++, e.g.: "cc1: error: unrecognized command-line option 
‘-fbuiltin-omp_is_initial_device’").


Now all positive forms fail with: "f951: Error: unrecognized 
command-line option ‘-fbuiltin-omp_is_initial_device’", which should be 
fine and in line with C/C++.


[RFC] The only real question is how to handle unknown -fno-builtin-* 
flags. C/C++ accepts them silently; Fortran rejected them before (see 
above) as unknown flags. And this patch does:


f951: Warning: command-line option ‘-fno-builtin-nothing’ is not valid for Fortran


for all but that single supported flag.
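The resulting Fortran behavior can be summarized as a small decision function. This is a hypothetical model of the behavior described above, not gfortran's option-handling code: only -fno-builtin-omp_is_initial_device is accepted, other -fno-builtin-* names produce a warning, and positive -fbuiltin-* forms are an error.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of -f[no-]builtin-* options for Fortran.  */
enum demo_opt_result
{
  DEMO_OPT_OK,            /* Accepted.  */
  DEMO_OPT_WARN,          /* "... is not valid for Fortran".  */
  DEMO_OPT_ERROR,         /* "unrecognized command-line option".  */
  DEMO_OPT_NOT_HANDLED    /* Not an -f[no-]builtin-* option at all.  */
};

static enum demo_opt_result
demo_handle_fbuiltin (const char *opt, int *disable_is_initial_device)
{
  static const char neg[] = "-fno-builtin-";
  static const char pos[] = "-fbuiltin-";

  if (strncmp (opt, neg, sizeof neg - 1) == 0)
    {
      /* The single supported name disables the compile-time folding.  */
      if (strcmp (opt + sizeof neg - 1, "omp_is_initial_device") == 0)
        {
          *disable_is_initial_device = 1;
          return DEMO_OPT_OK;
        }
      return DEMO_OPT_WARN;
    }
  if (strncmp (opt, pos, sizeof pos - 1) == 0)
    return DEMO_OPT_ERROR;
  return DEMO_OPT_NOT_HANDLED;
}
```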

* * *

Jakub Jelinek wrote:

As I wrote, I think there should be some option to override the
omp_is_initial_device folding, e.g. for the case where one is compiling some
library code which could be linked either way and so need to avoid folding
omp_is_initial_device because we'll only know at runtime.


Now done – already there for C/C++, but required the changes for Fortran.
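For the library use case, here is a minimal sketch of code whose answer must remain a run-time query. The helper name is hypothetical, and the #ifdef fallback is an assumption added only so that the snippet also compiles without -fopenmp.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallback for builds without -fopenmp: code that is not
   offloaded runs on the host.  */
static int
omp_is_initial_device (void)
{
  return 1;
}
#endif

/* Hypothetical library helper that may end up in host-only or in
   offloaded code, so the answer is only known at run time.  */
static int
demo_running_on_host (void)
{
  return omp_is_initial_device () != 0;
}
```

When such a library is built with -fopenmp, adding -fno-builtin-omp_is_initial_device keeps the call from being folded to a constant, so that the result reflects where the code actually runs.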

RFC: Should we document this new built-in somewhere? If so, where? As 
part of the routine description in libgomp.texi? Or in extend.texi (or 
even invoke.texi)?



Maybe it would be worth testing that omp_is_initial_device is not treated like
a builtin in C++ in custom namespace, or as a static or non-static member
function, or for C or Fortran as nested function.


For C/C++, it uses the same mechanism (both_p = true) as all other 
builtins; thus, I just hope that it works there.


For Fortran, this plugs into gfc_get_extern_function_decl, i.e. when that name 
appears as an external declaration. While the user could mess around, it 
checks that it is a function and that the return type is the expected one 
(i.e. logical). Thus, there shouldn't be any issue with nested functions.


Tobias
OpenMP: Constructors and destructors for "declare target" static aggregates

This commit also compile-time expands (__builtin_)omp_is_initial_device for
both Fortran and C/C++ (unless -fno-builtin-omp_is_initial_device is used).
But the main change is:

This commit adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.

Before this commit, space is allocated on the target for such aggregates,
but nothing ever constructs them properly, so they end up zero-initialised.

(See the new test static-aggr-constructor-destructor-3.C for a reason
why running constructors on the target is preferable to e.g. constructing
on the host and then copying the resulting object to the target.)
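A host-side analogy of the problem, using the GCC/Clang constructor attribute rather than C++ (a hypothetical sketch; the name v1._x only mirrors the one scanned for in the testsuite): static storage alone is zero-initialised, and the intended value only appears once a constructor has run, which is what the patch arranges on the offload target.

```c
#include <assert.h>

/* Hypothetical host-side analogy of a static aggregate.  */
struct demo_aggr
{
  int _x;
};

/* Static storage: zero-initialised before any code runs.  */
static struct demo_aggr v1;

/* GCC/Clang extension: runs before main, much like the initialization
   function generated for a C++ static object with a constructor.  */
static void __attribute__ ((constructor))
demo_construct_v1 (void)
{
  v1._x = 5;
}

static int
demo_v1_value (void)
{
  return v1._x;
}
```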

2024-08-07  Julian Brown  
	Tobias Burnus  

gcc/ChangeLog:

	* builtins.def (DEF_GOMP_BUILTIN_COMPILER): Define
	DEF_GOMP_BUILTIN_COMPILER to handle the non-prefix version.
	* gimple-fold.cc (gimple_fold_builtin_omp_is_initial_device): New.
	(gimple_fold_builtin): Call it.
	* omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): Define.
	* tree.cc (get_file_function_name): Support names for on-target
	constructor/destructor functions.

gcc/cp/ChangeLog:
	* decl2.cc (tree-inline.h): Include.
	(static_init_fini_fns): Bump to four entries. Update comment.
	(start_objects, start_partial_init_fini_fn): Add 'omp_target'
	parameter. Support "declare target" decls. Update forward declaration.
	(emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for
	the created function. Support "declare target".
	(OMP_SSDF_IDENTIFIER): New macro.
	(partition_vars_for_init_fini): Support partitioning "declare target"
	variables also.
	(generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support
	"declare target" decls.
	(c_parse_final_cleanups): Support constructors/destructors on OpenMP
	offload targets.

gcc/fortran/ChangeLog:

	* gfortran.h (gfc_option_t): Add disable_omp_is_initial_device.
	* lang.opt (fbuiltin-): Add.
	* options.cc (gfc_handle_option): Handle
	-fno-builtin-omp_is_initial_device.
	* f95-lang.cc (gfc_init_builtin_functions): Handle
	DEF_GOMP_BUILTIN_COMPILER.
	* trans-decl.cc (gfc_get_extern_function_decl): Add code to use
	DEF_GOMP_BUILTIN_COMPILER for 'omp_is_initial_device'.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New test.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New test.
	* testsuite/libgomp.c++/static-aggr-constructor

[PATCH, v3] OpenMP: Constructors and destructors for "declare target" static aggregates

2024-08-07 Thread Tobias Burnus


CCed Fortran because of the first item:

This patch now uses (again like in v1) a builtin for 
'omp_is_initial_device'; like in v2, it is compile-time evaluated, but 
this time (new!) it also handles the case that a user wrote that routine.


Note: The omp_… namespace is owned by OpenMP, i.e. if it breaks for a 
user-defined function (when compiled with -fopenmp), it's the fault of 
the user.


Otherwise, it is unchanged except for the following first suggestion. 
And while 'nohost' should be optimized (away on the host), that's 
deferred to a to-be-written follow-up patch.


On Aug 1, 2024, Jakub Jelinek wrote:

On Tue, Jul 30, 2024 at 10:51:56PM +0200, Tobias Burnus wrote:

-  char id[sizeof (SSDF_IDENTIFIER) + 1 /* '\0' */ + 32];
+  tree name;
...

I'd just use a single buffer here,
   char id[MAX (sizeof (SSDF_IDENTIFIER), sizeof (OMP_SSDF_IDENTIFIER))
  + 1 /* \0 */ + 32];

Done as proposed.
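Jakub's single-buffer suggestion can be sketched as follows. The prefix strings are assumptions for this example (the OpenMP one matches the name scanned for in the testsuite), and the helper demo_ssdf_name is hypothetical. Note that sizeof on a string literal already includes its terminating NUL, so the buffer has some slack beyond the prefix, the '_' separator and the decimal counter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed prefix values, for illustration only.  */
#define DEMO_SSDF_IDENTIFIER "__static_initialization_and_destruction"
#define DEMO_OMP_SSDF_IDENTIFIER "__omp_target_static_init_and_destruction"

#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Format "<prefix>_<count>" into one stack buffer sized for the longer
   prefix, then copy it to OUT.  Returns the length of the name.  */
static size_t
demo_ssdf_name (char *out, size_t out_size, int omp_target, unsigned count)
{
  char id[DEMO_MAX (sizeof (DEMO_SSDF_IDENTIFIER),
                    sizeof (DEMO_OMP_SSDF_IDENTIFIER))
          + 1 /* '\0' */ + 32];
  int n = snprintf (id, sizeof id, "%s_%u",
                    omp_target ? DEMO_OMP_SSDF_IDENTIFIER
                               : DEMO_SSDF_IDENTIFIER,
                    count);
  assert (n > 0 && (size_t) n < sizeof id);   /* Never truncated.  */
  assert ((size_t) n < out_size);
  memcpy (out, id, (size_t) n + 1);
  return (size_t) n;
}
```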

Given that the Xeon PHI offloading is gone and fork offloading doesn't seem
to be worked on, my preference would be
__builtin_omp_is_initial_device () and fold that to 0/1 after IPA, because
that will actually help user code too.

Done.

And of course, it would be much better to figure out real nohost fix,
because if we need to register a constructor which will just do nothing, it
still wastes runtime.


To be done in a follow-up patch.

Comments, suggestions, concerns?

Tobias

PS: In principle, 'omp_get_num_devices()' would be a candidate for 
'-foffload=disable' (or not configured), but I am not sure how useful it 
is, especially as the decision whether offloading should be done is 
deferred to the link time.


PPS: For OpenACC, there is already an optimization for the similar but 
more complex acc_on_device. But that one doesn't handle Fortran due to 
the different ABI. See https://gcc.gnu.org/PR116269 for details.
OpenMP: Constructors and destructors for "declare target" static aggregates

This commit also compile-time expands (__builtin_)omp_is_initial_device for
both Fortran and C/C++. But the main change is:

This commit adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.

Before this commit, space is allocated on the target for such aggregates,
but nothing ever constructs them properly, so they end up zero-initialised.

(See the new test static-aggr-constructor-destructor-3.C for a reason
why running constructors on the target is preferable to e.g. constructing
on the host and then copying the resulting object to the target.)

2024-08-07  Julian Brown  
	Tobias Burnus  

gcc/ChangeLog:

	* builtins.def (DEF_GOMP_BUILTIN_COMPILER): Define
	DEF_GOMP_BUILTIN_COMPILER to handle the non-prefix version.
	* gimple-fold.cc (gimple_fold_builtin_omp_is_initial_device): New.
	(gimple_fold_builtin): Call it.
	* omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): Define.
	* tree.cc (get_file_function_name): Support names for on-target
	constructor/destructor functions.

gcc/cp/
	* decl2.cc (tree-inline.h): Include.
	(static_init_fini_fns): Bump to four entries. Update comment.
	(start_objects, start_partial_init_fini_fn): Add 'omp_target'
	parameter. Support "declare target" decls. Update forward declaration.
	(emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for
	the created function. Support "declare target".
	(OMP_SSDF_IDENTIFIER): New macro.
	(partition_vars_for_init_fini): Support partitioning "declare target"
	variables also.
	(generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support
	"declare target" decls.
	(c_parse_final_cleanups): Support constructors/destructors on OpenMP
	offload targets.

gcc/fortran/ChangeLog:

	* f95-lang.cc (gfc_init_builtin_functions): Handle
	DEF_GOMP_BUILTIN_COMPILER.
	* trans-decl.cc (gfc_get_extern_function_decl): Add code to use
	DEF_GOMP_BUILTIN_COMPILER for 'omp_is_initial_device'.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New test.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New test.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C: New test.
	* testsuite/libgomp.c-c++-common/target-is-initial-host.c: New test.
	* testsuite/libgomp.fortran/target-is-initial-host.f: New test.
	* testsuite/libgomp.fortran/target-is-initial-host.f90: New test.

Co-authored-by: Tobias Burnus 

 gcc/builtins.def   |   4 +
 gcc/cp/decl2.cc| 229 +
 gcc/fortran/f95-lang.cc|   9 +
 gcc/fortran/trans-decl.cc  |   8 +
 gcc/gimple-fold.cc |  20 ++
 gcc/omp-builtins.def   |   4 +
 gcc/tree

[committed] libgomp.texi: Add OpenMP TR13 routines to @menu (commented out)

2024-08-05 Thread Tobias Burnus

Not user visible, but I use this to keep track both of implemented OpenMP 
runtime routines that still have to be documented and of routines that 
still have to be implemented (and then documented).


This commit (r15-2713-g1a5734135d265a) adds those routines added in 
OpenMP's third 6.0 preview (Technical Report 13).


Tobias

PS: The routines are again reordered in OpenMP; the question is whether 
we want to follow suit or keep the current ordering. I only reordered 
the undocumented ones inside @menu and only those @menu that I modified.
commit 1a5734135d265a7b363ead9f821676a2a358969b
Author: Tobias Burnus 
Date:   Mon Aug 5 09:18:29 2024 +0200

libgomp.texi: Add OpenMP TR13 routines to @menu (commented out)

To keep track of missing routine documentation (both implemented and not),
the libgomp.texi file contains all non-OMPT routines as commented items
in @menu. This commit adds the routines added in TR13 as commented fixme
items.

libgomp/ChangeLog:

* libgomp.texi (OpenMP Runtime Library Routines): Add TR13 routines
to @menu (commented out).
---
 libgomp/libgomp.texi | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 07cd75124b0..c6759dd03bc 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1591,12 +1591,18 @@ They have C linkage and do not throw exceptions.
 @menu
 * omp_get_num_procs::   Number of processors online
 @c * omp_get_max_progress_width:: /TR11
+@c * omp_get_device_from_uid::  /TR13
+@c * omp_get_uid_from_device::  /TR13
 * omp_set_default_device::  Set the default device for target regions
 * omp_get_default_device::  Get the default device for target regions
 * omp_get_num_devices:: Number of target devices
 * omp_get_device_num::  Get device that current thread is running on
 * omp_is_initial_device::   Whether executing on the host device
 * omp_get_initial_device::  Device number of host device
+@c * omp_get_device_num_teams::  /TR13
+@c * omp_set_device_num_teams::  /TR13
+@c * omp_get_device_teams_thread_limit::  /TR13
+@c * omp_set_device_teams_thread_limit::  /TR13
 @end menu
 
 
@@ -2813,8 +2819,27 @@ Routines to manage and allocate memory on the current device.
 They have C linkage and do not throw exceptions.
 
 @menu
+@c * omp_get_devices_memspace:: /TR13
+@c * omp_get_device_memspace:: /TR13
+@c * omp_get_devices_and_host_memspace:: /TR13
+@c * omp_get_device_and_host_memspace:: /TR13
+@c * omp_get_devices_all_memspace:: /TR13
+@c * omp_get_memspace_num_resources:: /TR11
+@c * omp_get_memspace_pagesize:: /TR13
+@c * omp_get_submemspace:: /TR11
+@c * omp_init_mempartitioner:: /TR13
+@c * omp_destroy_mempartitioner:: /TR13
+@c * omp_init_mempartition:: /TR13
+@c * omp_destroy_mempartition:: /TR13
+@c * omp_mempartition_set_part:: /TR13
+@c * omp_mempartition_get_user_data:: /TR13
 * omp_init_allocator:: Create an allocator
 * omp_destroy_allocator:: Destroy an allocator
+@c * omp_get_devices_allocator:: /TR13
+@c * omp_get_device_allocator:: /TR13
+@c * omp_get_devices_and_host_allocator:: /TR13
+@c * omp_get_device_and_host_allocator:: /TR13
+@c * omp_get_devices_all_allocator:: /TR13
 * omp_set_default_allocator:: Set the default allocator
 * omp_get_default_allocator:: Get the default allocator
 * omp_alloc:: Memory allocation with an allocator
@@ -2823,8 +2848,6 @@ They have C linkage and do not throw exceptions.
 * omp_calloc:: Allocate nullified memory with an allocator
 * omp_aligned_calloc:: Allocate nullified aligned memory with an allocator
 * omp_realloc:: Reallocate memory allocated with OpenMP routines
-@c * omp_get_memspace_num_resources:: /TR11
-@c * omp_get_submemspace:: /TR11
 @end menu

Re: [PATCH] fortran: Fix a pasto in gfc_check_dependency

2024-08-02 Thread Tobias Burnus


[static analyzer]
Jakub Jelinek wrote:

[…] it is some proprietary static analyzer


I want to point out that an underutilized static analyzer keeps scanning 
GCC: Coverity Scan.


If someone has the time, I think it would be worthwhile to have a look 
at the reports. There are a bunch of people having access to it, and 
more can be added (I think I can grant access). Thus, if any of the 
GCC developers has interest and time …


Tobias

[wwwdocs] OpenMP: gcc-15/changes.html - minor update / projects/gomp - link to TR13

2024-08-01 Thread Tobias Burnus

First, OpenMP TR13 has just been released. Hence, link it from our 
project page.


For the GCC 15 page, I suggest adding ompx_gnu_pinned_mem_alloc (but one 
can argue about that) and the nvptx I/O support. We could also mention 
nvptx + constructor support here.


Comments, thoughts, remarks before I commit it?

Current pages are: https://gcc.gnu.org/gcc-15/changes.html +
https://gcc.gnu.org/projects/gomp/#omp-status

Tobias
OpenMP: gcc-15/changes.html - minor update / projects/gomp - link to TR13

* htdocs/gcc-15/changes.html (OpenMP): Mention ompx_gnu_pinned_mem_alloc
  and Fortran I/O support on nvptx with OpenMP offloading.
* htdocs/projects/gomp/index.html (OpenMP Releases and Status): Add TR13.

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index a1bb0ddf..2fd7aa90 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -50,6 +50,12 @@ a work-in-progress.
   see the offload-target specifics section in the
   https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html";
   >GNU Offloading and Multi Processing Runtime Library Manual.
+  GCC added ompx_gnu_pinned_mem_alloc as https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fALLOCATOR.html";>predefined
+  allocator. On https://gcc.gnu.org/onlinedocs/libgomp/nvptx.html";>Nvidia
+  GPUs, writing to the terminal from OpenMP target regions (but not from
+  OpenACC compute regions) is now also supported in Fortran; in C/C++ and
+  on AMD GPUs this was already supported before with both OpenMP and OpenACC.
 
 
   OpenMP 5.1: The unroll and tile
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index d1765fc3..89f0b120 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -1326,7 +1326,11 @@ error.
 
 OpenMP Releases and Status
 
-November 9, 2023
+August 1, 2024
+https://www.openmp.org/wp-content/uploads/openmp-TR13.pdf";>OpenMP
+Technical Report 13 (third preview for the OpenMP API Version 6.0) has been
+released.
+
 https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf";>OpenMP
 Technical Report 12 (second preview for the OpenMP API Version 6.0) has been
 released.

Re: [Patch] libgomp: Device load_image - minor num-funcs/vars check improvement

2024-08-01 Thread Tobias Burnus

I sent the following patch in February (Stage 4) and didn't want to 
commit it back then. But for Stage 1, it should be fine ... I'd like to 
commit it tomorrow, unless there are comments suggesting otherwise.


Attached is the unchanged patch and I also added a "diff -w -U1" patch 
as that makes it easier to see the non-re-indent changes.


Tobias

On February 19, 2024, Tobias Burnus wrote:
When debugging a linker issue, leading to a mismatch in the number of 
host/device functions, I was surprised by seeing one additional entry. 
Well, it turned out to be due to the ICV variable.


This patch makes it more consistent. The "+1" is returned since 
r12-2769-g0bac793ed6bad2 (for the on-device omp_get_device_num), 
extended in r13-2545-g9f2fca56593a2b for a struct to support more ICV 
variables on the devices [to handle OMP_..._DEV environment variables].


As the value is returned unconditionally, it makes sense to use it 
both for the expected-value diagnostic and for the condition further 
below.


Comments, suggestions, remarks?

Tobias

PS: An alternative would be to make the plugin's value depend on whether 
the data was loaded. But that would make the number-of-entries assert 
weaker and might cause corner-case issues when a slightly older 
libgomp plugin is used with the updated libgomp.so. Thus, I have 
settled for the attached variant.
diff --git a/libgomp/target.c b/libgomp/target.c
index efed6ad68ff..fb9a6fb5c79 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2364,5 +2364,4 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 
-  if (num_target_entries != num_funcs + num_vars
-  /* "+1" due to the additional ICV struct.  */
-  && num_target_entries != num_funcs + num_vars + 1)
+  /* The "+1" is due to the additional ICV struct.  */
+  if (num_target_entries != num_funcs + num_vars + 1)
 {
@@ -2372,3 +2371,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   gomp_fatal ("Cannot map target functions or variables"
-		  " (expected %u, have %u)", num_funcs + num_vars,
+		  " (expected %u + %u + 1, have %u)", num_funcs, num_vars,
 		  num_target_entries);
@@ -2456,11 +2455,5 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 
-  /* Last entry is for a ICVs variable.
- Tolerate case where plugin does not return those entries.  */
-  if (num_funcs + num_vars < num_target_entries)
-{
-  struct addr_pair *var = &target_table[num_funcs + num_vars];
-
-  /* Start address will be non-zero for the ICVs variable if
-	 the variable was found in this image.  */
-  if (var->start != 0)
+  /* Last entry is for the ICV struct variable; if absent, start = end = 0.  */
+  struct addr_pair *icv_var = &target_table[num_funcs + num_vars];
+  if (icv_var->start != 0)
 {
@@ -2471,3 +2464,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   struct gomp_offload_icvs *icvs = get_gomp_offload_icvs (dev_num);
-	  size_t var_size = var->end - var->start;
+  size_t var_size = icv_var->end - icv_var->start;
   if (var_size != sizeof (struct gomp_offload_icvs))
@@ -2482,3 +2475,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 	 actually designating its device number into effect.  */
-	  gomp_copy_host2dev (devicep, NULL, (void *) var->start, icvs,
+  gomp_copy_host2dev (devicep, NULL, (void *) icv_var->start, icvs,
 			  var_size, false, NULL);
@@ -2489,3 +2482,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   k->tgt = tgt;
-	  k->tgt_offset = var->start;
+  k->tgt_offset = icv_var->start;
   k->refcount = REFCOUNT_INFINITY;
@@ -2498,3 +2491,2 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 }
-}
 
libgomp: Device load_image - improve minor num-funcs/vars check

The run-time library loads the offload functions and variables and, optionally,
the ICV variable, and returns the number of loaded items, which has to match
the host side. The plugin returns "+1" (since GCC 12) for the ICV variable
entry, independently of whether it was loaded or not, but the var's value
(start == end == 0) can be used to detect when loading it failed.

Thus, we can tighten the assert check - which this commit does together with
making the output less surprising - and simplify the condition further below.
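The tightened check and the simplified ICV condition can be modeled in C as below. This is a sketch under assumptions: struct addr_pair mirrors the {start, end} pairs used by the plugins, and the helpers num_entries_ok, format_mismatch and icv_entry are hypothetical names, not libgomp functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the plugin's address-table entries.  */
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;
};

/* Tightened check: the plugin always reports funcs + vars + 1 entries,
   the "+1" being the ICV struct slot, whether or not it was found.  */
static bool
num_entries_ok (unsigned num_funcs, unsigned num_vars,
                unsigned num_target_entries)
{
  return num_target_entries == num_funcs + num_vars + 1;
}

/* Format the diagnostic in the "expected F + V + 1, have N" style.  */
static int
format_mismatch (char *buf, size_t len, unsigned num_funcs,
                 unsigned num_vars, unsigned num_target_entries)
{
  return snprintf (buf, len, "expected %u + %u + 1, have %u",
                   num_funcs, num_vars, num_target_entries);
}

/* The last table entry is the ICV struct; start == end == 0 means the
   image did not contain it, in which case NULL is returned.  */
static const struct addr_pair *
icv_entry (const struct addr_pair *table, unsigned num_funcs,
           unsigned num_vars)
{
  const struct addr_pair *icv_var = &table[num_funcs + num_vars];
  return icv_var->start != 0 ? icv_var : NULL;
}
```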

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): If ICV variable
	is not available, decrement other_count and thus the return value.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
	* target.c (gomp_load_image_to_device): Extend fatal-error message;
	simplify a condition.

 libgomp/target.c | 78 +

[Patch, v3] omp-offload.cc: Fix value-expr handling of 'declare target link' vars [PR115637] (was: [Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637])

2024-07-31 Thread Tobias Burnus


Hi Richard, hi all,

Richard Biener wrote:

Looking at pass_omp_target_link::execute I wonder iff find_link_var_op
shouldn't simply do the substitution?  Aka


This seems to work ...


--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -2893,6 +2893,7 @@ find_link_var_op (tree *tp, int *walk_subtrees, void *)
&& is_global_var (t)
&& lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t)))
  {
+  *tp = unshare_expr (DECL_VALUE_EXPR (t));
*walk_subtrees = 0;
return t;
  }

which then makes the stmt obviously not gimple?


... except that 'return t' prevents updating other value-expr in the 
same stmt, but that can be fixed.


Updated patch attached.

Thanks for the suggestion!

Tobias
omp-offload.cc: Fix value-expr handling of 'declare target link' vars

As the PR and included testcase shows, replacing 'arr2' by its value expression
'*arr2$13$linkptr' failed for
  MEM  [(c_char * {ref-all})&arr2]
which left 'arr2' in the code as unknown symbol. Now expand the value expression
already in pass_omp_target_link::execute's process_link_var_op walk_gimple_stmt
walk - and don't rely on gimple_regimplify_operands.
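Why returning the matched node from the callback limits the fix: a walk_tree-style walker stops at the first non-NULL callback result, so only the first link variable in a statement would be substituted. The C sketch below models this with a toy operand array; walk_operands, replace_and_stop and replace_and_continue are hypothetical, and only the flag name found_link_var echoes the ChangeLog.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy operand: a name plus an optional value expression to substitute.  */
struct operand
{
  const char *name;
  const char *value_expr;   /* NULL if the operand has none */
};

typedef struct operand *(*walk_fn) (struct operand *);

/* Like walk_tree: visit operands in order and stop at the first
   non-NULL callback result, which is returned to the caller.  */
static struct operand *
walk_operands (struct operand *ops, size_t n, walk_fn fn)
{
  for (size_t i = 0; i < n; i++)
    {
      struct operand *res = fn (&ops[i]);
      if (res)
        return res;
    }
  return NULL;
}

/* Substitute OP's value expression, if any; true if something changed.  */
static bool
substitute (struct operand *op)
{
  if (!op->value_expr)
    return false;
  op->name = op->value_expr;
  op->value_expr = NULL;
  return true;
}

/* Variant 1: substitute and return the node, which ends the walk.  */
static struct operand *
replace_and_stop (struct operand *op)
{
  return substitute (op) ? op : NULL;
}

static bool found_link_var;

/* Variant 2: substitute, record it in a flag, and keep walking.  */
static struct operand *
replace_and_continue (struct operand *op)
{
  if (substitute (op))
    found_link_var = true;
  return NULL;
}
```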

PR middle-end/115637

gcc/ChangeLog:

	* gimplify.cc (gimplify_body): Fix macro name in the comment.
	* omp-offload.cc (found_link_var): New global var.
	(find_link_var_op): Rename to ...
	(process_link_var_op): ... this. Replace value expr; set
	found_link_var.
	(pass_omp_target_link::execute): Update walk_gimple_stmt call.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: Uncomment
	now working code.

Co-authored-by: Richard Biener  PR115637
-! if (res /= -11436) stop 5
-if (res /= -11546) stop 5 ! FIXME
+! print *, res
+if (res /= -11436) stop 5
   end
   integer function run_device1()
 !$omp declare target
 integer :: i
 run_device1 = -99
-! FIXME: arr2 not link mapped -> PR115637
-!   arr2 = [11,22,33,44]
+arr2 = [11,22,33,44]
 if (any (arr(10:50) /= [(i, i=10,50)])) then
   run_device1 = arr(11)
   return
 end if
-! FIXME: -> PR115637
-! run_device1 = sum(arr(10:13) + arr2)
-run_device1 = sum(arr(10:13) ) ! FIXME
+run_device1 = sum(arr(10:13) + arr2)
 do i = 10, 50
   arr(i) = 3 - 10 * arr(i)
 end do

[PATCH, v2] OpenMP: Constructors and destructors for "declare target" static aggregates

2024-07-30 Thread Tobias Burnus


Hello world, hi Jakub,

I would like to PING the following patch.
It's essentially Julian's patch, except:

* It is rediffed (albeit it mostly applied cleanly).
* I replaced the omp_is_initial_device call by an
  internal function (IFN_) such that it can be evaluated
  at compile time. With -O1, this also optimizes the host
  function away as it should :-)
* Regarding nvptx: constructors are supported since GCC 15.
  Thus, the three testcases now work under nvptx as well.
  (Two fail on nvptx when compiled with neither optimization nor
   -foffload-options=nvptx-none=-malias as the constructor
   uses aliases, which aren't supported, yet.)

Comments, remarks, suggestions?
OK for mainline?

Tobias

On May 12, 2023, Julian Brown wrote:

This patch adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.

At present, space is allocated on the target for such aggregates, but
nothing ever constructs them properly, so they end up zero-initialised.

The approach taken is to generate a set of constructors to run on the
target: this currently works for AMD GCN, but fails on NVPTX due
to lack of constructor/destructor support there so far on mainline.
(See the new test static-aggr-constructor-destructor-3.C for a reason
why running constructors on the target is preferable to e.g. constructing
on the host and then copying the resulting object to the target.)

This patch was previously posted for the og12 branch here:

   https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614710.html
   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615013.html
   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615144.html

though needed a fair amount of rework for mainline due to Nathan's
(earlier!) patch:

   https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596402.html

Tested with offloading to AMD GCN and bootstrapped. OK for mainline?

Thanks,

Julian
OpenMP: Constructors and destructors for "declare target" static aggregates

This patch adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.

At present, space is allocated on the target for such aggregates, but
nothing ever constructs them properly, so they end up zero-initialised.

(See the new test static-aggr-constructor-destructor-3.C for a reason
why running constructors on the target is preferable to e.g. constructing
on the host and then copying the resulting object to the target.)

2024-07-30  Julian Brown  
	Tobias Burnus  

gcc/cp/
	* decl2.cc (tree-inline.h): Include.
	(static_init_fini_fns): Bump to four entries. Update comment.
	(start_objects, start_partial_init_fini_fn): Add 'omp_target'
	parameter. Support "declare target" decls. Update forward declaration.
	(emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for
	the created function. Support "declare target".
	(OMP_SSDF_IDENTIFIER): New macro.
	(partition_vars_for_init_fini): Support partitioning "declare target"
	variables also.
	(generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support
	"declare target" decls.
	(c_parse_final_cleanups): Support constructors/destructors on OpenMP
	offload targets.

gcc/
	* gimplify.cc (gimplify_call_expr): Set calls_declare_variant_alt
	for IFN_GOMP_IS_INITIAL_DEVICE.
	* internal-fn.cc (expand_GOMP_IS_INITIAL_DEVICE): New.
	* internal-fn.def (IFN_GOMP_IS_INITIAL_DEVICE): Add.
	* omp-offload.cc (execute_omp_device_lower): Expand it.
	* tree.cc (get_file_function_name): Support names for on-target
	constructor/destructor functions.

libgomp/
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New
	test.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New
	test.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C: New
	test.

Co-authored-by: Tobias Burnus 

 gcc/cp/decl2.cc| 240 +
 gcc/gimplify.cc|   8 +-
 gcc/internal-fn.cc |   8 +
 gcc/internal-fn.def|   1 +
 gcc/omp-offload.cc |   7 +
 gcc/tree.cc|   6 +-
 .../static-aggr-constructor-destructor-1.C |  28 +++
 .../static-aggr-constructor-destructor-2.C |  31 +++
 .../static-aggr-constructor-destructor-3.C |  36 
 9 files changed, 324 insertions(+), 41 deletions(-)

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 6d674684931..21ac65452e6 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "optabs-query.h&

Re: [Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637]

2024-07-30 Thread Tobias Burnus


Richard Biener wrote:

On Mon, Jul 29, 2024 at 9:26 PM Tobias Burnus  wrote:

Inside pass_omp_target_link::execute, there is a call to
gimple_regimplify_operands but the value expression is not
expanded.[...]

Where is_gimple_mem_ref_addr is defined as:

/* Return true if T is a valid address operand of a MEM_REF.  */

bool
is_gimple_mem_ref_addr (tree t)
{
  return (is_gimple_reg (t)
          || TREE_CODE (t) == INTEGER_CST
          || (TREE_CODE (t) == ADDR_EXPR
              && (CONSTANT_CLASS_P (TREE_OPERAND (t, 0))
                  || decl_address_invariant_p (TREE_OPERAND (t, 0)))));
}

I think iff then decl_address_invariant_p should be amended.


This does not work, at least not for my use case of OpenMP
link variables, due to ordering issues.

For the device compilers, the VALUE_EXPR is added in lto_main
or in do_whole_program_analysis (same file: lto/lto.cc) by
calling offload_handle_link_vars. The value expression is then later expanded
via pass_omp_target_link::execute, but in between the following happens:

lto_main calls symbol_table::compile, which then calls
cgraph_node::expand, and that executes

   res |= verify_types_in_gimple_reference (lhs, true);

for lhs being:

  MEM  [(c_char * {ref-all})&arr2]

But when adding the has-value-expr check either directly to
is_gimple_mem_ref_addr or to the decl_address_invariant_p it calls, the
following condition in the called function in tree-cfg.cc becomes true:


  if (!is_gimple_mem_ref_addr (TREE_OPERAND (expr, 0))
      || (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR
          && verify_address (TREE_OPERAND (expr, 0), false)))
    {
      error ("invalid address operand in %qs", code_name);

* * *

Thus, I am now back to the previous change, except for:


Why is the gimplify_addr_expr hunk needed?  It should get
to gimplifying the VAR_DECL/PARM_DECL by recursion?


Indeed. I wonder why I had (thought to) need it before; possibly
because it was needed or thought to be needed when trying to trace
this down.

Previous patch - except for that bit removed - attached.

Thoughts, better ideas?

Tobias
gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637]

As the PR and included testcase shows, replacing 'arr2' by its value expression
'*arr2$13$linkptr' failed for
  MEM  [(c_char * {ref-all})&arr2]
which left 'arr2' in the code as unknown symbol.

	PR middle-end/115637

gcc/ChangeLog:

	* gimplify.cc (gimplify_expr): For MEM_REF and an ADDR_EXPR, also
	check for value-expr arguments.
	(gimplify_body): Fix macro name in the comment.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: Uncomment
	now working code.

 gcc/gimplify.cc   |  9 +++--
 libgomp/testsuite/libgomp.fortran/declare-target-link.f90 | 15 ++-
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index ab323d764e8..4fa88c9b21c 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -18251,8 +18251,13 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	 in suitable form.  Re-gimplifying would mark the address
 	 operand addressable.  Always gimplify when not in SSA form
 	 as we still may have to gimplify decls with value-exprs.  */
+	  tmp = TREE_OPERAND (*expr_p, 0);
 	  if (!gimplify_ctxp || !gimple_in_ssa_p (cfun)
-	  || !is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0)))
+	  || (!is_gimple_mem_ref_addr (tmp)
+		  || (TREE_CODE (tmp) == ADDR_EXPR
+		  && (VAR_P (TREE_OPERAND (tmp, 0))
+			  || TREE_CODE (TREE_OPERAND (tmp, 0)) == PARM_DECL)
+		  && DECL_HAS_VALUE_EXPR_P (TREE_OPERAND (tmp, 0)))))
 	{
 	  ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
    is_gimple_mem_ref_addr, fb_rvalue);
@@ -19422,7 +19427,7 @@ gimplify_body (tree fndecl, bool do_parms)
   DECL_SAVED_TREE (fndecl) = NULL_TREE;
 
   /* If we had callee-copies statements, insert them at the beginning
- of the function and clear DECL_VALUE_EXPR_P on the parameters.  */
+ of the function and clear DECL_HAS_VALUE_EXPR_P on the parameters.  */
   if (!gimple_seq_empty_p (parm_stmts))
 {
   tree parm;
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-link.f90
index 2ce212d114f..44c67f925bd 100644
--- a/libgomp/testsuite/libgomp.fortran/declare-target-link.f90
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-link.f90
@@ -1,5 +1,7 @@
 ! { dg-additional-options "-Wall" }
+
 ! PR fortran/115559
+! PR middle-end/115637
 
 module m
integer :: A
@@ -73,24 +75,19 @@ contains
 !$omp target map(from:res)
   res = run_device1()
 !$omp end target
-print *, res
-! FIXME: arr2 not link mapped -> PR115637
-! if (res /= -1

Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-30 Thread Tobias Burnus


Prathamesh Kulkarni wrote:

Thanks for your suggestions on RFC email, the attached patch adds support for 
streaming of poly_int when it's degree <= accel's NUM_POLY_INT_COEFFS.
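A simplified C model of the idea, under assumptions: a poly_int is treated as an array of coefficients whose "degree" is the number of coefficients up to the last nonzero one, the writer streams the degree followed by those coefficients, and a reader with fewer coefficient slots accepts the value only if the degree fits. This is not the patch's actual streaming format, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Degree = number of coefficients up to and including the last nonzero
   one; a value whose higher coefficients are all zero has a low degree.  */
static size_t
poly_degree (const long coeffs[], size_t n)
{
  size_t degree = n;
  while (degree > 0 && coeffs[degree - 1] == 0)
    degree--;
  return degree;
}

/* Writer side: emit the degree, then that many coefficients.
   OUT must have room for n + 1 words; returns the words written.  */
static size_t
stream_out (const long coeffs[], size_t n, long out[])
{
  size_t degree = poly_degree (coeffs, n);
  out[0] = (long) degree;
  for (size_t i = 0; i < degree; i++)
    out[1 + i] = coeffs[i];
  return degree + 1;
}

/* Reader side with N coefficient slots: fail if the streamed degree
   does not fit, otherwise copy and zero-fill the remaining slots.  */
static bool
stream_in (const long in[], long coeffs[], size_t n)
{
  size_t degree = (size_t) in[0];
  if (degree > n)
    return false;
  for (size_t i = 0; i < n; i++)
    coeffs[i] = i < degree ? in[1 + i] : 0;
  return true;
}
```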


First, thanks a lot for your patch!

Secondly, it seems as if this patch is intended to fully or partially 
fix the following PRs.

If so, can you add the PRs to the commit log, such that "git log"
helps to find the problem reports and the commit shows up
in the issues?


https://gcc.gnu.org/PR111937
  PR ipa/111937
  offloading from x86_64-linux-gnu to riscv*-linux-gnu will have issues

https://gcc.gnu.org/PR96265
  PR ipa/96265
  offloading to nvptx-none from aarch64-linux-gnu (and 
riscv*-linux-gnu) does not work


And - marked as duplicate of the latter:

https://gcc.gnu.org/PR114174
  PR lto/114174
  [aarch64] Offloading to nvptx-none

Thanks,

Tobias

[committed] gfortran.dg/compiler-directive_2.f: Update dg-error (was: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559])

2024-07-30 Thread Tobias Burnus


Follow-up fix:

As the !GCC$ attributes are now added in reverse order,
the 'stdcall' vs. 'fastcall' in the error message swapped order:
  "Error: stdcall and fastcall attributes are not compatible"
This didn't show up here with -m64 ("Warning: 'stdcall' attribute ignored")
and I didn't run it with -m32, but it was reported by Haochen's script and
manually confirmed by him.
(Thanks for the report and checking – and sorry for the FAIL.)

Committed as r15-2401-g15158a8853a69f.

Tobias

[Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637]

2024-07-29 Thread Tobias Burnus


The problem is code like:

  MEM  [(c_char * {ref-all})&arr2]

where arr2 is the value expr '*arr2$13$linkptr'
(i.e. indirect ref + decl name).

Inside pass_omp_target_link::execute, there is a call to
gimple_regimplify_operands but the value expression is not
expanded. There are two problems: ADDR_EXPR does not handle this and, while
MEM_REF has some code for it, it doesn't handle this either. The
attached code fixes this. Tested on x86_64-linux-gnu with Nvidia
offloading.

Comments, remarks, OK? Better suggestions?

* * *

In gimplify_expr for MEM_REF, there is a call to is_gimple_mem_ref_addr,
which checks for ADDR_EXPR but not for value expressions. The attached
patch handles the case explicitly, but, alternatively, we might want to
move it to is_gimple_mem_ref_addr (not checked whether it
makes sense or not).

Where is_gimple_mem_ref_addr is defined as:

/* Return true if T is a valid address operand of a MEM_REF.  */

bool
is_gimple_mem_ref_addr (tree t)
{
  return (is_gimple_reg (t)
          || TREE_CODE (t) == INTEGER_CST
          || (TREE_CODE (t) == ADDR_EXPR
              && (CONSTANT_CLASS_P (TREE_OPERAND (t, 0))
                  || decl_address_invariant_p (TREE_OPERAND (t, 0)))));
}
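The condition added to gimplify_expr can be modeled with toy tree nodes, as below. This is a sketch, not GCC code: the enum, struct node and its flags stand in for TREE_CODE, DECL_HAS_VALUE_EXPR_P and decl_address_invariant_p, and is_gimple_reg is reduced to "is an SSA name". It shows why &arr2 passes is_gimple_mem_ref_addr yet still has to be re-gimplified when arr2 has a value expression.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the relevant tree codes.  */
enum tcode { T_VAR_DECL, T_PARM_DECL, T_INTEGER_CST, T_REAL_CST,
             T_ADDR_EXPR, T_SSA_NAME };

struct node
{
  enum tcode code;
  struct node *op0;        /* operand of an ADDR_EXPR */
  bool has_value_expr;     /* models DECL_HAS_VALUE_EXPR_P */
  bool address_invariant;  /* models decl_address_invariant_p */
};

static bool
is_constant_class (const struct node *t)
{
  return t->code == T_INTEGER_CST || t->code == T_REAL_CST;
}

/* Models is_gimple_mem_ref_addr; is_gimple_reg is reduced to SSA names.  */
static bool
mem_ref_addr_ok (const struct node *t)
{
  return (t->code == T_SSA_NAME
          || t->code == T_INTEGER_CST
          || (t->code == T_ADDR_EXPR
              && (is_constant_class (t->op0) || t->op0->address_invariant)));
}

/* Models the patched condition: also re-gimplify &DECL if DECL is a
   VAR_DECL or PARM_DECL that carries a value expression.  */
static bool
needs_regimplify (const struct node *addr)
{
  return (!mem_ref_addr_ok (addr)
          || (addr->code == T_ADDR_EXPR
              && (addr->op0->code == T_VAR_DECL
                  || addr->op0->code == T_PARM_DECL)
              && addr->op0->has_value_expr));
}
```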

Tobias
gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637]

As the PR and included testcase shows, replacing 'arr2' by its value expression
'*arr2$13$linkptr' failed for
  MEM  [(c_char * {ref-all})&arr2]
which left 'arr2' in the code as unknown symbol.

	PR middle-end/115637

gcc/ChangeLog:

	* gimplify.cc (gimplify_addr_expr): Handle value-expr arg.
	(gimplify_expr): For MEM_REF and an ADDR_EXPR, also check
	for value-expr arguments.
	(gimplify_body): Fix macro name in the comment.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: Uncomment
	now working code.

 gcc/gimplify.cc  | 16 ++--
 .../testsuite/libgomp.fortran/declare-target-link.f90| 15 ++-
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index ab323d764e8..d548dc2cdf6 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -6888,6 +6888,13 @@ gimplify_addr_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
   enum gimplify_status ret;
   location_t loc = EXPR_LOCATION (*expr_p);
 
+  if (VAR_P (op0) || TREE_CODE (op0) == PARM_DECL)
+{
+  ret = gimplify_var_or_parm_decl (&TREE_OPERAND (expr, 0));
+  if (ret == GS_ERROR)
+	return ret;
+  op0 = TREE_OPERAND (expr, 0);
+}
   switch (TREE_CODE (op0))
 {
 case INDIRECT_REF:
@@ -18251,8 +18258,13 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	 in suitable form.  Re-gimplifying would mark the address
 	 operand addressable.  Always gimplify when not in SSA form
 	 as we still may have to gimplify decls with value-exprs.  */
+	  tmp = TREE_OPERAND (*expr_p, 0);
 	  if (!gimplify_ctxp || !gimple_in_ssa_p (cfun)
-	  || !is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0)))
+	  || (!is_gimple_mem_ref_addr (tmp)
+		  || (TREE_CODE (tmp) == ADDR_EXPR
+		  && (VAR_P (TREE_OPERAND (tmp, 0))
+			  || TREE_CODE (TREE_OPERAND (tmp, 0)) == PARM_DECL)
+		  && DECL_HAS_VALUE_EXPR_P (TREE_OPERAND (tmp, 0)))))
 	{
 	  ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
    is_gimple_mem_ref_addr, fb_rvalue);
@@ -19422,7 +19434,7 @@ gimplify_body (tree fndecl, bool do_parms)
   DECL_SAVED_TREE (fndecl) = NULL_TREE;
 
   /* If we had callee-copies statements, insert them at the beginning
- of the function and clear DECL_VALUE_EXPR_P on the parameters.  */
+ of the function and clear DECL_HAS_VALUE_EXPR_P on the parameters.  */
   if (!gimple_seq_empty_p (parm_stmts))
 {
   tree parm;
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-link.f90
index 2ce212d114f..44c67f925bd 100644
--- a/libgomp/testsuite/libgomp.fortran/declare-target-link.f90
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-link.f90
@@ -1,5 +1,7 @@
 ! { dg-additional-options "-Wall" }
+
 ! PR fortran/115559
+! PR middle-end/115637
 
 module m
integer :: A
@@ -73,24 +75,19 @@ contains
 !$omp target map(from:res)
   res = run_device1()
 !$omp end target
-print *, res
-! FIXME: arr2 not link mapped -> PR115637
-! if (res /= -11436) stop 5
-if (res /= -11546) stop 5 ! FIXME
+! print *, res
+if (res /= -11436) stop 5
   end
   integer function run_device1()
 !$omp declare target
 integer :: i
 run_device1 = -99
-! FIXME: arr2 not link mapped -> PR115637
-!   arr2 = [11,22,33,44]
+arr2 = [11,22,33,44]
 if (any (arr(10:50) /= [(i, i=10,50)])) then
   run_device1 = arr(11)
   return
 end if
-! FIXME: -> PR115637
-! run_device1 = sum(arr(10:13) + arr2)
-run_device1 = sum(arr(10:13) ) ! FIXME
+run_device1 = sum(arr

[Patch] libgomp.texi: Update 'Device Information Routines' section

2024-07-29 Thread Tobias Burnus


I recently stumbled over omp_get_default_device returning -1 (= omp_initial_device)
vs. returning omp_get_num_devices(). Thus, it makes sense to document this properly.
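Here is a minimal C sketch of why both values can appear on the host. The helper name `refers_to_host` is hypothetical. The sketch encodes only two facts: omp_initial_device has the value -1 (OpenMP 5.1 and later), and the host's device number equals the value returned by omp_get_num_devices (). It does not call libgomp; the caller passes the device count in.

```c
#include <stdbool.h>

/* omp_initial_device has the value -1 since OpenMP 5.1.  */
#define INITIAL_DEVICE_NUM (-1)

/* Hypothetical helper: return true if DEV, e.g. as returned by
   omp_get_default_device, denotes the host.  On the host, the
   default-device-var ICV may hold either omp_initial_device or the host's
   device number, which equals omp_get_num_devices () (passed as
   NUM_DEVICES).  */
bool
refers_to_host (int dev, int num_devices)
{
  return dev == INITIAL_DEVICE_NUM || dev == num_devices;
}
```

A caller comparing the result of omp_get_default_device () against a single value would therefore miss one of the two host encodings.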
I also updated some wording and took a small step toward documenting the missing
functions by adding a title to the commented-out @menu items.

→ https://gcc.gnu.org/onlinedocs/libgomp/#toc-OpenMP-Runtime-Library-Routines
for the current wording.

Comments or suggestions before I commit it?

Tobias
libgomp.texi: Update 'Device Information Routines' section

Update 'OpenMP Runtime Library Routines' by adding a note that the effect of
invoking these routines inside a target region is unspecified. Additionally,
update omp_{get,set}_default_device for the omp_{initial,invalid}_device
named constants.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Runtime Library Routines): Add missing
	titles to some commented-out, still undocumented items.
	(Device Information Routines): Update.

 libgomp/libgomp.texi | 48 +---
 1 file changed, 33 insertions(+), 15 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 50da248b74d..8fe74d58562 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1208,11 +1208,11 @@ They have C linkage and do not throw exceptions.
 
 @menu
 * omp_get_proc_bind::   Whether threads may be moved between CPUs
-@c * omp_get_num_places:: 
-@c * omp_get_place_num_procs:: 
-@c * omp_get_place_proc_ids:: 
-@c * omp_get_place_num:: 
-@c * omp_get_partition_num_places:: 
+@c * omp_get_num_places::   Get the number of places available
+@c * omp_get_place_num_procs::  Get the number of processors associated with a place
+@c * omp_get_place_proc_ids::   Get the IDs of the processors associated with a place
+@c * omp_get_place_num::Get place number of the associated task
+@c * omp_get_partition_num_places:: Get number of places of innermost task
 @c * omp_get_partition_place_nums:: 
 @c * omp_set_affinity_format:: 
 @c * omp_get_affinity_format:: 
@@ -1627,8 +1627,12 @@ Returns the number of processors online on that device.
 @subsection @code{omp_set_default_device} -- Set the default device for target regions
 @table @asis
 @item @emph{Description}:
-Set the default device for target regions without device clause.  The argument
-shall be a nonnegative device number.
+Set the value of the @emph{default-device-var} ICV, which is used
+for target regions without device clause.  The argument
+shall be a nonnegative device number, @code{omp_initial_device},
+or @code{omp_invalid_device}.
+
+The effect of running this routine in a @code{target} region is unspecified.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -1654,7 +1658,15 @@ shall be a nonnegative device number.
 @subsection @code{omp_get_default_device} -- Get the default device for target regions
 @table @asis
 @item @emph{Description}:
-Get the default device for target regions without device clause.
+Get the value of the @emph{default-device-var} ICV, which is used
+for target regions without device clause. The value is either a
+nonnegative device number, @code{omp_initial_device} or
+@code{omp_invalid_device}. Note that for the host, the ICV can have two values
+and, hence, this routine might return either the value of the named constant
+@code{omp_initial_device} or the value returned by the
+@code{omp_get_initial_device} routine.
+
+The effect of running this routine in a @code{target} region is unspecified.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -1667,7 +1679,8 @@ Get the default device for target regions without device clause.
 @end multitable
 
 @item @emph{See also}:
-@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
+@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device},
+@ref{omp_get_initial_device}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@@ -1681,6 +1694,8 @@ Get the default device for target regions without device clause.
 @item @emph{Description}:
 Returns the number of target devices.
 
+The effect of running this routine in a @code{target} region is unspecified.
+
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@@ -1702,9 +1717,9 @@ Returns the number of target devices.
 @table @asis
 @item @emph{Description}:
 This function returns a device number that represents the device that the
-current thread is executing on. For OpenMP 5.0, this must be equal to the
-value returned by the @code{omp_get_initial_device} function when called
-from the host.
+current thread is executing on. When called on the host, it returns
+the same value as returned by the @code{omp_get_initial_device} function
+as required since OpenMP 5.0.
 
 @item @emph{C/C++}
 @multitable @columnfractions .20 .80
@@ -1754,9 +1769,11 @@ their language-specific counterparts.
 @table @asis
 @item @emph{Description}:
 This function returns a device number that rep

Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-29 Thread Tobias Burnus


Hi Andre, hi all,

Andre Vehreschild wrote:

yes, I could have looked harder 🙂


I wrote ;-) on purpose, as this feature is somewhat hidden and writing 
'dg-do compile' does no harm.


In the case of gcc/testsuite, the 'run' is also needed and was often 
missed, or rather, invalid variants such as 'dg-run' (should be: 
'dg-do run') or '{dg-do run }' (missing space after '{') prevented the 
code from running. Sam fixed some of those (and some other dg-* 
issues) recently, e.g. in r15-2349-ga75c6295252d0d (→ 
https://gcc.gnu.org/r15-2349-ga75c6295252d0d ).



This isn't by any chance documented on the developer website of gcc somewhere?
It would be sad if that knowledge were not publicly available for the future.


https://gcc.gnu.org/onlinedocs/gccint/Directives.html#Specify-how-to-build-the-test 
documents it.


And libgomp has: lib/libgomp.exp:set dg-do-what-default run

Whether all optimization options or only -O2 are used is set in libgomp via:

libgomp.c++/c++.exp:    set DEFAULT_CFLAGS "-O2"

libgomp.c/c.exp:    set DEFAULT_CFLAGS "-O2"

and for libgomp.*fortran/fortran.exp, the difference between 'dg-do run' 
and the default is *not documented*, but it seems to be the result of the 
following:


# For Fortran we're doing torture testing, as Fortran has far more tests
# with arrays etc. that testing just -O0 or -O2 is insufficient, that is
# typically not the case for C/C++.
gfortran-dg-runtest $tests "" ""


Tobias

Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-29 Thread Tobias Burnus


Hi Andre,

Andre Vehreschild wrote:

I am wondering why the testcase has no `!{ dg-do ... }` line. What will dejagnu
do then? Sorry for the maybe stupid question, but I never encountered a
testcase without a dg-do line. It was the minimum for me.


Well, then you need to look harder ;-)

In gcc/testsuite/, the default is '{ dg-do compile }', i.e. you can
specify or leave out that line without any additional effect. Having it
might be a tad clearer, although it makes the test a tad longer.

But if you want to 'run' or 'link', you need to specify the dg-do line.
There are several files that don't have the "dg-do compile" line, also
under gcc/testsuite/gfortran.dg.

In the case of libgomp, it becomes interesting: the default is running
the code, i.e. you need 'dg-do compile' or 'dg-do link' when it shouldn't be run.

However, at least for Fortran (libgomp.{oacc-}fortran), there is a
difference between specifying nothing and specifying 'dg-do run': In
the default case, it is compiled and run. But if you specify 'dg-do
run', it is compiled multiple times with different optimization options
and then run.

(Actually, also under gcc/testsuite/gfortran.dg, you get multiple
compilations + runs with 'dg-do run'. If you use dg-additional-options,
you can also add options. I think with dg-options, you set it to a
single run [not confirmed].)

The downside of compiling + running it multiple times is a longer test
time, often without any real benefit. However, especially with Fortran,
compiling with different optimization levels did expose issues in the
past, both in the Fortran front end and in the middle end. Thus, there
is some benefit in using it.

In any case, the more complex the code that the front end and middle end
have to process, the more useful "dg-do run" is. The more work is
done by the run-time library, be it libgfortran or libgomp, the less
useful it becomes, as the heavy lifting is done in the run-time library.
As libgomp testing already takes quite some time (albeit it can
now run in parallel), some prefer few 'dg-do run' while others
prefer that all Fortran testcases there use 'dg-do run' …

I hope it helps,

Tobias

[Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-26 Thread Tobias Burnus


Updated patch - only change is to the testcase:

* With the just-posted patch for PR116107, array sections with an offset 
work for 'link'; hence, I updated the testcase.


* For 'arr2', I added a reference to the associated PR.

I intend to commit it once PR116107 has been committed.

Tobias

Tobias Burnus wrote:

Hi all,

it turned out that 'declare target' with 'link' clause was broken in multiple 
ways.

The main fix is the attached patch, namely pushing the variables to
the offload-vars list already in the FE.

When implementing it, I noticed:
* C has a similar issue when using nested functions, which is
   a GNU extension →https://gcc.gnu.org/115574

* When doing partial mapping of arrays (which is one of the reasons for 'link'),
   offsets are mishandled in Fortran (not tested in C); see FIXME in the patch.
   There: arr2(10) should print 10 but with map(arr2(10:)) it prints 19.
   (I will file a PR about this.)

* It might happen that linked variables do not get linked. I have not investigated
   why, but 'arr2' gives link errors, while 'arr' works.
   See FIXME in the patch. (I will file a PR about this.)

* For COMMON blocks, map(/common/) is rejected, https://gcc.gnu.org/PR115577

* When instead using map(a,b,c), which is equivalent for 'common /mycom/ a,b,c',
   linking fails on the device side as the 'mycom_' symbol cannot be found there.
   (I will file a PR about this.)

As COMMON has issues, an alternative would be to defer the trans-common.cc
changes to a later patch.

Comments, questions, concerns?

Tobias

PS: Tested with nvptx offloading on a system supporting page migration, with
nvptx and GCN offloading configured; no new fails observed.

OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

Unlike a normal 'declare target', the 'declare target link' attribute
also needs to set node->offloadable and push the variable to offload_vars
in the front end.

Linked variables require that the data is mapped. For module variables, this
can happen anywhere. For variables in an external subprogram or the main
program, this can only happen either in that program itself or in an
internal subprogram. Whether a variable is just normally mapped or linked then
becomes relevant if a device routine exists that can access that variable,
i.e. an internal procedure then has to be marked as declare target.

	PR fortran/115559

gcc/fortran/ChangeLog:

	* trans-common.cc (build_common_decl): Add 'omp declare target' and
	'omp declare target link' variables to offload_vars.
	* trans-decl.cc (add_attributes_to_decl): Likewise; update args and
	call decl_attributes.
	(get_proc_pointer_decl, gfc_get_extern_function_decl,
	build_function_decl): Update calls.
	(gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1'
	to avoid errors with symtab_node::get_create.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: New test.

 gcc/fortran/trans-common.cc|  21 
 gcc/fortran/trans-decl.cc  |  81 +-
 .../libgomp.fortran/declare-target-link.f90| 116 +
 3 files changed, 192 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/trans-common.cc b/gcc/fortran/trans-common.cc
index 5f44e7bd663..e714342c3c0 100644
--- a/gcc/fortran/trans-common.cc
+++ b/gcc/fortran/trans-common.cc
@@ -98,6 +98,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "cgraph.h"
+#include "context.h"
+#include "omp-offload.h"
 #include "gfortran.h"
 #include "trans.h"
 #include "stringpool.h"
@@ -497,6 +500,24 @@ build_common_decl (gfc_common_head *com, tree union_type, bool is_init)
 	  = tree_cons (get_identifier ("omp declare target"),
 		   omp_clauses, DECL_ATTRIBUTES (decl));
 
+  if (com->omp_declare_target_link || com->omp_declare_target)
+	{
+	  /* Add to offload_vars; get_create does so for omp_declare_target,
+	 omp_declare_target_link requires manual work.  */
+	  gcc_assert (symtab_node::get (decl) == 0);
+	  symtab_node *node = symtab_node::get_create (decl);
+	  if (node != NULL && com->omp_declare_target_link)
+	{
+	  node->offloadable = 1;
+	  if (ENABLE_OFFLOADING)
+		{
+		  g->have_offload = true;
+		  if (is_a <varpool_node *> (node))
+		vec_safe_push (offload_vars, decl);
+		}
+	}
+	}
+
   /* Place the back end declaration for this common block in
  GLOBAL_BINDING_LEVEL.  */
   gfc_map_of_all_commons[identifier] = pushdecl_top_level (decl);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 82fa2bb6134..0fdc41b1784 100644
--- a/gcc/fortran/trans-decl.cc
+

[Patch] libgomp: Fix declare target link with offset array-section mapping [PR116107]

2024-07-26 Thread Tobias Burnus


The main idea of 'link' is to permit putting only a subset of a
huge array on the device. For this to work properly, one must be
able to map an array section that does not start with the first
element.

This patch adjusts the pointers such that this actually works.

(Tested on x86-64-gnu-linux with Nvptx offloading.)
Comments, suggestions, remarks before I commit it?

Tobias
libgomp: Fix declare target link with offset array-section mapping [PR116107]

Assume that 'int var[100]' is 'omp declare target link(var)'. When now
mapping an array section with an offset, such as 'map(to:var[20:10])',
the device-side link pointer has to store the device address of the first
mapped element minus the offset, such that var[20] accesses that first
mapped element. However, the offset calculation was missing, so the
device-side 'var' pointed to the first element of the mapped data, and
var[20] then pointed past it into invalid memory.
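The offset computation can be sketched in plain C. This is a minimal model, not libgomp code. `link_pointer_value` is a hypothetical name, and the comments map each parameter to the field used in the target.c hunk. The arithmetic is done on `uintptr_t` integers, so no out-of-range pointer is ever formed.

```c
#include <stdint.h>

/* Hypothetical model of the PR116107 fix.  All arguments are addresses
   stored as integers.  */
uintptr_t
link_pointer_value (uintptr_t dev_section_start,  /* tgt->tgt_start + k->tgt_offset */
		    uintptr_t host_section_start, /* k->host_start */
		    uintptr_t host_var_start)     /* k->aux->link_key->host_start */
{
  /* Subtract the byte offset of the mapped section within the variable,
     so that indexing the device-side 'var' with the host index (e.g.
     var[20]) lands on the first mapped element.  */
  return dev_section_start - (host_section_start - host_var_start);
}
```

For 'int var[100]' with 'map(to:var[20:10])', the host offset is 20 * sizeof (int). The stored link pointer plus that offset therefore equals the device address of the copy of var[20]. Without the subtraction, the pointer would simply be the device address of the first mapped element.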

	PR middle-end/116107

libgomp/ChangeLog:

	* target.c (gomp_map_vars_internal): Honor array mapping offsets
	with declare-target 'link' variables.
	* testsuite/libgomp.c-c++-common/target-link-2.c: New test.

 libgomp/target.c   |  7 ++-
 .../testsuite/libgomp.c-c++-common/target-link-2.c | 59 ++
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index aa01c1367b9..e3e648f5443 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1820,8 +1820,11 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		if (k->aux && k->aux->link_key)
 		  {
 		/* Set link pointer on target to the device address of the
-		   mapped object.  */
-		void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
+		   mapped object. Also deal with offsets due to
+		   array-section mapping. */
+		void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset
+	   - (k->host_start
+		  - k->aux->link_key->host_start));
 		/* We intentionally do not use coalescing here, as it's not
 		   data allocated by the current call to this function.  */
 		gomp_copy_host2dev (devicep, aq, (void *) n->tgt_offset,
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
new file mode 100644
index 000..4ff4080da76
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
@@ -0,0 +1,59 @@
+/* PR middle-end/116107  */
+
+#include <omp.h>
+
+int arr[15] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+#pragma omp declare target link(arr)
+
+#pragma omp begin declare target
+void f(int *res)
+{
+  __builtin_memcpy (res, &arr[5], sizeof(int)*10);
+}
+
+void g(int *res)
+{
+  __builtin_memcpy (res, &arr[3], sizeof(int)*10);
+}
+#pragma omp end declare target
+
+int main()
+{
+  int res[10], res2;
+  for (int dev = 0; dev < omp_get_num_devices(); dev++)
+{
+  __builtin_memset (res, 0, sizeof (res));
+  res2 = 99;
+
+  #pragma omp target enter data map(arr[5:10]) device(dev)
+
+  #pragma omp target map(from: res) device(dev)
+	f (res);
+
+  #pragma omp target map(from: res2) device(dev)
+	res2 = arr[5];
+
+  if (res2 != 6)
+	__builtin_abort ();
+  for (int i = 0; i < 10; i++)
+	if (res[i] != 6 + i)
+	  __builtin_abort ();
+
+  #pragma omp target exit data map(release:arr[5:10]) device(dev)
+
+  for (int i = 0; i < 15; i++)
+	arr[i] *= 10;
+
+  #pragma omp target enter data map(arr[3:10]) device(dev)
+  __builtin_memset (res, 0, sizeof (res));
+
+  #pragma omp target map(from: res) device(dev)
+	g (res);
+
+  for (int i = 0; i < 10; i++)
+	if (res[i] != (4 + i)*10)
+	  __builtin_abort ();
+}
+  return 0;
+}

Re: [PATCH v3 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-07-25 Thread Tobias Burnus


Hi Sandra,

thanks for your patch. (Disclaimer: I have not finished reading through 
your patch.)


Some upfront generic remarks:

[* When first compiling it (incremental build), I ran into the issue 
that OMP_METADIRECTIVE_CHECK wasn't declared. Thus, there seems to be a 
dependency issue that can cause tree-check.h to be generated after code 
that includes tree.h has been processed. (Unrelated to your patch itself, 
but for completeness …)]


* Not required right now, but eventually we need to check whether 
https://gcc.gnu.org/PR112779 is fully fixed by this patch set or whether 
follow-up work is required (and if so which). There is also PR107067 for 
a Fortran ICE.


* There are some not-implemented/FIXME comments in the patches for 
missing features. I think we should ensure that those won't get 
forgotten, e.g. by filing PRs for those. – For declare variant, some PRs 
might already exist.


Can you eventually take care of the last two items?

(For the last item: e.g. 'target_device' for declare_variant, for which 
'sorry' already existed.)


* * *

I might have asked the following question before – and you might have 
answered it already:


Sandra Loosemore wrote:


This patch adds the OMP_METADIRECTIVE tree node and shared tree-level
support for manipulating metadirectives.  It defines/exposes
interfaces that will be used in subsequent patches that add front-end
and middle-end support, but nothing generates these nodes yet.


I have to admit that I do not understand the part:


+  else if (set == OMP_TRAIT_SET_TARGET_DEVICE)
+/* The target_device set is dynamic, so treat it as always
+   resolvable.  */
+continue;
+


The current code has 3 states:

* 0 - if a trait is false; this directly returns as it cannot be fixed later

* 1 - if all traits are known to match (initial value)

* -1 - if one trait cannot be evaluated, either because it is too early 
(e.g. during parsing) or because it is a dynamic context selector.
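The three states above can be modelled in a few lines of C. This is a minimal sketch, not the code of omp_context_selector_matches. The enum and `combine_trait_results` are hypothetical names, and each array element stands for the already evaluated result of one trait.

```c
/* Hypothetical model of the three states.  */
enum selector_match
{
  SELECTOR_NO_MATCH = 0,  /* A trait is false: cannot become true later.  */
  SELECTOR_MATCH = 1,     /* All traits are known to match.  */
  SELECTOR_UNKNOWN = -1   /* Some trait cannot be evaluated (yet).  */
};

int
combine_trait_results (const enum selector_match *traits, int n)
{
  int ret = SELECTOR_MATCH;         /* Initial value.  */
  for (int i = 0; i < n; i++)
    {
      if (traits[i] == SELECTOR_NO_MATCH)
	return SELECTOR_NO_MATCH;   /* Return directly.  */
      if (traits[i] == SELECTOR_UNKNOWN)
	ret = SELECTOR_UNKNOWN;     /* Keep scanning; a later 0 still wins.  */
    }
  return ret;
}
```

In this model, a dynamic target_device trait would contribute SELECTOR_UNKNOWN, so the result is -1 unless another trait is false. Skipping that trait with 'continue' instead treats it like SELECTOR_MATCH.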


Thus, I had expected:

(a) ret = -1 as default in this case (not known)

(b) for cases where it is known, a 'return 0' / not-setting -1. In 
particular:


* n == const → device_num(n) – false if '< -1' and, for 
'!ENABLE_OFFLOADING || offload_targets == NULL' either false for n > 0 
or otherwise false.


* Checks similar to OMP_TRAIT_DEVICE_{KIND,ARCH,ISA}, i.e. kind(any) → 
true, kind(fpga) → false, arch(something_unknown) → false if not true 
for any device. With '!ENABLE_OFFLOADING || offload_targets == NULL', 
the kind_arch_isa check can be done as for the host.


* * *

Have I missed something and is it sensible to return 1 instead of -1 here?

* * *



@@ -1804,6 +1834,12 @@ omp_context_selector_matches (tree ctx)


   case OMP_TRAIT_USER_CONDITION:
 if (set == OMP_TRAIT_SET_USER)

for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
  if (OMP_TP_NAME (p) == NULL_TREE)
{
+ /* OpenMP 5.1 allows non-constant conditions for
+metadirectives.  */
+ if (metadirective_p
+ && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+   break;
+

 if (integer_zerop (OMP_TP_VALUE (p)))
   return 0;
 if (integer_nonzerop (OMP_TP_VALUE (p)))
   break;
 ret = -1;
   }



* Comment wording: Please change to imply >= 5.1 not == 5.0.

* Comment: I don't see why the non-const only applies to metadirectives;
the OpenMP >= 5.1 wording seems to imply that it is also valid for declare
variant. Thus, I would change the wording.

* The current code seems to already handle non-const values as expected
... except that it changes 'ret' to -1, while the idea seems to be not to
modify 'ret' in this case for metadirectives. (Why? Same question as above.)

* * *

Quotes from the specifications regarding the expressions:

The current spec has:

"Restrictions to context selectors are as follows:" …

"A variable or procedure that is referenced in an expression that 
appears in a context selector must be visible at the location of the 
directive on which the context selector appears unless the directive is 
a declare_variant directive and the variable is an argument of the 
associated base function."

5.1 wording is the following (approx. same except for argument bit):

"All variables that are referenced in an expression that appears in
the context selector of a match clause must be accessible at a call site 
to the base function according to the base language rules."

5.0 had (e.g. for C): "The condition(boolean-expr) selector defines a 
constant expression that must evaluate to true for the selector to be true."


* * *


+ if (metadirective_p
+ && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+   break;
+
  if (integer_zerop (OMP_TP_VA

[Patch] install.texi (gcn): Suggest newer commit for Newlib

2024-07-23 Thread Tobias Burnus


Hi Andrew, hi all,

to be compatible with C++ (and Thomas' WIP work for GCN C++ support), I 
suggest the attach patch that also suggest Thomas' Newlib commit (April 
4, 2024)


ed50a50b9   amdgcn: Implement proper locks: Fix 
'newlib/libc/sys/amdgcn/include/sys/lock.h' for C++


and not only your commit (March 25, 2024)

7dd4eb1db amdgcn: Implement proper locks

Comments or suggestions before I commit it?

Tobias
install.texi (gcn): Suggest newer commit for Newlib

Newlib 4.4.0 lacks two commits: 7dd4eb1db (2024-03-25) to fix device console
output for GFX10/GFX11 and ed50a50b9 (2024-04-04) to make the added lock.h
compilable with C++. This commit now also mentions the second commit.

gcc/ChangeLog:

	* doc/install.texi (amdgcn-x-amdhsa): Suggest newer git version
	for newlib.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index b5456992583..dda623f4410 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3952,9 +3952,9 @@ Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy
 by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
-Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
-7dd4eb1db (2024-03-25, post-4.4.0) fixes device console output for GFX10 and
-GFX11 devices).
+Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commits
+7dd4eb1db and ed50a50b9 (2024-04-04, post-4.4.0) fix device console output
+for GFX10 and GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use

Re: [PATCH v2 3/8] OpenMP: middle-end support for dispatch + adjust_args

2024-07-22 Thread Tobias Burnus


Hi PA,

as discussed off list, I was stumbling over the call to GOMP_task. I now 
understand why: I was looking at a different version of the OpenMP spec.


Namely, OpenMP 5.2 contains the changes for spec Issue 2741 "dispatch 
construct data scoping issues", which address the performance issue due 
to 'task' compared to a direct call, the effect of unintended 
firstprivatization, …


The current version has

(a) nowait

"The addition of the *nowait* element to the semantic requirement set by 
the *dispatch* directive has no effect on the dispatch construct apart 
from the effect it may have on the arguments that are passed when 
calling a function variant." (I assume the latter is about 'append_args' 
of interop objects)


(b) depend

"If the *dispatch* directive adds one or more _depend_ element to the 
semantic requirement set, and those element are not removed by the 
effect of a declare variant directive, the behavior is as if those 
properties were applied as *depend* clauses to a *taskwait* construct 
that is executed before the *dispatch* region is executed."


I think it would be good to match the 5.2 behavior.

* * *

I have not fully checked whether the 'device' clause is properly 
handled. The current wording states:


"If the device clause is present, the value of the default-device-var 
ICV is set to the value of the expression in the clause on entry to the 
dispatch region and is restored to its previous value at the end of the 
region."
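The quoted set-on-entry/restore-on-exit semantics can be shown with a minimal C model. `default_device_var`, `dispatch_with_device` and the accessor functions are hypothetical stand-ins, not libgomp internals. Declaring the stand-in ICV `_Thread_local` (C11) is one way to keep the temporary value invisible to other concurrently running threads.

```c
/* Hypothetical stand-in for the default-device-var ICV; -1 is the value
   of omp_initial_device.  _Thread_local keeps the temporary change made
   in dispatch_with_device invisible to other threads.  */
static _Thread_local int default_device_var = -1;

int
get_default_device_var (void)
{
  return default_device_var;
}

void
set_default_device_var (int dev)
{
  default_device_var = dev;
}

/* Model of 'dispatch device(DEV)': set the ICV on entry to the region,
   call FN (standing in for the dispatched call), and restore the
   previous value at the end of the region.  */
int
dispatch_with_device (int dev, int (*fn) (void))
{
  int saved = default_device_var;
  default_device_var = dev;
  int result = fn ();
  default_device_var = saved;
  return result;
}
```

The attached testcase checks the same two properties against libgomp. The called function observes the device from the clause, and omp_get_default_device () afterwards still returns the previous value.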


For the code itself, it seems to be handled correctly; see the attached 
testcase (consider including it).


I was wondering (and haven't checked) whether the ICV is set too 
broadly, i.e. not only for the "data environment" ("The variables 
associated with the execution of a given region"), but whether it is 
also immediately visible to other concurrently running threads outside 
of that region.


Can you check? (Although my question might also be answered once I finish 
reading the patch …)


Thanks,

Tobias
#include <omp.h>

int f ()
{
  return omp_get_default_device ();
}

int main ()
{
  for (int d = omp_initial_device; d <= omp_get_num_devices (); d++)
{
  int dev = omp_invalid_device;
  omp_set_default_device (d);

  #pragma omp dispatch
	dev = f ();

  if (d == omp_initial_device || d == omp_get_num_devices ())
	{
	  if (dev != omp_initial_device && dev != omp_get_num_devices ())
	__builtin_abort ();
	  if (omp_get_default_device() != omp_initial_device
	  && omp_get_default_device() != omp_get_num_devices ())
	__builtin_abort ();
	}
  else
	if (dev != d || d != omp_get_default_device())
	  __builtin_abort ();

  for (int d2 = omp_initial_device; d2 <= omp_get_num_devices (); d2++)
	{
	  dev = omp_invalid_device;
	  #pragma omp dispatch device(d2)
	dev = f ();

	  if (d == omp_initial_device || d == omp_get_num_devices ())
	{
	  if (omp_get_default_device() != omp_initial_device
		  && omp_get_default_device() != omp_get_num_devices ())
		__builtin_abort ();
	}
	  else if (d != omp_get_default_device())
	__builtin_abort ();

	  if (d2 == omp_initial_device || d2 == omp_get_num_devices ())
	{
	  if (dev != omp_initial_device && dev != omp_get_num_devices ())
		__builtin_abort ();
	}
	  else if (dev != d2)
	__builtin_abort ();
	}
}
  return 0;
}

[Patch, v3] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-07-19 Thread Tobias Burnus


Hi,

Jakub Jelinek wrote:

+  "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n"

If this was an attempt to deal gracefully with no #embed support, then
the above would be wrong and should have been
#if defined(__STDC_EMBED_FOUND__) && defined(__has_embed)
#if __has_embed ("whatever") == __STDC_EMBED_FOUND__


It was kind of both: assuming that #embed is available (as the file 
should be compiled by the accompanying compiler) but handling the case 
that it is not.


However, as '#embed' is well diagnosed if unsupported, that part is not 
really needed.



Now, if all you want is an error if the file doesn't exist, then
#embed "whatever"
will do that too […]

If you want an error not just when it doesn't exist, but also when it
is empty, then you could do
#embed "whatever" if_empty (%%%)


The idea was to also error out if the file is empty, as that shouldn't 
happen here: if offloading code was found, code should have been 
generated. However, using an invalid expression seems to be a good idea, 
as that's really a special case that shouldn't happen.
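The generator side can be sketched as follows. `format_embed_initializer` is a hypothetical name. The sketch only produces the text of the generated C file, following the format string in the patch. Compiling that text requires a compiler with #embed support, such as the accompanying GCC with Jakub's patch.

```c
#include <stdio.h>

/* Hypothetical helper: write into BUF an array initializer that pulls in
   FNAME via #embed.  If the file is empty, the if_empty tokens are used
   instead; error_file_is_empty is an undeclared identifier, so compiling
   the generated file then fails.  Returns the snprintf result, i.e. the
   number of characters that would have been written.  */
int
format_embed_initializer (char *buf, size_t size, const char *fname)
{
  return snprintf (buf, size,
		   "static unsigned char gcn_code[] = {\n"
		   "#embed \"%s\" if_empty (error_file_is_empty)\n"
		   "};\n",
		   fname);
}
```

Compared with Jakub's `if_empty (%%%)` suggestion, an undeclared identifier produces a diagnostic that names the actual problem.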


* * *

I have additionally replaced the #include lines by using __UINTPTR_TYPE__ 
and __SIZE_TYPE__ to avoid including three header files; this doesn't 
have a large effect, but still.


Updated patch attached.

OK for mainline, once Jakub's #embed is committed?

* * *

BTW: Testing shows for a hello world program (w/o #embed patch)

For -foffload=...: 'disable' 0.04s, 'nvptx-none' 0.15s, 'amdgcn-amdhsa' 
1.2s.


With a simple #embed (this patch plus Jakub's first patch), the 
performance is unchanged. I then applied Jakub's follow up patches, but 
I then get an ICE (Jakub will have a look).


But compiling it with 'g++' (→ COLLECT_GCC is g++) works; result: takes 
0.2s (~6× faster) and compiling for both nvptx and gcn takes 0.3s, 
nearly 5× faster.


Tobias
 gcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (read_file): Remove.
	(process_asm): Do not add '#include' to generated C file.
	(process_obj): Generate C file that uses #embed and use
	__SIZE_TYPE__ and __UINTPTR_TYPE__ instead of the #include-defined
	size_t and uintptr_t.
	(main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 79 +++--
 1 file changed, 12 insertions(+), 67 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..c3c998639ff 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }
 
-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-{
-  /* Get the file size.  */
-  long s = ftell (stream);
-  if (s >= 0)
-	alloc = s + 100;
-  fseek (stream, 0, SEEK_SET);
-}
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-{
-  size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-  if (!n)
-	break;
-  base += n;
-  if (base + 1 == alloc)
-	{
-	  alloc *= 2;
-	  buffer = XRESIZEVEC (char, buffer, alloc);
-	}
-}
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
Tokens are assumed to be delimited by ':'.  */
 
@@ -657,10 +619,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   struct oaccdims *dims = XOBFINISH (&dims_os, struct oaccdims *);
   struct regcount *regcounts = XOBFINISH (&regcounts_os, struct regcount *);
 
-  fprintf (cfile, "#include \n");
-  fprintf (cfile, "#include \n");
-  fprintf (cfile, "#include \n\n");
-
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
   fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count);
 
@@ -725,35 +683,28 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */
 
 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, &len);
-
   /* Dump out an array containing the binary.
- FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-{
-  fprintf (cfile, "\n\t");
-  for (size_t j = i; j < i + 17 && j < len; j++)
-	fprintf (cfile, "%3u,", (unsigned char) input[j]);
-}
-  fprintf (cfile, "\n};\n\n");
+ If the file is empty, an error is shown as the argument to if_empty
+ is an undeclared identifier.  */
+  fprintf (cfile,
+	   "static unsigned char gcn_code[] = {\n"
+	   "#embed \"%s\" if_empty (error_file_is_empty)\n"
+	   "};\n\n",

Re: [PATCH v2 3/8] OpenMP: middle-end support for dispatch + adjust_args

2024-07-18 Thread Tobias Burnus


Hi PA,

not yet a full review, but some observations:

First: Please include the change
  gcc/fortran/types.def (BT_FN_PTR_CONST_PTR_INT)
of "[PATCH v2 7/8] OpenMP: Fortran front-end support for dispatch + adjust_args"


Do so either in this patch (3/8) - or in the previous (2/8) one that 
adds it to gcc/builtin-types.def.


Otherwise this will break the build, as omp-builtins.def (modified
in this patch) is also used by gfortran.
Causing intermittent build failures is bad: first, in general, and
second, because it causes issues when bisecting.

* * *

If I try your testcase and move "bar" and "baz" *after* 'foo' and leave 
only the following before:


int baz (double *d_bv, const double *d_av, int n);
int bar (double *d_bv, const double *d_av, int n);

it fails at runtime with:

ERROR at 1: 0.00 (act) != 2.718280 (exp)

as the two calls to __builtin_omp_get_mapped_ptr are now missing.

With both the declaration and the definition before the declare target, 
it works.


* * *

I think this variant needs to be supported; otherwise, an error has to
be printed that it cannot be supported, but that would be rather
unfortunate.


Thanks,

Tobias

Re: [PATCH v2 2/8] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces

2024-07-18 Thread Tobias Burnus


Paul-Antoine Arras wrote:

This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.


LGTM.

OFF TOPIC regarding "OMP_TRAIT_SET_NEED_DEVICE_PTR" and
"pseudo-set selector used to convey argument list until variant has a decl":
This reminds me vaguely of the issue that we should store the variant 
declarations with the base function and not with the variant, cf.

https://gcc.gnu.org/PR113905

Thanks for the patch!

Tobias


It also adds support for new OpenMP context selectors: `dispatch` as trait
selector and `need_device_ptr` as pseudo-trait set selector. The purpose of the
latter is for the C++ front-end to store the list of arguments (that need to be
converted to device pointers) until the declaration of the variant function
becomes available.

gcc/ChangeLog:

* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_tss_code): Add
OMP_TRAIT_SET_NEED_DEVICE_PTR.
(enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.
---
  gcc/builtin-types.def|  1 +
  gcc/omp-selectors.h  |  3 +++
  gcc/tree-core.h  |  7 +++
  gcc/tree-pretty-print.cc | 21 +
  gcc/tree.cc  |  4 
  gcc/tree.def |  5 +
  gcc/tree.h   |  7 +++
  7 files changed, 48 insertions(+)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c97d6bad1de..ef7aaf67d13 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -677,6 +677,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_INT_FEXCEPT_T_PTR_INT, BT_INT, BT_FEXCEPT_T_PTR,
  DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_FEXCEPT_T_PTR_INT, BT_INT,
 BT_CONST_FEXCEPT_T_PTR, BT_INT)
  DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_UINT8, BT_PTR, BT_CONST_PTR, BT_UINT8)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)
  
  DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)
  
diff --git a/gcc/omp-selectors.h b/gcc/omp-selectors.h
index c61808ec0ad..12bc9e9afa0 100644
--- a/gcc/omp-selectors.h
+++ b/gcc/omp-selectors.h
@@ -31,6 +31,8 @@ enum omp_tss_code {
OMP_TRAIT_SET_TARGET_DEVICE,
OMP_TRAIT_SET_IMPLEMENTATION,
OMP_TRAIT_SET_USER,
+  OMP_TRAIT_SET_NEED_DEVICE_PTR, // pseudo-set selector used to convey argument
+// list until variant has a decl
OMP_TRAIT_SET_LAST,
OMP_TRAIT_SET_INVALID = -1
  };
@@ -55,6 +57,7 @@ enum omp_ts_code {
OMP_TRAIT_CONSTRUCT_PARALLEL,
OMP_TRAIT_CONSTRUCT_FOR,
OMP_TRAIT_CONSTRUCT_SIMD,
+  OMP_TRAIT_CONSTRUCT_DISPATCH,
OMP_TRAIT_LAST,
OMP_TRAIT_INVALID = -1
  };
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 27c569c7702..508f5c580d4 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -542,6 +542,13 @@ enum omp_clause_code {
  
/* OpenACC clause: nohost.  */

OMP_CLAUSE_NOHOST,
+
+  /* OpenMP clause: novariants (scalar-expression).  */
+  OMP_CLAUSE_NOVARIANTS,
+
+  /* OpenMP clause: nocontext (scalar-expression).  */
+  OMP_CLAUSE_NOCONTEXT,
+
  };
  
  #undef DEFTREESTRUCT

diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 4bb946bb0e8..752a402e0d0 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -506,6 +506,22 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
  case OMP_CLAUSE_EXCLUSIVE:
name = "exclusive";
goto print_remap;
+case OMP_CLAUSE_NOVARIANTS:
+  pp_string (pp, "novariants");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOVARIANTS_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOVARIANTS_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
+case OMP_CLAUSE_NOCONTEXT:
+  pp_string (pp, "nocontext");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOCONTEXT_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOCONTEXT_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
  case OMP_CLAUSE__LOOPTEMP_:
name = "_looptemp_";
goto print_remap;
@@ -3947,6 +3963,11 @@ dump_generic_node

Re: [PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-07-16 Thread Tobias Burnus


Hi Sandra,

On 16.07.24 at 19:03, Sandra Loosemore wrote:
Well, I still do not understand why backward compatibility concerns 
specific to some other directive should affect the ABI for a new 
directive that does not have any current libgomp runtime support,


I am happy that I managed to explain the background of the "-1" mess
to you. Otherwise:



The backward-compatibility hack is not required, but it has two 
advantages: consistent values, and much simpler code inside 
target.c, which can just use


  struct gomp_device_descr *devicep = resolve_device (device, true);

instead of handling several additional cases.


However, as written, avoiding the '(n == -1) ? -2 : n' code generation 
also has advantages; hence, I am also happy with that variant. (i.e. -2 
or -3 denoting the default device).


However, if you use -2 == default device, you need to fix the 
libgomp/target.c implementation as your code doesn't handle 
omp_default_device correctly, which 'resolve_device (device, true);' 
would handle automatically.



you just tell me what ABI you want me to implement and I will re-do 
the code that way.


Having looked at the code again – and in particular at libgomp/target.c, 
I realized the merits of using -2. Thus, at the end, I am happy with 
*either* variant.


But either version requires some changes: one needs the creation of the 
conditional GIMPLE code but permits much simpler code in target.c; the 
other keeps the current GIMPLE code but requires fixing/extending target.c.


Tobias

Re: [PATCH v2 1/8] Fix warnings for tree formats in gfc_error

2024-07-16 Thread Tobias Burnus

I think it would be nice if some C/C++/global maintainer could rubber-stamp 
the following patch.



Otherwise, I think it is trivial, i.e. I think it can be committed in a 
few days, unless someone has concerns.


This change to gcc/c-family/c-format.cc LGTM from the *gfortran* POV and 
is trivially copied from gcc_tdiag_char_table or gcc_cdiag_char_table 
(which both have it).


* * *

Background:

While this is for gcc/c-family/c-format.cc, the 'gcc_gfc_char_table' is 
only used for checking diagnostics when compiling gcc/fortran/.


Namely, the gfc_error, gfc_warning etc. functions are annotated by the
format checking attribute:

#define ATTRIBUTE_GCC_GFC(m, n) __attribute__ ((__format__ (__gcc_gfc__, m, n))) ATTRIBUTE_NONNULL(m)


* * *

As gfc_error etc. call the common diagnostic machinery at the end, '%qE', 
'%qD' etc. are already supported.


(As tested manually; it is also used by this patch series of PA.)

But while %qE is already supported, without the 'gcc_gfc_char_table' 
change, the '__format__ (__gcc_gfc__' check does not recognize it and
yields a -Werror error, causing the bootstrap to fail.

Hence, we need this patch …

* * *

Paul-Antoine Arras wrote:

This enables proper warnings for formats like %qD.

gcc/c-family/ChangeLog:

* c-format.cc (gcc_gfc_char_table): Add formats for tree objects.
---
  gcc/c-family/c-format.cc | 4 
  1 file changed, 4 insertions(+)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 5bfd2fc4469..f4163c9cbc0 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -847,6 +847,10 @@ static const format_char_info gcc_gfc_char_table[] =
/* This will require a "locus" at runtime.  */
{ "L",   0, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "R", NULL },
  
+  /* These will require a "tree" at runtime.  */
+  { "DFTV", 1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "'",   NULL },
+  { "E",   1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "",   NULL },
+
/* These will require nothing.  */
{ "<>",0, STD_C89, NOARGUMENTS, "",  "",   NULL },
{ NULL,  0, STD_C89, NOLENGTHS, NULL, NULL, NULL

Re: [PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-07-16 Thread Tobias Burnus


Hi Sandra,

Sandra Loosemore wrote:

+    /* omp_initial_device is -1, omp_invalid_device is -4; choose
+   a value that isn't otherwise defined to indicate the default
+   device.  */
+    device_num = build_int_cst (integer_type_node, -2);


Don't do this: we handle this differently for 'target', and this code 
should do the same. Some value usage history:


Without caring for backward compatibility, I think we had somewhere

#define OMP_DEFAULT_DEVICE -2

and would simply use it everywhere when doing API calls.


But to handle old code, we have to handle both:
 -1 → default device
and
 -1 → initial device (= host).


Before coming back to your code, let's try to explain the history
and reason again. Maybe I manage to explain it better this time:

* * *

The problem is that -1 on the user side and -1 on the internal-use
side mean different things. Namely:

In the old days OpenMP had on the user side:
  device numbers 0 ... omp_get_num_devices()
where the upper bound was the initial device (= host), 
omp_get_initial_device().


For
  omp target device(n)
the device number has to be passed to the runtime, and GCC just passes 
"n" here.


But GCC also needs to handle:
  omp target
i.e. not specifying a device number (= using the default device). It has 
been implemented in the obvious way, i.e. passing '-1'.



Later, OpenMP added:
  omp_initial_device == -1
  omp_invalid_device (negative, implementation defined, != omp_initial_device)


GCC set the latter rather arbitrarily to -4.


RESULT: Everything works fine, except that both
  omp target device(omp_initial_device)
and
  omp target
now pass -1, although semantically one uses the host and the other the
default device.


Therefore, GCC uses:
(A) API routines: use omp_initial_device == -1 as the value.

(B) Directives: use -1 for no clause (= backward compatible), meaning the 
default device, and -2 for omp_initial_device.


Hence, the following defines exist:

#define GOMP_DEVICE_ICV -1
#define GOMP_DEVICE_HOST_FALLBACK   -2
#define GOMP_DEVICE_INVALID -4


If you call an OpenMP runtime API routine, you need to use -1 for the 
initial device; for GOMP_* functions related to directives, use -2, i.e.
GOMP_DEVICE_HOST_FALLBACK, when constructing the call manually.

Code-wise, GCC handles device(n) by generating code like:
  if (!device_clause)
    devnum = GOMP_DEVICE_ICV;
  else
    devnum = (n == -1) ? GOMP_DEVICE_HOST_FALLBACK : n;

That's not ideal but one solution to handle backward compatibility.
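The directive-side encoding described above can be sketched in C as follows. This is a minimal illustration, not GCC's actual code generation: the helper name directive_device_arg and the OMP_INITIAL_DEVICE stand-in for omp.h's omp_initial_device are hypothetical; only the GOMP_DEVICE_* values are taken from the defines quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as quoted above.  */
#define GOMP_DEVICE_ICV           (-1)
#define GOMP_DEVICE_HOST_FALLBACK (-2)

/* Hypothetical stand-in for the user-visible omp_initial_device.  */
#define OMP_INITIAL_DEVICE        (-1)

/* The remapped host value must not collide with "no clause".  */
static_assert (GOMP_DEVICE_HOST_FALLBACK != GOMP_DEVICE_ICV,
               "host fallback must differ from the ICV marker");

/* Hypothetical helper: the device argument a directive passes to a GOMP_*
   routine.  HAS_DEVICE_CLAUSE tells whether device(n) was specified and
   N is the user-supplied device number.  */
static int
directive_device_arg (bool has_device_clause, int n)
{
  if (!has_device_clause)
    return GOMP_DEVICE_ICV;  /* -1: use the default-device ICV.  */

  /* Remap the user's omp_initial_device so it cannot be mistaken for
     GOMP_DEVICE_ICV; all other values, including omp_invalid_device,
     pass through unchanged.  */
  return n == OMP_INITIAL_DEVICE ? GOMP_DEVICE_HOST_FALLBACK : n;
}
```

For example, a bare 'omp target' yields -1, device(omp_initial_device) yields -2 and device(3) yields 3.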



Inside libgomp/target.c, there is:

  resolve_device (int device_id, bool remapped)

and 'remapped' is
- 'false' for OpenMP API routines and
- 'true' for GOMP_* calls.

The following code in resolve_device does then undo the '-1':

  if (remapped && device_id == GOMP_DEVICE_ICV)
    {
      device_id = icv->default_device_var;
      remapped = false;
    }
  if (device_id < 0)
    {
      if (device_id == (remapped ? GOMP_DEVICE_HOST_FALLBACK
                                 : omp_initial_device))
        return NULL;
      /* ... */
    }
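The matching decode step can be modelled in a few lines. This is a simplified, hypothetical model of the -1/-2 handling only: the function resolve_device_id and its default_device_var parameter (standing in for icv->default_device_var) are made up for illustration, and returning false corresponds to resolve_device returning NULL, i.e. host fallback.

```c
#include <assert.h>
#include <stdbool.h>

#define GOMP_DEVICE_ICV           (-1)
#define GOMP_DEVICE_HOST_FALLBACK (-2)
#define OMP_INITIAL_DEVICE        (-1)  /* stand-in for omp_initial_device */

static_assert (GOMP_DEVICE_HOST_FALLBACK != OMP_INITIAL_DEVICE,
               "directive and API encodings of the host must differ");

/* Hypothetical model: returns false if the host is to be used (libgomp's
   resolve_device returns NULL then); otherwise stores the device number
   in *DEVNUM.  Other negative values, such as omp_invalid_device, are not
   modelled and are passed through unchanged.  */
static bool
resolve_device_id (int device_id, bool remapped, int default_device_var,
                   int *devnum)
{
  if (remapped && device_id == GOMP_DEVICE_ICV)
    {
      /* No device clause: take the default-device ICV, which holds a
         user-side value, so the API encoding applies from here on.  */
      device_id = default_device_var;
      remapped = false;
    }
  if (device_id == (remapped ? GOMP_DEVICE_HOST_FALLBACK
                             : OMP_INITIAL_DEVICE))
    return false;  /* Host fallback.  */
  *devnum = device_id;
  return true;
}
```

With remapped = true, a bare -1 picks up the default device (and falls back to the host only if that ICV is itself omp_initial_device), while -2 selects the host; with remapped = false, as for the API routines, -1 already means the host.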

* * *

Now coming back to your code:

If you call
  resolve_device
directly, using the GOMP_* variant makes sense, i.e. passing
the device number as is with 'remapped = true'. This is also
consistent with the remaining code.

Downside: This requires adding
  (n == -1) ? -2 : n
for a user-specified 'n'.


If you handle the device_num resolution yourself in libgomp, you have 
two variants to choose from:


(a) use a different value to denote the default device (e.g. '-2' or 
'-3') and pass it as is


(b) call resolve_device with remapping in libgomp, keeping -1 for 
the default device and emitting '(n == -1) ? -2 : n' during code generation

I think either works - and either variant is confusing in one way or the
other.

* * *

Jumping to:

[PATCH v2 03/12] libgomp: runtime support for target_device selector

libgomp/target.c:


+bool
+GOMP_evaluate_target_device (int device_num, const char *kind,
+const char *arch, const char *isa)
+{


If you do the remapping, you could just use:

  bool result;
  struct gomp_device_descr *devicep = resolve_device (device_num, true);
  if (kind && strcmp (kind, "any") == 0)
    kind = NULL;
  if (devicep == NULL)
    result = GOMP_evaluate_current_device (kind, arch, isa);
  else
    result = devicep->evaluate_device_func (device_num, kind, arch, isa);

which seems to be simpler than the code you have.

If you don't do the remapping:


+  bool result = true;
+
+  /* -2 is a magic number to indicate the device number was not specified;
+ in that case it's supposed to use the default device.  */
+  if (device_num == -2)
+device_num = omp_get_default_device ();


… then you need to handle -2 yourself.


+  if (kind && strcmp (kind, "any") == 0)
+kind = NULL;
+
+  gomp_debug (1, "%s: device_num = %u, kind=%s, arch=%s, isa=%s",
+ __FUNCTION__, device_num, kind, arch, isa);
+
+  if (omp_get_device_num () == device_num)
+result = GOMP_evaluate_current_device (kind, arch

x86_64-gnu-linux bootstrap fail (was: [PATCH v2 2/6] Extract ix86 dllimport implementation to mingw)

2024-06-25 Thread Tobias Burnus


Hi Evgeny,

I am not sure whether I have chosen the right email in the thread, but
an x86-64 GNU/Linux build currently fails as follows.

At a glance, it seems to be sufficient to remove the prototype 
declaration in i386.cc.


Namely:

gcc/config/i386/i386.cc:107:12: error: 'rtx_def* legitimize_dllimport_symbol(rtx, bool)' declared 'static' but never defined [-Werror=unused-function]

  107 | static rtx legitimize_dllimport_symbol (rtx, bool);
  |^~~

gcc/gcc/config/i386/i386.cc:108:12: error: 'rtx_def* legitimize_pe_coff_extern_decl(rtx, bool)' declared 'static' but never defined [-Werror=unused-function]

  108 | static rtx legitimize_pe_coff_extern_decl (rtx, bool);
  |^~
^Cmake[3]: *** [Makefile:2556: i386.o] Interrupt

There is:

config/i386/i386.cc:static rtx legitimize_dllimport_symbol (rtx, bool);
config/mingw/winnt-dll.cc:legitimize_dllimport_symbol (rtx symbol, bool want_reg)
config/mingw/winnt-dll.cc:  return legitimize_dllimport_symbol (addr, inreg);
config/mingw/winnt-dll.cc:rtx t = legitimize_dllimport_symbol (XEXP (XEXP (addr, 0), 0), inreg);



And:

config/i386/i386.cc:static rtx legitimize_pe_coff_extern_decl (rtx, bool);
config/mingw/winnt-dll.cc:legitimize_pe_coff_extern_decl (rtx symbol, bool want_reg)
config/mingw/winnt-dll.cc:return legitimize_pe_coff_extern_decl (addr, inreg);
config/mingw/winnt-dll.cc:  rtx t = legitimize_pe_coff_extern_decl (XEXP (XEXP (addr, 0), 0), inreg);


Tobias

[Patch, v2] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Tobias Burnus


[I messed up copying from the build system, picking up an old version.
Changes to v1 (bottom of the diff): fopen is no longer required.]

Tobias Burnus wrote:

mkoffload's generated .c file looks much nicer with '#embed'.

This patch depends on Jakub's #embed patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html

It might be a tiny bit faster than currently (or not); however,
once #embed has a large-file mode, it should also speed up
the offloading compilation quite a bit.

OK for mainline, once '#embed' support is in?

Tobias

gcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (read_file): Remove.
	(process_obj): Generate C file that uses #embed.
	(main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 72 -
 1 file changed, 12 insertions(+), 60 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..0c840318b2d 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }
 
-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-{
-  /* Get the file size.  */
-  long s = ftell (stream);
-  if (s >= 0)
-	alloc = s + 100;
-  fseek (stream, 0, SEEK_SET);
-}
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-{
-  size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-  if (!n)
-	break;
-  base += n;
-  if (base + 1 == alloc)
-	{
-	  alloc *= 2;
-	  buffer = XRESIZEVEC (char, buffer, alloc);
-	}
-}
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
Tokens are assumed to be delimited by ':'.  */
 
@@ -725,31 +687,27 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */
 
 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, &len);
-
   /* Dump out an array containing the binary.
  FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-{
-  fprintf (cfile, "\n\t");
-  for (size_t j = i; j < i + 17 && j < len; j++)
-	fprintf (cfile, "%3u,", (unsigned char) input[j]);
-}
-  fprintf (cfile, "\n};\n\n");
+  fprintf (cfile,
+	   "static unsigned char gcn_code[] = {\n"
+	   "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n"
+	   "#embed \"%s\"\n"
+	   "#else\n"
+	   "#error \"#embed '%s' failed\"\n"
+	   "#endif\n"
+	   "};\n\n", fname_in, fname_in, fname_in);
 
   fprintf (cfile,
 	   "static const struct gcn_image {\n"
 	   "  size_t size;\n"
 	   "  void *image;\n"
 	   "} gcn_image = {\n"
-	   "  %zu,\n"
+	   "  sizeof(gcn_code),\n"
 	   "  gcn_code\n"
-	   "};\n\n",
-	   len);
+	   "};\n\n");
 
   fprintf (cfile,
 	   "static const struct gcn_data {\n"
@@ -1312,13 +1270,7 @@ main (int argc, char **argv)
   fork_execute (ld_argv[0], CONST_CAST (char **, ld_argv), true, ".ld_args");
   obstack_free (&ld_argv_obstack, NULL);
 
-  in = fopen (gcn_o_name, "r");
-  if (!in)
-	fatal_error (input_location, "cannot open intermediate gcn obj file");
-
-  process_obj (in, cfile, omp_requires);
-
-  fclose (in);
+  process_obj (gcn_o_name, cfile, omp_requires);
 
   xputenv (concat ("GCC_EXEC_PREFIX=", execpath, NULL));
   xputenv (concat ("COMPILER_PATH=", cpath, NULL));
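For illustration, the two templates from the v2 process_obj can be lifted into a stand-alone function that writes to any FILE *. The function name emit_gcn_image is hypothetical, the gcn_data part of the output is omitted, and the assert on quote characters is an added safeguard, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the #embed-based gcn_code array and the
   gcn_image descriptor for the object file FNAME_IN to CFILE, using the
   same format strings as the v2 patch.  */
static void
emit_gcn_image (FILE *cfile, const char *fname_in)
{
  /* A '"' in the file name would break the generated directives.  */
  assert (fname_in != NULL && strchr (fname_in, '"') == NULL);

  fprintf (cfile,
	   "static unsigned char gcn_code[] = {\n"
	   "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n"
	   "#embed \"%s\"\n"
	   "#else\n"
	   "#error \"#embed '%s' failed\"\n"
	   "#endif\n"
	   "};\n\n", fname_in, fname_in, fname_in);

  /* The size is no longer printed as a number; the compiler derives it.  */
  fprintf (cfile,
	   "static const struct gcn_image {\n"
	   "  size_t size;\n"
	   "  void *image;\n"
	   "} gcn_image = {\n"
	   "  sizeof(gcn_code),\n"
	   "  gcn_code\n"
	   "};\n\n");
}
```

The #if guard also rejects an empty object file, because __has_embed then evaluates to __STDC_EMBED_EMPTY__ rather than __STDC_EMBED_FOUND__; and since the array length now comes from the embedded resource, sizeof(gcn_code) replaces the byte count that was previously printed with %zu.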

[Patch] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Tobias Burnus


mkoffload's generated .c file looks much nicer with '#embed'.

This patch depends on Jakub's #embed patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html

It might be a tiny bit faster than currently (or not); however,
once #embed has a large-file mode, it should also speed up
the offloading compilation quite a bit.

OK for mainline, once '#embed' support is in?

Tobias
gcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (read_file): Remove.
	(process_obj): Generate C file that uses #embed.
	(main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 66 +
 1 file changed, 12 insertions(+), 54 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..0ccb874398a 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }
 
-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-{
-  /* Get the file size.  */
-  long s = ftell (stream);
-  if (s >= 0)
-	alloc = s + 100;
-  fseek (stream, 0, SEEK_SET);
-}
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-{
-  size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-  if (!n)
-	break;
-  base += n;
-  if (base + 1 == alloc)
-	{
-	  alloc *= 2;
-	  buffer = XRESIZEVEC (char, buffer, alloc);
-	}
-}
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
Tokens are assumed to be delimited by ':'.  */
 
@@ -725,31 +687,27 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */
 
 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, &len);
-
   /* Dump out an array containing the binary.
  FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-{
-  fprintf (cfile, "\n\t");
-  for (size_t j = i; j < i + 17 && j < len; j++)
-	fprintf (cfile, "%3u,", (unsigned char) input[j]);
-}
-  fprintf (cfile, "\n};\n\n");
+  fprintf (cfile,
+	   "static unsigned char gcn_code[] = {\n"
+	   "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n"
+	   "#embed \"%s\"\n"
+	   "#else\n"
+	   "#error \"#embed '%s' failed\"\n"
+	   "#endif\n"
+	   "};\n\n", fname_in, fname_in, fname_in);
 
   fprintf (cfile,
 	   "static const struct gcn_image {\n"
 	   "  size_t size;\n"
 	   "  void *image;\n"
 	   "} gcn_image = {\n"
-	   "  %zu,\n"
+	   "  sizeof(gcn_code),\n"
 	   "  gcn_code\n"
-	   "};\n\n",
-	   len);
+	   "};\n\n");
 
   fprintf (cfile,
 	   "static const struct gcn_data {\n"
@@ -1316,7 +1274,7 @@ main (int argc, char **argv)
   if (!in)
 	fatal_error (input_location, "cannot open intermediate gcn obj file");
 
-  process_obj (in, cfile, omp_requires);
+  process_obj (gcn_o_name, cfile, omp_requires);
 
   fclose (in);

[Patch] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-06-21 Thread Tobias Burnus


Hi all,

it turned out that 'declare target' with 'link' clause was broken in multiple 
ways.

The main fix is the attached patch, namely pushing the variables to the
offload-vars list already in the FE.

When implementing it, I noticed:
* C has a similar issue when using nested functions, which are
  a GNU extension → https://gcc.gnu.org/115574

* When doing partial mapping of arrays (which is one of the reasons for 'link'),
  offsets are mishandled in Fortran (not tested in C); see FIXME in the patch.
  There, arr2(10) should print 10, but with map(arr2(10:)) it prints 19.
  (I will file a PR about this.)

* It might happen that linked variables do not get linked. I have not investigated
  why, but 'arr2' gives link errors – while 'arr' works.
  See FIXME in the patch. (I will file a PR about this.)

* For COMMON blocks, map(/common/) is rejected, https://gcc.gnu.org/PR115577

* When instead mapping map(a,b,c), which is equivalent for 'common /mycom/ a,b,c',
  linking fails on the device side as the 'mycom_' symbol cannot be found there.
  (I will file a PR about this.)

As COMMON has issues, an alternative would be to defer the trans-common.cc
changes to a later patch.

Comments, questions, concerns?

Tobias

PS: Tested on a system with page-migration support, using nvptx offloading with
both nvptx and GCN offloading configured; no new failures observed.
OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

Contrary to a normal 'declare target', the 'declare target link' attribute
also needs to set node->offloadable and push the offload_vars in the front end.

Linked variables require that the data is mapped. For module variables, this
can happen anywhere. For variables in an external subprogram or the main
program, this can only happen either in that program itself or in an
internal subprogram. Whether a variable is just normally mapped or linked then
becomes relevant if a device routine exists that can access that variable,
i.e. an internal procedure then has to be marked as declare target.
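A C analogue of the mapping requirement described above (the patch itself is about Fortran): a 'link' variable only becomes accessible on the device inside a target region that maps it. This is a sketch assuming an OpenMP-capable compiler; the names N and sum_arr are illustrative, and without -fopenmp the pragmas are ignored, so the loop simply runs on the host.

```c
#include <assert.h>

#define N 100

/* Host array; with 'link', no device copy exists until it is mapped.  */
int arr[N];
#pragma omp declare target link(arr)

/* Sum ARR inside a target region.  The map(to: arr) clause is what makes
   the linked variable available on the device; S is copied back.  */
int
sum_arr (void)
{
  int s = 0;
#pragma omp target map(to: arr) map(tofrom: s)
  for (int i = 0; i < N; i++)
    s += arr[i];
  return s;
}
```

After filling arr with 0 ... 99, sum_arr () yields 4950, whether the region runs on a device or falls back to the host.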

	PR fortran/115559

gcc/fortran/ChangeLog:

	* trans-common.cc (build_common_decl): Add 'omp declare target' and
	'omp declare target link' variables to offload_vars.
	* trans-decl.cc (add_attributes_to_decl): Likewise; update args and
	call decl_attributes.
	(get_proc_pointer_decl, gfc_get_extern_function_decl,
	build_function_decl): Update calls.
	(gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1'
	to avoid errors with symtab_node::get_create.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: New test.

 gcc/fortran/trans-common.cc|  21 
 gcc/fortran/trans-decl.cc  |  81 +-
 .../libgomp.fortran/declare-target-link.f90| 119 +
 3 files changed, 195 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/trans-common.cc b/gcc/fortran/trans-common.cc
index 5f44e7bd663..e714342c3c0 100644
--- a/gcc/fortran/trans-common.cc
+++ b/gcc/fortran/trans-common.cc
@@ -98,6 +98,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "cgraph.h"
+#include "context.h"
+#include "omp-offload.h"
 #include "gfortran.h"
 #include "trans.h"
 #include "stringpool.h"
@@ -497,6 +500,24 @@ build_common_decl (gfc_common_head *com, tree union_type, bool is_init)
 	  = tree_cons (get_identifier ("omp declare target"),
 		   omp_clauses, DECL_ATTRIBUTES (decl));
 
+  if (com->omp_declare_target_link || com->omp_declare_target)
+	{
+	  /* Add to offload_vars; get_create does so for omp_declare_target,
+	 omp_declare_target_link requires manual work.  */
+	  gcc_assert (symtab_node::get (decl) == 0);
+	  symtab_node *node = symtab_node::get_create (decl);
+	  if (node != NULL && com->omp_declare_target_link)
+	{
+	  node->offloadable = 1;
+	  if (ENABLE_OFFLOADING)
+		{
+		  g->have_offload = true;
+		  if (is_a  (node))
+		vec_safe_push (offload_vars, decl);
+		}
+	}
+	}
+
   /* Place the back end declaration for this common block in
  GLOBAL_BINDING_LEVEL.  */
   gfc_map_of_all_commons[identifier] = pushdecl_top_level (decl);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 8d4f06a4e1d..4067dd6ed77 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -46,7 +46,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans-stmt.h"
 #include "gomp-constants.h"
 #include "gimplify.h"
+#include "context.h"
 #include "omp-general.h"
+#include "omp-offload.h"
 #include "attr-fnspec.h"
 #include "tree-iterator.h"
 #include "dependency.h"
@@ -1470,19 +1472,18 @@ gfc_add_assign_aux_vars (gfc_symbol * sym)
 }
 
 
-static tree
-add_attributes_to_decl (symbol_attribute sym_attr, tree list)
+static void
+add_attributes_to_decl (tree *decl_p, const gfc_symbol *sym)
 {
   unsigned id;
-  tree attr;
+  tree list = NUL

Re: [PATCH v5 2/6] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

2024-06-12 Thread Tobias Burnus


Andrew Stubbs wrote:

Compared to the previous v4 (1/5) posting of this patch:
- The enumeration of the ompx allocators have been moved (again) to 200
   (as 100 is already in use by another toolchain vendor and this seems
   like a possible source of confusion).
- The "ompx" has also been changed to "ompx_gnu" to highlight that these
   are specifically GNU extensions.
- The failure mode of the testcases has been modified, including adding
   an abort in CHECK_SIZE and skipping the test on unsupported platforms.
- The OMP_ALLOCATE environment variable now supports the new allocator.
- The Fortran frontend allows use of the new allocator in "allocator"
   clauses.

---

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  This is not in the OpenMP standard so it uses the "ompx"
namespace and an independent enum baseline of 200 (selected to not clash with
other known implementations).

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.  One motivation for having this feature is
for use by the (planned) -foffload-memory=pinned feature.
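The stated equivalence can be sketched with the standard OpenMP allocator API: an allocator built from the pinned and null-fallback traits on the default memory space. The helper pinned_roundtrip is hypothetical; when compiled without OpenMP, it falls back to plain, unpinned malloc so that the sketch still runs.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: allocate SIZE bytes via an allocator with the
   pinned and null-fallback traits, fill and sum them, then free them.
   Returns the byte sum, or -1 if no pinned memory could be obtained
   (with the null fallback, omp_alloc returns NULL instead of handing
   out unpinned memory).  SIZE must be nonzero.  */
static int
pinned_roundtrip (size_t size)
{
  unsigned char *p;
  int sum = 0;

  assert (size > 0);
#ifdef _OPENMP
  const omp_alloctrait_t traits[] = {
    { omp_atk_pinned, omp_atv_true },
    { omp_atk_fallback, omp_atv_null_fb }
  };
  omp_allocator_handle_t al
    = omp_init_allocator (omp_default_mem_space, 2, traits);
  if (al == omp_null_allocator)
    return -1;
  p = omp_alloc (size, al);
  if (p == NULL)
    {
      omp_destroy_allocator (al);
      return -1;
    }
#else
  /* Without OpenMP: ordinary (unpinned) heap memory.  */
  p = malloc (size);
  if (p == NULL)
    return -1;
#endif

  for (size_t i = 0; i < size; i++)
    {
      p[i] = (unsigned char) i;
      sum += p[i];
    }

#ifdef _OPENMP
  omp_free (p, al);
  omp_destroy_allocator (al);
#else
  free (p);
#endif
  return sum;
}
```

With pinning available, pinned_roundtrip (16) returns 120 (the sum of 0 ... 15); on a host with insufficient lockable memory it returns -1 instead of silently using unpinned memory.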


The patch LGTM.

Thanks!

Tobias

gcc/fortran/ChangeLog:

* openmp.cc (is_predefined_allocator): Update valid ranges to
  incorporate ompx_gnu_pinned_mem_alloc.

libgomp/ChangeLog:

* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge
---
  gcc/fortran/openmp.cc |  11 +-
  .../gfortran.dg/gomp/allocate-pinned-1.f90|  16 +++
  libgomp/allocator.c   | 115 +-
  libgomp/env.c |   1 +
  libgomp/libgomp.texi  |   7 +-
  libgomp/omp.h.in  |   1 +
  libgomp/omp_lib.f90.in|   2 +
  libgomp/omp_lib.h.in  |   2 +
  libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 100 +++
  libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 102 
  .../libgomp.fortran/alloc-pinned-1.f90|  16 +++
  11 files changed, 336 insertions(+), 37 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90
  create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
  create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
  create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90

Re: [PATCH v5 1/6] libgomp: change alloc-pinned tests failure mode

2024-06-12 Thread Tobias Burnus


Andrew Stubbs wrote:

The feature doesn't work on non-Linux hosts, at present, so skip the tests
entirely.

On Linux systems that have insufficient lockable memory configured we still
need to fail or else the feature won't be getting tested when we think it is,
but now there's a message to explain why.

libgomp/ChangeLog:

* testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to
dg-skip-if.
Correct spelling mistake.
Abort on insufficient lockable memory.
Use #error on non-linux hosts.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.


LGTM. Thanks!

Tobias

Re: [Patch, PR Fortran/90072] Polymorphic Dispatch to Polymophic Return Type Memory Leak

2024-06-08 Thread Tobias Burnus


Andre Vehreschild wrote:

PS That's good news about the funding. Maybe we will get to see "built in"
coarrays soon?

You hopefully will see Nikolas' work on the shared memory coarray support, if
that is what you mean by "built in" coarrays. I will be working on the
distributed memory coarray support esp. fixing the module issues and some other
team related things.


Cool! (Both of it.)

I assume "distributed memory coarray support" is still based on
OpenCoarrays?

* * *

I am asking because a coarray API is being defined: the Parallel Runtime
Interface for Fortran (PRIF), https://go.lbl.gov/prif

with an implementation called Caffeine – CoArray Fortran Framework of
Efficient Interfaces to Network Environments,
https://crd.lbl.gov/caffeine which uses GASNet or POSIX processes.

Well, among the implementers is (unsurprisingly?) Damian – and the
idea seems to be that LLVM's FLANG will use the API.

Tobias

PS: I think it might be useful in the long run to support both
PRIF/Caffeine and OpenCoarrays.

I have attached my hello-world patch for -fcoarray=prif that I wrote
after ISC-HPC; it only handles this_image() / num_images() + init/stop.
I got confirmation from the PRIF developers that the next revision will
permit calling __prif_MOD_prif_init multiple times such that one can use
it in the constructor for static coarrays, which won't work otherwise.
gcc/ChangeLog:

	* flag-types.h (enum gfc_fcoarray):

gcc/fortran/ChangeLog:

	* invoke.texi:
	* lang.opt:
	* trans-decl.cc (gfc_build_builtin_function_decls):
	(create_main_function):
	* trans-intrinsic.cc (trans_this_image):
	(trans_num_images):
	* trans.h (GTY):

 gcc/flag-types.h   |  3 ++-
 gcc/fortran/invoke.texi|  7 +-
 gcc/fortran/lang.opt   |  5 +++-
 gcc/fortran/trans-decl.cc  | 56 --
 gcc/fortran/trans-intrinsic.cc | 42 +++
 gcc/fortran/trans.h|  5 
 6 files changed, 108 insertions(+), 10 deletions(-)

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 5a2b461fa75..babd747c01d 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -427,7 +427,8 @@ enum gfc_fcoarray
 {
   GFC_FCOARRAY_NONE = 0,
   GFC_FCOARRAY_SINGLE,
-  GFC_FCOARRAY_LIB
+  GFC_FCOARRAY_LIB,
+  GFC_FCOARRAY_PRIF
 };
 
 
diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 40e8e4a7cdd..331a40d31db 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1753,7 +1753,12 @@ Single-image mode, i.e. @code{num_images()} is always one.
 
 @item @samp{lib}
 Library-based coarray parallelization; a suitable GNU Fortran coarray
-library needs to be linked.
+library needs to be linked such as @url{http://opencoarrays.org}.
+
+@item @samp{prif}
+Using the Parallel Runtime Interface for Fortran (PRIF),
+@url{https://go.lbl.gov/@/prif}; for instance, via Caffeine,
+@url{https://go.lbl.gov/@/caffeine}.
 @end table
 
 
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 5efd4a0129a..9ba957d5571 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -786,7 +786,7 @@ Copy array sections into a contiguous block on procedure entry.
 
 fcoarray=
 Fortran RejectNegative Joined Enum(gfc_fcoarray) Var(flag_coarray) Init(GFC_FCOARRAY_NONE)
--fcoarray=	Specify which coarray parallelization should be used.
+-fcoarray=	Specify which coarray parallelization should be used.
 
 Enum
 Name(gfc_fcoarray) Type(enum gfc_fcoarray) UnknownError(Unrecognized option: %qs)
@@ -800,6 +800,9 @@ Enum(gfc_fcoarray) String(single) Value(GFC_FCOARRAY_SINGLE)
 EnumValue
 Enum(gfc_fcoarray) String(lib) Value(GFC_FCOARRAY_LIB)
 
+EnumValue
+Enum(gfc_fcoarray) String(prif) Value(GFC_FCOARRAY_PRIF)
+
 fcheck=
 Fortran RejectNegative JoinedOrMissing
 -fcheck=[...]	Specify which runtime checks are to be performed.
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index dca7779528b..d1c0e2ee997 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -170,6 +170,10 @@ tree gfor_fndecl_co_sum;
 tree gfor_fndecl_caf_is_present;
 tree gfor_fndecl_caf_random_init;
 
+tree gfor_fndecl_prif_init;
+tree gfor_fndecl_prif_stop;
+tree gfor_fndecl_prif_this_image_no_coarray;
+tree gfor_fndecl_prif_num_images;
 
 /* Math functions.  Many other math functions are handled in
trans-intrinsic.cc.  */
@@ -4147,6 +4151,31 @@ gfc_build_builtin_function_decls (void)
 	get_identifier (PREFIX("caf_random_init")),
 	void_type_node, 2, logical_type_node, logical_type_node);
 }
+  else if (flag_coarray == GFC_FCOARRAY_PRIF)
+{
+  tree pint_type = build_pointer_type (integer_type_node);
+  tree pbool_type = build_pointer_type (boolean_type_node);
+  tree pintmax_type_node = get_typenode_from_name (INTMAX_TYPE);
+  pintmax_type_node = build_pointer_type (pintmax_type_node);
+
+  gfor_fndecl_prif_init = gfc_build_library_function_decl_with_spec (
+	get_identifier ("__prif_MOD_prif_init"), ". W ",
+	void_type_node, 1, pint

Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-08 Thread Tobias Burnus


Hi Gerald,

Gerald Pfeifer wrote:

Looks like a janitorial task to fix the absolute links, possibly
excluding those with /git, /onlinedocs, /wiki – or assuming that the
main page is GCC.gnu.org, relying on the redirects.

It's on my list. A first quick check indicates there isn't much to do,
though. :-)


You could consider

htdocs/search.html:

to avoid a redirect (but it is not a broken link);
otherwise, I concur that it seems to be (mostly) fine :-)

* * *


+  loop-transformation constructs are now supported.
I'm thinking "loop transformation" in English? Or is this a specific term
from the standard?

Loop transformation happens at the end. But e.g "(#pragma omp) unroll
full" is a directive and, e.g.
...
is a construct (= directive + structured block (if any) + end directive
(if any)).

I believe there was a misunderstanding and I wasn't clear enough: I was
wondering whether instead of "loop-transformation" the patch should have
"loop transformation".

In your response you use the version without dash, so I guess we agree?
:-)


(Pedantically it's a hyphen (-) and not a(n en/em) dash (–/—), i.e. '-' 
not '--' or '---' in TeX.)


No, we don't. – It makes a difference whether the two words are used
alone or as a modifier of a noun, as in "this is well defined" vs. "a
well-defined project".


Thus, while "loop transformation happens" is without hyphen (as we both 
agree),* for "loop(-| )transformation constructs" the (non-)usage of 
hyphens is not well defined; grouping wise, those are clearly '((loop 
transformation) constructs)' and not '(loop (transformation constructs))'.


I believe both variants are perfectly fine.

BTW: In the OpenMP pre-6.0 draft (TR12), the verb 'transform' is now 
used as noun not with suffix '-ation' but with the suffix '-ing' (also 
referred to as gerund) such that a section title now uses 
"Loop-Transforming Constructs"; I think for '(word) plus (-ing word)' – 
used as modifier –, a hyphen is a tad more common than for '(word) plus 
(word with -ation suffix)'.


Tobias

* The Oxford Guide to Style points out some words that do get 
hyphenated: clear-cut, drip-proof, take-off, part-time, … – or to refer 
to the abstract meaning rather than literal: bull's-eye, crow's-feet, … 
— Formerly, present participle plus noun got hyphenated when the compound 
was acted on: walking-stick, walking-frame. Likewise, it was formerly 
normal in British English to hyphenate a single adjectival noun and the 
noun it modified: note-cue, title-page, volume-number (less common now, 
but can linger in some combinations). And until recently: small 
scale-factory (vs. small-scale factory), white water-lily (vs. 
white-water lily).

Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-06 Thread Tobias Burnus


Hi Gerald,

Gerald Pfeifer wrote:

+++ b/htdocs/gcc-15/changes.html
+
+  <a href="https://gcc.gnu.org/projects/gomp/">OpenMP</a>

Can you please make this a relative link, i.e. "../projects/gomp/"?


Good point. I thought such links should be absolute because of 
(www.)GNU.org, i.e.


https://www.gnu.org/software/gcc/releases.html

... but also that page has https://www.gnu.org/software/gcc/projects/gomp/

GNU.org does not have the documentation, but going to 
https://www.gnu.org/software/gcc/onlinedocs/ or a subpage redirects (302 
temporary redirect) to the GCC website. Likewise for '../git', but 
'../wiki' returns an HTTP 404 (Not Found); fortunately, '../wiki/' works.


I think there are plenty of links which could be relative ones but are 
absolute ones.


Looks like a janitorial task to fix the absolute links, possibly 
excluding those with /git, /onlinedocs, /wiki – or assuming that the 
main page is GCC.gnu.org, relying on the redirects.


In any case, those links are probably broken on GNU.org:

htdocs/gcc-14/porting_to.html:href="/onlinedocs/gcc-14.1.0/gcc/Diagnostic-Pragmas.html">#pragma GCC diagnostic warning


htdocs/gcc-5/changes.html:    A href="/onlinedocs/libstdc++/manual/using_dual_abi.html">Dual


* * *


+
+  OpenMP 5.1: The unroll and tile
+  loop-transformation constructs are now supported.
+

I'm thinking "loop transformation" in English? Or is this a specific term
from the standard?


Loop transformation happens at the end. But e.g "(#pragma omp) unroll 
full" is a directive and, e.g.


  #pragma omp unroll partial(2)
  for (int i=0; i < n; i++)
    a[i] = 5;

is a construct (= directive + structured block (if any) + end directive 
(if any)).


Tobias

Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Tobias Burnus


Sandra Loosemore wrote:

On 6/6/24 06:06, Tobias Burnus wrote:
+@item I/O within OpenMP target regions and OpenACC compute regions is supported
+  using the C library @code{printf} functions.
+  Additionally, the Fortran @code{print}/@code{write} statements are
+  supported within OpenMP target regions, but not yet OpenACC compute
+  regions.  @c The latter needs 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.

I think an "in" (or 'within') is missing before OpenACC.


Yes, "...not yet within OpenACC compute regions", please.


Thanks! Committed as https://gcc.gnu.org/r15-1072-g423522aacd9f30

Tobias

Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Tobias Burnus


Hi Thomas,

regarding the commit r15-1070-g3a4775d4403f2e / https://gcc.gnu.org/r15-1070

First, thanks for adding I/O support to nvptx offloading.

I have a wording nit, to be confirmed by a native speaker:


--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi

...

+@item I/O within OpenMP target regions and OpenACC compute regions is supported
+  using the C library @code{printf} functions.
+  Additionally, the Fortran @code{print}/@code{write} statements are
+  supported within OpenMP target regions, but not yet OpenACC compute
+  regions.  @c The latter needs 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.

I think an "in" (or 'within') is missing before OpenACC.

Otherwise, it seemed to be fine at a glance – and I am happy that that 
feature now finally works :-)


Hooray, no longer using reverse offload ("!$omp target 
device(ancestor:1)") for Fortran I/O when debugging.


Thanks,

Tobias

Re: [PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc

2024-06-06 Thread Tobias Burnus


Hi Andrew, hi Jakub, hello world,

Andrew Stubbs wrote:


Compared to the previous v3 posting of this patch, the enumeration of
the "ompx" allocators has been moved to start at "100"


100 is a bad value - as can be seen below.

As Jakub suggested at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640432.html
"given that LLVM uses 100-102 range, perhaps pick a different one, 200 or 150"

(I know that the first review email suggested 100.)


This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.


Namely: ompx_pinned_mem_alloc

RFC: Should we use this name or - similar to LLVM - prefix this by
a vendor prefix instead (gnu_omp_ or gcc_omp_ instead of ompx_)?

IMHO it is fine to use ompx_ for pinned as the semantic is clear
and should be compatible with IBM and AMD.

For other additional memspaces / allocators, I am less sure, i.e.
on OG13 there are:
- ompx_unified_shared_mem_space, ompx_host_mem_space
- ompx_unified_shared_mem_alloc, ompx_host_mem_alloc

(BTW: In light of TR13 naming, the USM one could be
..._devices_all_mem_{alloc,space}, just to start some bikeshedding
or following LLVM + Intel '…target_{host,shared}…'.)

* * *

Looking at other compilers:

IBM's compiler, https://www.ibm.com/docs/en/SSXVZZ_16.1.1/pdf/compiler.pdf , 
has:
- ompx_pinned_mem_alloc, tagged as IBM extension and otherwise without 
documenting it further

Checking omp.h, they define it as:
  ompx_pinned_mem_alloc = 9, /* Preview of host pinned memory support */
and additionally have:
  LOMP_MAX_MEM_ALLOC = 1024,

AMD's compiler based on clang has:
  /* Preview of pinned memory support */
  ompx_pinned_mem_alloc = 120,
in addition to the LLVM defines shown below.

Regarding LLVM:
- they don't offer 'pinned'
- they use the prefix 'llvm_omp' not 'ompx'

Namely:
typedef enum omp_allocator_handle_t
...
  llvm_omp_target_host_mem_alloc = 100,
  llvm_omp_target_shared_mem_alloc = 101,
  llvm_omp_target_device_mem_alloc = 102,
...
typedef enum omp_memspace_handle_t
...
  llvm_omp_target_host_mem_space = 100,
  llvm_omp_target_shared_mem_space = 101,
  llvm_omp_target_device_mem_space = 102,

Remark: I did not find any documentation - and while I
understand in principle host and shared, I wonder how
LLVM handles 'device_mem_space' when there is more than
one device.

BTW: OpenMP TR13 avoids this issue by adding two sets of
API routines. Namely:

First, for memspaces,
- omp_get_{device,devices}_memspace
- omp_get_{device,devices}_and_host_memspace
- omp_get_devices_all_memspace

and, secondly, for allocators:
- omp_get_{device,devices}_allocator
- omp_get_{device,devices}_and_host_allocator
- omp_get_devices_all_allocator

where omp_get_device_* takes a single device number and
omp_get_devices_* a list of device numbers while _and_host
automatically adds the initial device to the list.

* * *

Looking at Intel, they even use extensions without prefix:

omp_target_{host,shared,device}_mem_{space,alloc}

and contrary to LLVM they document it with the semantic, cf.
https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/openmp-memory-spaces-and-allocators.html

* * *


The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.


...


diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index cdedc7d80e9..18e3f525ec6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)


...


   #define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-_Static_assert (ARRAY_SIZE (predefined_alloc_mapping)
+_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping)
== omp_max_predefined_alloc + 1,
-   "predefined_alloc_mapping must match omp_memspace_handle_t");
+   "predefined_omp_alloc_mapping must match omp_memspace_handle_t");
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))


I am surprised that this compiles: Why do you re-#define this macro?

* * *


--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
 omp_cgroup_mem_alloc = 6,
 omp_pteam_mem_alloc = 7,
 omp_thread_mem_alloc = 8,
+  ompx_pinned_mem_alloc = 100,


See remark regarding "100" at the top of this email.


--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
+integer (kind=omp_allocator_handle_kind), &
+ parameter :: ompx_pinned_mem_alloc = 100


Likewise.

* * *

Why didn't you also update omp_lib.h.in?

* * *

I think you really want to update the checking code inside GCC itself,

i.e. for Fortran:

3 |   !$omp allocate(a) allocator(100)

  | 21

Error: Predefined allocator required in ALLOCATOR clause at (1) as the list 
item 'a' at (2) has the SAV

[wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-06 Thread Tobias Burnus


GCC 15 now supports unified-shared memory and the tile/unroll constructs
in OpenMP.

Updates https://gcc.gnu.org/gcc-15/changes.html
and https://gcc.gnu.org/projects/gomp/

Comments?

Tobias
gcc-15/changes.html + projects/gomp: update for new OpenMP features

GCC 15 now supports unified-shared memory and the tile/unroll constructs
in OpenMP.

 htdocs/gcc-15/changes.html  | 27 ++-
 htdocs/projects/gomp/index.html | 11 +++
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index b59fd3be..94528ebd 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -40,6 +40,24 @@ a work-in-progress.
 
 New Languages and Language specific improvements
 
+
+  <a href="https://gcc.gnu.org/projects/gomp/">OpenMP</a>
+  
+
+  Support for unified-shared memory has been added for some AMD and Nvidia
+  GPUs devices, enabled only when using the
+  unified_shared_memory clause to the requires
+  directive. For details, see the offload-target specifics section in the
+  <a href="https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html"
+  >GNU Offloading and Multi Processing Runtime Library Manual</a>.
+
+
+  OpenMP 5.1: The unroll and tile
+  loop-transformation constructs are now supported.
+
+  
+
+
 
 
 
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 94bda5ff..d1765fc3 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -313,18 +313,21 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 requires directive
-
+
   GCC 9
   GCC 12
   GCC 13
-  GCC 14
+  GCC 14
+  GCC 15
 
 
   (atomic_default_mem_order)
   (dynamic_allocators)
   complete but no non-host devices provides unified_address or
   unified_shared_memory
-  complete but no non-host devices provides unified_shared_memory
+  complete but no non-host devices provides unified_shared_memory
+  complete; see also <a href="https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html">
+  Offload-Target Specifics</a>
 
   
   
@@ -706,7 +709,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Loop transformation constructs
-No
+GCC 15
