[Patch] OpenMP: Allocate directive for static vars, clean up
'omp allocate' permits using a different (specified) allocator and alignment for both stack/automatic and static/saved variables; the latter accept only predefined allocators. Currently, only C and Fortran are supported for stack/automatic variables; static variables were rejected before the attached patch.

* * *

I happened to look at the 'allocate' directive recently and, doing so, I stumbled over a couple of issues, which the attached patch addresses (missing diagnostics for corner cases, checks that had not been updated, unhelpful documentation ['allocate' *clause*], ...).

Doing so, I wondered: Shouldn't we just accept 'omp allocate' for static variables by honoring the alignment and ignoring the actually requested allocator?

- First, we already do the same for actual allocations, as not all traits are supported. And for the host this seems to be the most sensible thing to do in any case. [For some use cases, pointers + allocation in the constructor would be better, but in general, not adding an indirection seems to be better and has fewer corner-case usability issues.]

I guess we later want to honor the requested memory for nvptx and/or gcn; at least Nvidia GPUs could make use of constant memory (which has advantages when many threads read the same memory/when broadcasting it). I guess OpenACC 2.7's 'readonly' modifier serves a similar purpose. For now we don't, but the attribute is passed on to the backends, which could make use of it, if desired. ('groupprivate' directive vs. cgroup/thread allocators are similar device-only features.)

As mentioned, this patch also fixes a few other issues here and there, see commit log and source code for details.

Code comments? Suggestions or remarks? - Before I apply this patch?

Tobias

PS: I am aware that C++ support is lacking. There is a pending patch that needs to be updated for this patch, probably for some bitrotting, and in particular for the review comments, cf.
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633782.html and https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639929.html OpenMP: Allocate directive for static vars, clean up For the 'allocate' directive, remove the sorry for static variables and just keep using normal memory, but honor the requested alignment and set a DECL_ATTRIBUTE in case a target may want to make use of this later on. The documentation is updated accordingly. The C diagnostic to check for predefined allocators in this case failed to accept GCC's ompx_gnu_... allocator, now fixed. (Fortran was already okay; but both now use new common #defined value for checking.) And while Fortran common block variables are still rejected, the check has been improved as before the sorry diagnostic did not work for common blocks in modules. Finally, for 'allocate' clause on the target/task/taskloop directives, there is now a warning for omp_thread_mem_alloc (i.e. predefined allocator with access = thread), which is undefined behavior according to the OpenMP specification. And, last, testing showed that var decl + static_assert sets TREE_USED but does not produce a statement list in C, which did run into an assert in gimplify. This special case is now also handled. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_allocate): Set alignment for alignof; accept static variables and fix predef allocator check. gcc/fortran/ChangeLog: * openmp.cc (is_predefined_allocator): Use gomp-constants.h consts. * trans-common.cc (translate_common): Reject OpenMP allocate directives. * trans-decl.cc (gfc_finish_var_decl): Handle allocate directive for static variables. (gfc_trans_deferred_vars): Update for the latter. gcc/ChangeLog: * gimplify.cc (gimplify_bind_expr): Fix corner case for OpenMP allocate directive. (gimplify_scan_omp_clauses): Warn if omp_thread_mem_alloc is used as allocator with the target/task/taskloop directive. 
include/ChangeLog:
* gomp-constants.h (GOMP_OMP_PREDEF_ALLOC_MAX, GOMP_OMPX_PREDEF_ALLOC_MIN, GOMP_OMPX_PREDEF_ALLOC_MAX, GOMP_OMP_PREDEF_ALLOC_THREADS): New defines.

libgomp/ChangeLog:
* allocator.c: Add static asserts for new GOMP_OMP{,X}_PREDEF_ALLOC_{MIN,MAX} range values.
* libgomp.texi (OpenMP Impl. Status): Allocate directive for static vars is now supported. Refer to PR for allocate clause. (Memory allocation): Update for static vars; minor word tweaking.

gcc/testsuite/ChangeLog:
* c-c++-common/gomp/allocate-9.c: Update for removed sorry.
* gfortran.dg/gomp/allocate-15.f90: Likewise.
* gfortran.dg/gomp/allocate-pinned-1.f90: Likewise.
* gfortran.dg/gomp/allocate-4.f90: Likewise; add dg-error for previously missing diagnostic.
* c-c++-common/gomp/allocate-18.c: New test.
* c-c++-common/gomp/allocate-19.c: New test.
* gfortran.dg/gomp/allocate-clause.f90: New test.
* gfortran.dg/gomp/allocate-static-2.f90: New test.
* gfortran.dg/gomp/allocate-static.f90: New test.

gcc/c/c-parser.cc | 29
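The range check on predefined allocators that the commit log describes can be pictured with a minimal C sketch. Everything prefixed `SKETCH_`/`sketch_` is hypothetical, and the numeric values are assumptions chosen for illustration only; the authoritative values are the new defines in include/gomp-constants.h and the enumerators in omp.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, for illustration only (not copied from GCC).  */
#define SKETCH_OMP_PREDEF_ALLOC_MIN       1  /* assumed: first omp_..._mem_alloc */
#define SKETCH_OMP_PREDEF_ALLOC_MAX       8  /* assumed: last omp_..._mem_alloc */
#define SKETCH_OMPX_PREDEF_ALLOC_MIN    200  /* assumed: first ompx_gnu_... allocator */
#define SKETCH_OMPX_PREDEF_ALLOC_MAX    200  /* assumed: last ompx_gnu_... allocator */
#define SKETCH_OMP_PREDEF_ALLOC_THREADS   8  /* assumed: allocator with access = thread */

/* Compile-time sanity check in the spirit of the static asserts
   mentioned for libgomp's allocator.c.  */
_Static_assert (SKETCH_OMP_PREDEF_ALLOC_MAX < SKETCH_OMPX_PREDEF_ALLOC_MIN,
                "standard and ompx ranges must not overlap");

/* True if V lies in the standard or in the vendor-extension range.  */
static bool
sketch_is_predefined_allocator (long v)
{
  return ((v >= SKETCH_OMP_PREDEF_ALLOC_MIN
           && v <= SKETCH_OMP_PREDEF_ALLOC_MAX)
          || (v >= SKETCH_OMPX_PREDEF_ALLOC_MIN
              && v <= SKETCH_OMPX_PREDEF_ALLOC_MAX));
}

/* True if a warning would be due: thread-access allocator used in an
   'allocate' clause on target/task/taskloop.  */
static bool
sketch_warn_thread_alloc (long v, bool on_target_task_or_taskloop)
{
  return on_target_task_or_taskloop && v == SKETCH_OMP_PREDEF_ALLOC_THREADS;
}

/* Usage example.  */
static void
sketch_demo (void)
{
  assert (sketch_is_predefined_allocator (SKETCH_OMPX_PREDEF_ALLOC_MIN));
  assert (!sketch_is_predefined_allocator (0));
  assert (sketch_warn_thread_alloc (SKETCH_OMP_PREDEF_ALLOC_THREADS, true));
}
```

Sharing one set of range constants between the C and Fortran front ends is what lets both accept the ompx_gnu_... allocators without duplicating magic numbers.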
Re: [PATCH v4 1/7] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces
Paul-Antoine Arras wrote: This patch introduces the OMP_DISPATCH tree node, as well as two new clauses `nocontext` and `novariants`. It defines/exposes interfaces that will be used in subsequent patches that add front-end and middle-end support, but nothing generates these nodes yet. LGTM. Thanks, Tobias gcc/ChangeLog: * builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New. * omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. (dump_generic_node): Handle OMP_DISPATCH. * tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. (omp_clause_code_name): Add "novariants" and "nocontext". * tree.def (OMP_DISPATCH): New. * tree.h (OMP_DISPATCH_BODY): New macro. (OMP_DISPATCH_CLAUSES): New macro. (OMP_CLAUSE_NOVARIANTS_EXPR): New macro. (OMP_CLAUSE_NOCONTEXT_EXPR): New macro. gcc/fortran/ChangeLog: * types.def (BT_FN_PTR_CONST_PTR_INT): Declare.
[committed] libgomp.texi: Remove now duplicate TR13 item (was: [committed] libgomp.texi: fix formatting; add post-TR13 OpenMP impl. status items)
Continuing to read https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Technical-Report-13.html showed that I missed one old item, which could now be removed: With the new 'storage' map type it was also no longer fully applicable – and the newly added text already covered it.

Committed as Rev. r15-3919-gcfdc0a384aff5e as follow up to r15-3917-g6b7eaec20b046e.

* * *

While useful, those tables are unfortunately not very readable. (And I wonder how many more non-Appendix B items should be added; it probably requires a full pass through the changes and will still likely miss several important but more hidden changes.)

Tobias

commit cfdc0a384aff5e06f80d3f55f4615abf350b193b
Author: Tobias Burnus
Date: Fri Sep 27 12:06:17 2024 +0200

libgomp.texi: Remove now duplicate TR13 item

Remove an item under "Other new TR 13 features" that since the last commit (r15-3917-g6b7eaec20b046e) to this file is covered by the added "New @code{storage} map-type modifier; context-dependent @code{alloc} and @code{release} are aliases" "Update of the map-type decay for mapping and @code{declare_mapper}"

libgomp/
* libgomp.texi (TR13 status): Update semi-duplicated, semi-obsoleted item; remove left-over half-sentence.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index b561cb5f3f4..c6464ece32e 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -511,7 +511,7 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0. @tab N @tab @item @code{ref} modifier to the @code{map} clause @tab N @tab @item New @code{storage} map-type modifier; context-dependent @code{alloc} and - @code{release} are aliases. Update to map decay @tab N @tab + @code{release} are aliases @tab N @tab @item Update of the map-type decay for mapping and @code{declare_mapper} @tab N @tab @item Change of the @emph{map-type} property from @emph{ultimate} to @@ -633,8 +633,6 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0. 
@item Multi-word directive names are now permitted with underscore @tab N @tab @item In Fortran (fixed + free), space between directive names is mandatory @tab N @tab -@item @code{map(release: ...)} on @code{target} and @code{target_data} (map-type - decay changes) @tab N @tab post-TR13 item @end multitable
[committed] libgomp.texi: fix formatting; add post-TR13 OpenMP impl. status items
This commit r15-3917-g6b7eaec20b046e updates the .texi file with one formatting fix (@emph → @code) and updates some items for post-TR13 changes.

(The latter is slightly questionable as the title says TR13, which is the third and last draft of OpenMP 6.0, scheduled to be released in time for Supercomputing 2024 in November - and the listed changes are in the current internal draft only. But on the other hand, post-TR13 work is supposed to be mostly QC tasks and 6.0 is due in around 6 weeks. Furthermore, when looking at the spec changes for this update, I did find an important generator bug, causing text omissions in the spec, which is something I would otherwise probably have encountered only after the spec release.)

Tobias

commit 6b7eaec20b046eebc771022e460c2206580aef04
Author: Tobias Burnus
Date: Fri Sep 27 10:48:09 2024 +0200

libgomp.texi: fix formatting; add post-TR13 OpenMP impl. status items

libgomp/
* libgomp.texi (OpenMP Technical Report 13): Change @emph to @code; add two post-TR13 OpenMP 6.0 items.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 22eff1d7b55..b561cb5f3f4 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -476,6 +476,7 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0. specifiers @tab Y @tab @item Support for pure directives in Fortran's @code{do concurrent} @tab N @tab @item All inarguable clauses take now an optional Boolean argument @tab N @tab +@item The @code{adjust_args} clause was extended to specify the argument by position @item For Fortran, @emph{locator list} can be also function reference with data pointer result @tab N @tab @item Concept of @emph{assumed-size arrays} in C and C++ @@ -496,7 +497,7 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0. 
clauses @tab P @tab @code{private} not supported @item For Fortran, rejecting polymorphic types in data-mapping clauses @tab N @tab not diagnosed (and mostly unsupported) -@item New @code{taskgraph} construct including @emph{saved} modifier and +@item New @code{taskgraph} construct including @code{saved} modifier and @code{replayable} clause @tab N @tab @item @code{default} clause on the @code{target} directive @tab N @tab @item Ref-count change for @code{use_device_ptr} and @code{use_device_addr} @@ -509,6 +510,10 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0. @item New @code{init_complete} clause to the @code{scan} directive @tab N @tab @item @code{ref} modifier to the @code{map} clause @tab N @tab +@item New @code{storage} map-type modifier; context-dependent @code{alloc} and + @code{release} are aliases. Update to map decay @tab N @tab +@item Update of the map-type decay for mapping and @code{declare_mapper} + @tab N @tab @item Change of the @emph{map-type} property from @emph{ultimate} to @emph{default} @tab N @tab @item @code{self} modifier to @code{map} and @code{self} as @@ -516,7 +521,6 @@ Technical Report (TR) 13 is the third preview for OpenMP 6.0. @item Mapping of @emph{assumed-size arrays} in C, C++ and Fortran @tab N @tab @item @code{delete} as delete-modifier not as map type @tab N @tab -@item @code{release} map-type modifier in @code{declare_mapper} @tab N @tab @item For Fortran, the @code{automap} modifier to the @code{enter} clause of @code{declare_target} @tab N @tab @item @code{groupprivate} directive @tab N @tab
[committed] libgomp.texi: Fix deprecation note for omp_{get,set}_nested + OMP_NESTED
While the header files correctly have:

  extern void omp_set_nested (int) __GOMP_NOTHROW __GOMP_DEPRECATED_5_0;
  extern int omp_get_nested (void) __GOMP_NOTHROW __GOMP_DEPRECATED_5_0;

and for Fortran

  #if _OPENMP >= 201811
  !GCC$ ATTRIBUTES DEPRECATED :: omp_get_nested, omp_set_nested
  ...

the documentation wrongly claimed that those were only deprecated in OpenMP 5.2.

Fixed as attached / committed in r15-3900-g9ec258bf65e6ae

Tobias

commit 9ec258bf65e6ae856491f607a987fe15b5385866
Author: Tobias Burnus
Date: Thu Sep 26 17:25:34 2024 +0200

libgomp.texi: Fix deprecation note for omp_{get,set}_nested + OMP_NESTED

libgomp/ChangeLog:
* libgomp.texi (omp_get_nested,omp_set_nested, OMP_NESTED): Fix note about deprecation - correct is 5.0 not 5.2.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 29f5419cd0f..22eff1d7b55 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -937,7 +937,7 @@ active nested regions to the maximum supported. Disabling nested parallel regions sets the maximum number of active nested regions to one. Note that the @code{omp_set_nested} API routine was deprecated -in the OpenMP specification 5.2 in favor of @code{omp_set_max_active_levels}. +in the OpenMP specification 5.0 in favor of @code{omp_set_max_active_levels}. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @@ -984,7 +984,7 @@ regions with @code{omp_set_max_active_levels} to one to disable, or above one to enable. Note that the @code{omp_get_nested} API routine was deprecated -in the OpenMP specification 5.2 in favor of @code{omp_get_max_active_levels}. +in the OpenMP specification 5.0 in favor of @code{omp_get_max_active_levels}. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @@ -3934,7 +3934,7 @@ setting. If both are undefined, nested parallel regions are enabled if more than one item, otherwise they are disabled by default. 
Note that the @code{OMP_NESTED} environment variable was deprecated in -the OpenMP specification 5.2 in favor of @code{OMP_MAX_ACTIVE_LEVELS}. +the OpenMP specification 5.0 in favor of @code{OMP_MAX_ACTIVE_LEVELS}. @item @emph{See also}: @ref{omp_set_max_active_levels}, @ref{omp_set_nested},
Re: [Patch][RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components
Now committed as r15-3895-ge4a58b6f28383c.

* * *

The next step is to send the Fortran part. While it exists, I want to proofread what I wrote a couple of years back and I want to split off the polymorphism/class part, as the current implementation has some issues and OpenMP 6 decided to disallow polymorphic Fortran variables for now. (Until some corner-case behavior has been defined.)

[The existing polymorphism support works but it effectively only permits access to the declared types (as the vtable pointers will be the ones of the host); it also has some issues + as the vtable gained two functions, the ABI compatibility with old code is gone (+ hence the .mod version number was bumped).]

The entry code for the committed patch, as mentioned before:

On 10.09.24 at 12:19, Tobias Burnus wrote:

The interesting bits are the hook entry points gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_cnt, and gfc_omp_deep_mapping → https://github.com/gcc-mirror/gcc/blob/devel/omp/gcc-14/gcc/fortran/trans-openmp.cc#L3068-L3209

And I think all code is in this file, once the polymorphism code is removed and replaced by a diagnostic message.

Tobias

PS: otherwise missing on the polymorphism side is 'private(class_var)'; 'firstprivate(class_var)' works [all as data-sharing clauses, not as data-mapping clauses].

PPS: The host-pointer vtable issue could be solved as for C++ in OpenMP 5.2 by using the 'indirect' feature to look up the device version of the table. (To be implemented for C++ and potentially for OpenMP 6.1+ (?) for Fortran.)
Re: [Patch] OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop
Hi,

I have now committed the following as r15-3856-gfcff9c3dad4f35 with two testcase additions (and improved changelog wording).

Tobias Burnus wrote:

OpenMP mandates that when certain clauses are used with 'omp requires', this requires clause must appear in all compilation units. Those clauses influence the offloading behavior (+ potentially codegen); hence, the requires clauses must match when device code is involved. That's the case for device functions (in particular 'declare target') and all OpenMP directives that take a 'device' clause. Previously, OpenMP was rather vague, but in, e.g., TR13 it is fortunately more explicit. Thus, this patch sets the target-used flag for 'declare target' and ("device" clause!) for 'interop' (but only for Fortran, as C/C++ still does not support parsing the 'interop' directive).

(Side note: the "device global requirement" got only added to the 'device_safesync' clause after TR13; but we don't support that clause yet; it does appear in the commit log only.)

Thanks, Tobias

commit fcff9c3dad4f356cbf56feaed7442893203a3003
Author: Tobias Burnus
Date: Wed Sep 25 13:57:02 2024 +0200

OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop

Older versions of the OpenMP specification were not clear about what counted as device usage. Newer (like TR13) are rather clear. Hence, this commit adds GCC's target-used flag also when a 'declare target' or an 'interop' are encountered. (The latter only to Fortran as C/C++ parsing support is still missing.) TR13 also lists 'dispatch' as target-used construct (as it has the device clause) and 'device_safesync' as clause with global requirement property, but both are not yet supported in GCC.

gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_declare_target): Set target-used bit in omp_requires_mask.

gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_declare_target): Set target-used bit in omp_requires_mask.
gcc/fortran/ChangeLog: * parse.cc (decode_omp_directive): Set target-used bit of omp_requires_mask when encountering the declare_target or interop directive. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/interop-1.f90: Add dg-error for missing omp requires requirement and declare_variant usage. * gfortran.dg/gomp/requires-8.f90: Likewise. --- gcc/c/c-parser.cc | 3 +++ gcc/cp/parser.cc | 3 +++ gcc/fortran/parse.cc | 8 ++-- gcc/testsuite/gfortran.dg/gomp/interop-1.f90 | 2 +- gcc/testsuite/gfortran.dg/gomp/requires-8.f90 | 4 ++-- 5 files changed, 15 insertions(+), 5 deletions(-) diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 6a46577f511..a681438cbbe 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -25492,6 +25492,9 @@ c_parser_omp_declare_target (c_parser *parser) int device_type = 0; bool indirect = false; bool only_device_type_or_indirect = true; + if (flag_openmp) +omp_requires_mask + = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED); if (c_parser_next_token_is (parser, CPP_NAME) || (c_parser_next_token_is (parser, CPP_COMMA) && c_parser_peek_2nd_token (parser)->type == CPP_NAME)) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 83ae38a33ab..6d3be94bf44 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -49571,6 +49571,9 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok) int device_type = 0; bool indirect = false; bool only_device_type_or_indirect = true; + if (flag_openmp) +omp_requires_mask + = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED); if (cp_lexer_next_token_is (parser->lexer, CPP_NAME) || (cp_lexer_next_token_is (parser->lexer, CPP_COMMA) && cp_lexer_nth_token_is (parser->lexer, 2, CPP_NAME))) diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc index e749bbdc6b5..9e06dbf0911 100644 --- a/gcc/fortran/parse.cc +++ b/gcc/fortran/parse.cc @@ -1345,8 +1345,12 @@ decode_omp_directive (void) switch (ret) { -/* Set omp_target_seen; exclude ST_OMP_DECLARE_TARGET. 
- FIXME: Get clarification, cf. OpenMP Spec Issue #3240. */ +/* For the constraints on clauses with the global requirement property, + we set omp_target_seen. This included all clauses that take the + DEVICE clause, (BEGIN) DECLARE_TARGET and procedures run the device + (which effectively is implied by the former). */ +case ST_OMP_DECLARE_TARGET: +case ST_OMP_INTEROP: case ST_OMP_TARGET: case ST_OMP_TARGET_DATA: case ST_OMP_TARGET_ENTER_DATA: diff --git a/gcc/tes
[Patch] OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop
OpenMP mandates that when certain clauses are used with 'omp requires', this requires clause must appear in all compilation units. Those clauses influence the offloading behavior (+ potentially codegen); hence, the requires clauses must match when device code is involved. That's the case for device functions (in particular 'declare target') and all OpenMP directives that take a 'device' clause. Previously, OpenMP was rather vague, but in, e.g., TR13 it is fortunately more explicit. Thus, this patch sets the target-used flag for 'declare target' and ("device" clause!) for 'interop' (but only for Fortran, as C/C++ still does not support parsing the 'interop' directive).

Any comments before I commit it?

Tobias

PS: In TR13, page 321, lines 14–16: https://www.openmp.org/wp-content/uploads/openmp-TR13.pdf

OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop

Older versions of the OpenMP specification were not clear about what counted as device usage. Newer (like TR13) are rather clear. Hence, this commit adds "target used" also when 'declare target' or 'interop' are encountered. (The latter only to Fortran as C/C++ parsing support is still missing.) TR13 also lists 'dispatch' as construct and 'device_safesync' affected by device use, but both are not yet supported in GCC.

gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_declare_target): Set target-used bit in omp_requires_mask.

gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_declare_target): Set target-used bit in omp_requires_mask.

gcc/fortran/ChangeLog:
* parse.cc (decode_omp_directive): Set target-used bit of omp_requires_mask when encountering the declare_target or interop directive.
gcc/c/c-parser.cc| 3 +++ gcc/cp/parser.cc | 3 +++ gcc/fortran/parse.cc | 8 ++-- 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 6a46577f511..a681438cbbe 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -25492,6 +25492,9 @@ c_parser_omp_declare_target (c_parser *parser) int device_type = 0; bool indirect = false; bool only_device_type_or_indirect = true; + if (flag_openmp) +omp_requires_mask + = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED); if (c_parser_next_token_is (parser, CPP_NAME) || (c_parser_next_token_is (parser, CPP_COMMA) && c_parser_peek_2nd_token (parser)->type == CPP_NAME)) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 35c266659e4..3b3ab0f1923 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -49524,6 +49524,9 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok) int device_type = 0; bool indirect = false; bool only_device_type_or_indirect = true; + if (flag_openmp) +omp_requires_mask + = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED); if (cp_lexer_next_token_is (parser->lexer, CPP_NAME) || (cp_lexer_next_token_is (parser->lexer, CPP_COMMA) && cp_lexer_nth_token_is (parser->lexer, 2, CPP_NAME))) diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc index e749bbdc6b5..9e06dbf0911 100644 --- a/gcc/fortran/parse.cc +++ b/gcc/fortran/parse.cc @@ -1345,8 +1345,12 @@ decode_omp_directive (void) switch (ret) { -/* Set omp_target_seen; exclude ST_OMP_DECLARE_TARGET. - FIXME: Get clarification, cf. OpenMP Spec Issue #3240. */ +/* For the constraints on clauses with the global requirement property, + we set omp_target_seen. This included all clauses that take the + DEVICE clause, (BEGIN) DECLARE_TARGET and procedures run the device + (which effectively is implied by the former). */ +case ST_OMP_DECLARE_TARGET: +case ST_OMP_INTEROP: case ST_OMP_TARGET: case ST_OMP_TARGET_DATA: case ST_OMP_TARGET_ENTER_DATA:
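The effect of setting the target-used bit can be pictured with a minimal, self-contained C sketch. It is not GCC's code: the enum, its bit values, and the `sketch_` helpers are hypothetical, and the rule it encodes (a 'requires' clause with the global requirement property is diagnosed once device usage was seen in the compilation unit) is an assumption about the intent described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit layout; GCC's enum omp_requires uses its own values.  */
enum sketch_requires
{
  SKETCH_REQ_UNIFIED_SHARED_MEMORY = 1 << 0,
  SKETCH_REQ_SELF_MAPS = 1 << 1,
  SKETCH_REQ_TARGET_USED = 1 << 2
};

/* Per-compilation-unit state, analogous in spirit to omp_requires_mask.  */
static unsigned sketch_mask;

/* Called for 'target', 'declare target', 'interop', and other constructs
   that count as device usage.  */
static void
sketch_saw_device_construct (void)
{
  sketch_mask |= SKETCH_REQ_TARGET_USED;
}

/* Record a 'requires' clause.  Returns false (a diagnostic would be due)
   if device usage has already been seen.  */
static bool
sketch_add_requires (unsigned clause)
{
  assert (clause != SKETCH_REQ_TARGET_USED);  /* not a user-visible clause */
  if (sketch_mask & SKETCH_REQ_TARGET_USED)
    return false;
  sketch_mask |= clause;
  return true;
}
```

With this bookkeeping, a 'declare target' that precedes 'omp requires' flips the bit first, so the later clause is rejected instead of being silently accepted.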
Re: libgomp: with USM, init 'link' variables with host address
Now committed as r15-3836-g4cb20dc043cf70

Contrary to the originally posted patch, it also acts on the newer/newly added 'omp requires self_maps'.

In the area of (unified-)shared memory/self maps, the next step seems to be to still do mapping for static variables – before moving to refinements like how to handle implicit 'declare target' for static variables, …

For this piece of code, we also want to run it for APUs even when no USM has been requested, avoid adding those to the mapping table (for self maps) and do a more efficient mapping (e.g. memcpy or avoid multiple locks).

Tobias

Tobias Burnus wrote:

short version: I think the patch as posted is fine and no action beyond is needed for this one issue. See below for the long version.

Possible modifications (now or as follow up):
- using memcpy + or let the plugin do it
- not adding link variables to the splay tree with 'USM'.

Thomas Schwinge wrote:

Tested on x86-64-gnu-linux and nvptx offloading (that supports USM).

(I yet have to set up such a USM configuration...)

You already used a USM config, e.g., when running gfx90a (likewise: gfx90c), except that on mainline USM currently only works if you explicitly set 'export HSA_XNACK=1'. For Nvptx, you need a post-Volta GPU with the open-kernels driver, which is the default for newer driver versions.

* * *

Do I understand correctly that even if 'GOMP_REQUIRES_UNIFIED_SHARED_MEMORY', we cannot just skip all the 'mem_map' setup in 'gomp_load_image_to_device' etc., because we're not (yet?) setting 'GOMP_OFFLOAD_CAP_SHARED_MEM'?

We actually do set GOMP_OFFLOAD_CAP_SHARED_MEM with 'requires unified_shared_memory'. But, indeed, we cannot skip the memory mapping parts – due to the way we handle static variables.
* * * + + if (is_link_var + && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)) + gomp_copy_host2dev (devicep, NULL, (void *) target_var->start, + &k->host_start, sizeof (void *), false, NULL); } Calling 'gomp_copy_host2dev' looks a bit funny given we've just determined USM (..., but I'm not asking for plain 'memcpy'). I guess a plain memcpy would do as well. [Assuming that the device's static variable is host accessible, which it probably is and should be.] I add it to my to-do list for USM-related tasks to change this; possibly moving it to the plugin side has some advantages? Possibly not adding it to the splay tree if not needed. (Cf. below for env var discussion.) Regarding the unload: For 'declare target link(A)', we have, e.g., 'static int *A' on the device side. Thus, we could do 'A = NULL' – and rather should do 'A = {clobber}', but that's rather pointless in general and especially when unloading the image. What's the advantage/rationale of doing this here vs. in 'gomp_map_vars_internal' for 'REFCOUNT_LINK'? (May be worth a source code comment?) (A, B, C refers to the following example.) We don't see 'A' (or 'B') in the GOMP_target_ext call and thus not in gomp_map_vars_internal. Besides: We only want to do the initialization once and not every time gomp_map_vars_internal is called. I think the following program may help to understand the issue and the patch better. Note: While A, B, C are 'int …[3]' on the host, on the device we only have 'int B[3]' while for A it's 'int *A' and C only exists on the host. 
* * *

  #pragma omp requires unified_shared_memory

  static int A[3], B[3], C[3];
  #pragma omp declare target link(A) enter(B)

  #pragma omp begin declare target
  void f(int *p)
  {
    A[2] += B[2] + p[2];  // p points to the host's C variable
  }
  #pragma omp end declare target

  void foo(int dev)
  {
    int *ptr = C;
    #pragma omp target firstprivate(ptr) device(dev)
      f (ptr);
  }

* * *

Here, 'ptr' (and thus 'p') points to the host 'C' variable, both before the target region and inside the target region. 'B' refers to the device-local version of the variable. And 'A' on a non-host device is likely to be NULL ('static int *A' + .BSS) before this patch, or to point to the host's 'A' with this patch.

* * *

With A pointing to the host version (and likewise 'p' pointing to the host C), host fallback and device version yield identical results for 'A' and for 'C' (via ptr/p). However, 'B' on the host and on a non-host device have nothing in common. While that might sometimes be fine, in general it is not. Hence, in order to get the same result on host and device for a .BSS valued 'B', we need, e.g.

  #pragma omp target data map(always: B) device(dev)
    foo (dev);

to call 'foo'
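The initialization of a 'link' variable under USM can be mimicked on the host alone with a minimal C sketch. All names here are hypothetical stand-ins: `sketch_device_A` plays the role of the device-side 'static int *A' in .BSS, and `sketch_init_link_var` stands in for the pointer-sized copy that the patch performs via gomp_copy_host2dev; the sketch assumes that 'int *' and 'void *' share one object representation, as on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host variable that was marked 'declare target link'.  */
static int sketch_host_A[3];

/* Stand-in for the device-side link pointer; zero-initialized (.BSS),
   hence NULL until the image-load step below has run.  */
static int *sketch_device_A;

/* Store the host address into the device-side pointer slot by copying
   exactly sizeof (void *) bytes, once, at image-load time.  */
static void
sketch_init_link_var (void *device_slot, const void *host_start)
{
  assert (device_slot != NULL);
  memcpy (device_slot, &host_start, sizeof (void *));
}
```

After `sketch_init_link_var (&sketch_device_A, sketch_host_A)`, a write through `sketch_device_A[2]` is visible in `sketch_host_A[2]`, which mirrors why host fallback and device execution agree for 'A' in the example above.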
Re: [Patch] OpenMP: Add support for 'self_maps' to the 'require' directive
Hi all,

now committed as r15-3822-gb752eed3e3f2f2, see attachment. I fixed one C/C++ test issue (missing 's') and added the Fortran module check.

Tobias

PS: I noticed that 'declare target' does not add the target-used flag. At least TR13 is very clear that it counts, but currently GCC does not regard this (with a FIXME check spec note). This needs to be fixed eventually.

PPS: Old discussion:

Andre Vehreschild: Hi Tobias, to my eye this looks fine. I would appreciate it if you could add some tests for errors on the Fortran side, esp. where modules are involved. But no must. Ok for mainline. Thanks for the patch. - Andre

On Sat, 21 Sep 2024 23:37:33 +0200 Tobias Burnus wrote:

Add support of the 'self_maps' clause in 'omp requires', an OpenMP 6 feature but added here mostly as part of the on-going improvement of the unified-shared memory (USM) handling. Comments, remarks, concerns before I commit it?

* * *

Regarding USM, there is on one hand the hardware:
- some hardware cannot access the host memory at all
- other hardware can access it, but either only through an interconnect or via page migration on page fault
- on the third type of hardware, host and device share the same memory controller

For the latter, a 'map' never makes sense, but for the second case, it depends on the details whether it is better to do mapping or to access the memory directly (i.e. via interconnect or page migration).

On the compile-time side, the user can demand:
- no requirement
- 'requires unified_shared_memory' (= memory has to be accessible but the implementation can still do mapping for explicit maps)
- 'requires self_maps' - mapping is strictly not permitted.
- other hints using compiler flags

And at runtime, what is actually done depends on the actual hardware, the compile-time wishes, and environment variables.

* * *

Currently, the runtime never maps with USM, i.e. both act the same.
At least using an environment variable, I would consider enabling mapping - one could also consider having it always do mappings, except for self_maps.

On the compile side, we need to handle implicit 'declare target' better - as it currently leads to separate memory. Using 'link', we could point to the host memory (at least for 'self_maps'). And before we can enable USM by default for integrated/APU devices, we need to solve some issues with 'link' (→ posted link) and for those, 'map' has to be honored.

Those are 5.x follow up tasks, but having 'self_maps' available completes the what-does-the-user-want part.

Tobias

PS: There is also the 'self' modifier to the map clause, working on a per-variable granularity. However, this, like several other 6.0 items, is completely out of scope of the current USM work.

PPS: See also https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663209.html and the associated patch set, posted at https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655946.html

commit b752eed3e3f2f27570ea89b7c2339468698472a8
Author: Tobias Burnus
Date: Tue Sep 24 10:53:59 2024 +0200

OpenMP: Add support for 'self_maps' to the 'require' directive

'self_maps' implies 'unified_shared_memory', except that the latter also permits that explicit maps copy data to device memory while self_maps does not. In GCC, currently, both are handled identical.

gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_requires): Handle self_maps clause.

gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_requires): Handle self_maps clause.

gcc/fortran/ChangeLog:
* gfortran.h (enum gfc_omp_requires_kind): Add OMP_REQ_SELF_MAPS. (gfc_namespace): Enlarge omp_requires bitfield.
* module.cc (enum ab_attribute, attr_bits): Add AB_OMP_REQ_SELF_MAPS. (mio_symbol_attribute): Handle it.
* openmp.cc (gfc_check_omp_requires, gfc_match_omp_requires): Handle self_maps clause.
* parse.cc (gfc_parse_file): Handle self_maps clause.
gcc/ChangeLog: * lto-cgraph.cc (output_offload_tables, omp_requires_to_name): Handle self_maps clause. * omp-general.cc (struct omp_ts_info, omp_context_selector_matches): Likewise for the associated trait. * omp-general.h (enum omp_requires): Add OMP_REQUIRES_SELF_MAPS. * omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_IMPLEMENTATION_SELF_MAPS. include/ChangeLog: * gomp-constants.h (GOMP_REQUIRES_SELF_MAPS): #define. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Accept self_maps clause. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): L
Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)
Hi all, I have now downloaded the file at https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663534.html (by copying it from the browser, not the source code, to avoid '&gt;'). In this file, I had to fix spurious line breaks like: @@ -5171,7 +5171,7 @@ index_interchange (gfc_code **c, int *walk_subtrees ATTRIBUTE_UNUSED, where the *... belongs to the previous line. The result of this conversion is the attached file. * * * Harald Anlauf wrote: Generally speaking, runtime tests should verify that they work as expected. There are currently only compile-time tests. [One might argue that some should be run-time tests, albeit the really interesting part only happens with local/local_init (currently not supported) – and with true concurrency in particular with 'reduce'.] [For the interesting cases of 'local'/'local_init', there is currently a 'sorry', while 'reduce' only becomes truly interesting if one goes parallel …] Tobias gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_code_node): Updated to use c->ext.concur.forall_iterator instead of c->ext.forall_iterator. Added support for dumping DO CONCURRENT locality specifiers. * frontend-passes.cc (index_interchange, gfc_code_walker): Updated to use c->ext.concur.forall_iterator instead of c->ext.forall_iterator. * gfortran.h (enum locality_type): Added new enum for locality types in DO CONCURRENT constructs. * match.cc (match_simple_forall, gfc_match_forall): Updated to use new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator. (gfc_match_do): Implemented support for matching DO CONCURRENT locality specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and REDUCE). * parse.cc (parse_do_block): Updated to use new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator. * resolve.cc: Added struct check_default_none_data. (do_concur_locality_specs_f2023): New function to check compliance with F2023's C1133 constraint for DO CONCURRENT. 
(check_default_none_expr): New function to check DEFAULT(NONE) compliance. (resolve_locality_spec): New function to resolve locality specs. (gfc_count_forall_iterators): Updated to use code->ext.concur.forall_iterator. (gfc_resolve_forall): Updated to use code->ext.concur.forall_iterator. * st.cc (gfc_free_statement): Updated to free locality specifications and use p->ext.concur.forall_iterator. * trans-stmt.cc (gfc_trans_forall_1): Updated to use code->ext.concur.forall_iterator. gcc/testsuite/ChangeLog: * gfortran.dg/do_concurrent_10.f90: New test for parsing DO CONCURRENT with 'concurrent' as a variable name. * gfortran.dg/do_concurrent_8_f2018.f90: New test for F2018 DO CONCURRENT with nested loops and REDUCE clauses. * gfortran.dg/do_concurrent_8_f2023.f90: New test for F2023 DO CONCURRENT with nested loops and REDUCE clauses. * gfortran.dg/do_concurrent_9.f90: New test for DO CONCURRENT with DEFAULT(NONE) and locality specs. * gfortran.dg/do_concurrent_all_clauses.f90: New test covering all DO CONCURRENT clauses and their interactions. * gfortran.dg/do_concurrent_basic.f90: New basic test for DO CONCURRENT functionality. * gfortran.dg/do_concurrent_constraints.f90: New test for constraints on DO CONCURRENT locality specs. * gfortran.dg/do_concurrent_local_init.f90: New test for LOCAL_INIT clause in DO CONCURRENT. * gfortran.dg/do_concurrent_locality_specs.f90: New test for DO CONCURRENT with locality specs. * gfortran.dg/do_concurrent_multiple_reduce.f90: New test for multiple REDUCE clauses in DO CONCURRENT. * gfortran.dg/do_concurrent_nested.f90: New test for nested DO CONCURRENT loops. * gfortran.dg/do_concurrent_parser.f90: New test for DO CONCURRENT parser error handling. * gfortran.dg/do_concurrent_reduce_max.f90: New test for REDUCE with MAX operation in DO CONCURRENT. * gfortran.dg/do_concurrent_reduce_sum.f90: New test for REDUCE with sum operation in DO CONCURRENT. * gfortran.dg/do_concurrent_shared.f90: New test for SHARED clause in DO CONCURRENT. 
Signed-off-by: Anuj --- gcc/fortran/dump-parse-tree.cc| 113 +- gcc/fortran/frontend-passes.cc| 8 +- gcc/fortran/gfortran.h| 20 +- gcc/fortran/match.cc | 286 +- gcc/fortran/parse.cc | 2 +- gcc/fortran/resolve.cc| 354 +- gcc/fortran/st.cc | 5 +- gcc/fortran/trans-stmt.cc | 6 +- .../gfortran.dg/do_concurrent_10.f90 | 11 + .../gfortran.dg/do_concurrent_8_f2018.f90 | 19 + .../gfortran.dg/do_concurrent_8_f2023.f90 | 23 ++ gcc/testsuite/gfortran.dg/do_concurrent_9.f90 | 15 + .../gfortran.dg/do_concurrent_all_clauses.f90 | 26 ++ .../gfortran.dg/do_concurrent_basic.f90 | 11 + .../gfortran.dg/do_concurrent_constraints.f90 | 126 +++ .../gfortran.dg/do_concurrent_local_init.f90 | 11 + .../do_concurrent_locality_spec
Re: OpenMP: Fix omp_get_device_from_uid, minor cleanup
Now committed as r15-3799-gcdb9aa0f623ec7 / https://gcc.gnu.org/r15-3799-gcdb9aa0f623ec7 Tobias On 21.09.24 at 01:33, Tobias Burnus wrote: Hi Thomas, hello all, the attached follow-up patch does the following: * It fixes an issue (thinko) related to Fortran and \0-terminated strings, which fails at least for substrings. * It includes some minor fixes, e.g. ensuring the device is initialized in omp_get_uid_from_device, removing the superfluous 'omp_', or adding some inits to oacc-host.c. * Now the plugins return NULL instead of failing when the UID cannot be obtained; in that case, the fallback UID "OMP_DEV_%d" is used. Comments or remarks before I commit it? * * * Regarding the topic of caching in the plugin instead of in libgomp: If we want to change it, we either have to remove the fallback and require the existence and success of GOMP_OFFLOAD_get_uid. Otherwise, with host fallback support, we have to cache it at both locations, which is somehow not really sensible, either. Thoughts on this topic? * * * Longer reply to Thomas' comments: Thomas Schwinge wrote: + "omp_get_uid_from_device", ..., but here without 'omp_' prefix: 'get_uid_from_device' (and properly sorted). Oops! Should of course be without (as the 'omp_' prefix is checked before). Do we apparently not have test suite coverage for these things? We do *not* test all API routines. The check is, e.g., used in gfc_error ("%s cannot contain OpenMP API call in intervening code " or "OpenMP runtime API call %qD in a region with " "% clause", fndecl); And we have a few tests for each of them, but not a full set of all API routines. * * * + const char *uid; Caching this here, instead of acquiring via 'GOMP_OFFLOAD_get_uid' for each call, is a minor performance optimization? (Similar to other items cached here, I guess.) Yes, but it goes a bit beyond: As the pointer is returned to the user, it has to be allocated at some point - and cached to avoid allocating more memory when called repeatedly. 
As the fallback and host handling is also done in target.c, the caching is done here. (Besides the API routines, two env vars and one context selector for 'target_device' support the UID.) * * * Please also update 'libgomp/oacc-host.c:host_dispatch'. Done. + ! Note: In gfortran, strings are \0 termined + integer(c_int) function omp_get_device_from_uid(uid) bind(C) For my understanding: in general, Fortran strings are *not* NUL-terminated, right? So this is a specific property of 'gfortran' and/or this GCC/OpenMP interface, The Fortran standard leaves this to the implementation, but by construction, there is a length (however it is handled internally, e.g. via the declaration) and the actual data. - To aid debugging, gfortran NUL-terminates them. However, when thinking a bit more about it, a substring of a NUL-terminated string will not magically have a \0 at the boundary of the substring. - Thus, the simplified approach failed and a Fortran-specific function had to be added (→ fortran.c). * * * + interface omp_get_uid_from_device + ! Deviation from OpenMP 6.0: VALUE added. (..., which I suppose you've reported to OpenMP...) No - it is not really a bug in the standard. The OpenMP specification tries to provide a consistent API - but it is difficult to create an API without touching the ABI. For the caller side, the usage is the same independent of whether there is an 'intent(in)' or VALUE attribute, a Bind(C) with or without binding name. Or also a generic interface with multiple specific ones - which we do to handle -fdefault-integer-8. Obviously, the compiler needs to know those details, but users do not, unless they code the interface themselves instead of using omp.h / omp_lib.h / the omp_lib module. Thus, that's one of the few deviations from the OpenMP specification which does affect the ABI but not the API. 
* * * +GOMP_OFFLOAD_get_uid (int ord) +{ I guess I'd have just put this code into 'init_hsa_context', filling a new statically-sized 'uuid' field in 'hsa_context_info' (like 'driver_version_s'; and assuming that 'hsa_context_info' is the right abstraction for this), and then just return that 'uuid' from 'GOMP_OFFLOAD_get_uid'. That would be one option. Still, we have to decide whether we want to have strictly everything handled in the device code - including fallback handling (which could be a UID replacement or a fatal error). If we do part of the handling elsewhere, e.g. by permitting that the plugin can fail or does not provide the functions, we can handle it in target.c (as currently done) - but then we need to cache it there as well (or at least the fallbacks). * * * That way, you'd avoid the unclear semantics of who gets t
Re: [Patch] gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h
Now committed as r15-3797-ga030fcad4f9f49 / https://gcc.gnu.org/r15-3797-ga030fcad4f9f49 as obvious. Tobias On 21.09.24 at 00:52, Tobias Burnus wrote: See attached patch for adding the include lines: + if (gcn_stack_size) + { + fprintf (cfile, "#include <stdlib.h>\n"); + fprintf (cfile, "#include <stdbool.h>\n\n"); but contrary to previously there is no 'stdint.h' and they are also not unconditionally included. (The 'stdbool.h' is only used for a single 'true', but on the other hand it is only #included under this condition and 'stdbool.h' is a very simple file.) I intend to apply this patch as obvious, unless there are further comments. * * * Thomas Schwinge wrote: I've not verified, but I very much suspect that this change: […] gcn/mkoffload.cc: Use #embed for including the generated ELF file ... is responsible for: […] /tmp/ccHVeRbm.c:80:21: error: implicit declaration of function 'getenv' [-Wimplicit-function-declaration] […] Did you not see that happen in your testing? I vaguely remember some fails in this area, but after digging and re-testing, it did not show up, for whatever reason. As it only triggers with -mstack-size, it somehow must have fallen through the cracks. :-/
Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)
Hi Andre, Andre Vehreschild wrote: Could you also please specify the commit SHA your patch is supposed to apply to? At current mainline's HEAD it has several rejects which makes reviewing harder. I just tried and here it applies cleanly on mainline, except that I get a bunch of warnings in the style of: Hunk #1 succeeded at 2904 (offset 74 lines). but those hunks still seem to end up at the proper place. And please attach the patch as plain text. It is html-encoded with several html-codes, for example a '>' is encoded as '&gt;'. This makes it nearly impossible to apply. I don't see this in my email program – and also when looking at https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663534.html – I don't see any '&gt;' – also not when looking at the HTML attachment. please check the code style of your patch using: contrib/check_GNU_style.py It reports several errors with line length and formatting. Hmm, I only see errors related to tree dump, which seem to be okay: === ERROR type #1: there should be exactly one space between function name and parenthesis (7 error(s)) === gcc/fortran/dump-parse-tree.cc:2915:17: fputs (" LOCAL(", dumpfile); And the following is in the parser – and the spaces are mandatory here: === ERROR type #2: there should be no space before closing parenthesis (1 error(s)) === gcc/fortran/match.cc:2758:41: else if (gfc_match ("default ( none )") == MATCH_YES) I wonder what's the difference between our email readers. – Can you try the version from the mailing list archive? Cheers, Tobias
Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)
Hi Paul, On 23.09.24 at 10:26, Paul Richard Thomas wrote: In addition to Andre's remarks, could you please tell us, when you resubmit, if this is a complete F2023 implementation of do concurrent. If not, what is missing? Regarding missing parts: still to do is actually privatizing (with or without initialization) the variables that are listed with 'local' and 'local_init'. Hence, code doing that currently fails, after doing all required diagnostics, with a 'sorry, not yet implemented' error. [My feeling is that doing it in trans*.cc might make most sense, but it could also be done at the Fortran AST level (inserting a BLOCK + adding the variable there).] Otherwise, all parsing + diagnostics should work; 'default(none)' is diagnostics only and 'shared' doesn't do anything, except affecting the 'default(none)' diagnostic. 'reduce' will have a code-gen effect, but only when going to real concurrency/parallel execution. * * * If you talk about unimplemented 'do concurrent' features in general, gfortran does not handle the forall/do-concurrent header with typespec (i.e. 'do concurrent (integer :: i = 1, 4)'), cf. https://gcc.gnu.org/PR96255 [F2018 feature]. * * * In terms of true parallelization: * I was (for a while) thinking of having a -fdo-concurrent= compile-time flag to handle this. * OpenMP 6.0 (added, I think, in Technical Report (TR) 13, which was released Aug 1, 2024) now supports '!$omp loop' on 'do concurrent'. Either variant would then use the new locality spec (F2018/F2023 and new in gfortran) and hook into the existing OpenMP/OpenACC handling. – '!$omp loop' and -fdo-concurrent=omp-parallel are in any case easier than 'omp-target-parallel' as the latter will run into issues related to data mapping or (potentially) atomic updates now having to be in sync with host atomic access. BTW Thanks for doing this. 
It was on my long term TODO list and is now struck off :-) Yes – and I have heard from others that do-concurrent actually being concurrent – or at least having the new locality specs even if not run concurrently – is a much missed feature. That might be from a small bubble, but still those users want to have it. And also Damian mentioned that he has a project that will use it. Also thanks from my side! Tobias
Re: [PATCH] Fortran: Added support for locality specs in DO CONCURRENT (Fortran 2018/23)
Hi all, as background: Anuj did this as part of his Google Summer of Code project (thanks!). As I looked at various drafts, I would be happy if someone else could have a look as well, as I probably start skipping over things and, hence, miss potential issues … A bit hidden in the patch is a bug fix to allow 'concurrent' as loop variable name of a normal 'do' loop … Thanks, Tobias Anuj Mohite wrote: gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_code_node): Updated to use c->ext.concur.forall_iterator instead of c->ext.forall_iterator. Added support for dumping DO CONCURRENT locality specifiers. * frontend-passes.cc (index_interchange, gfc_code_walker): Updated to use c->ext.concur.forall_iterator instead of c->ext.forall_iterator. * gfortran.h (enum locality_type): Added new enum for locality types in DO CONCURRENT constructs. * match.cc (match_simple_forall, gfc_match_forall): Updated to use new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator. (gfc_match_do): Implemented support for matching DO CONCURRENT locality specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and REDUCE). * parse.cc (parse_do_block): Updated to use new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator. * resolve.cc: Added struct check_default_none_data. (do_concur_locality_specs_f2023): New function to check compliance with F2023's C1133 constraint for DO CONCURRENT. (check_default_none_expr): New function to check DEFAULT(NONE) compliance. (resolve_locality_spec): New function to resolve locality specs. (gfc_count_forall_iterators): Updated to use code->ext.concur.forall_iterator. (gfc_resolve_forall): Updated to use code->ext.concur.forall_iterator. * st.cc (gfc_free_statement): Updated to free locality specifications and use p->ext.concur.forall_iterator. * trans-stmt.cc (gfc_trans_forall_1): Updated to use code->ext.concur.forall_iterator. 
gcc/testsuite/ChangeLog: * gfortran.dg/do_concurrent_10.f90: New test for parsing DO CONCURRENT with 'concurrent' as a variable name. * gfortran.dg/do_concurrent_8_f2018.f90: New test for F2018 DO CONCURRENT with nested loops and REDUCE clauses. * gfortran.dg/do_concurrent_8_f2023.f90: New test for F2023 DO CONCURRENT with nested loops and REDUCE clauses. * gfortran.dg/do_concurrent_9.f90: New test for DO CONCURRENT with DEFAULT(NONE) and locality specs. * gfortran.dg/do_concurrent_all_clauses.f90: New test covering all DO CONCURRENT clauses and their interactions. * gfortran.dg/do_concurrent_basic.f90: New basic test for DO CONCURRENT functionality. * gfortran.dg/do_concurrent_constraints.f90: New test for constraints on DO CONCURRENT locality specs. * gfortran.dg/do_concurrent_local_init.f90: New test for LOCAL_INIT clause in DO CONCURRENT. * gfortran.dg/do_concurrent_locality_specs.f90: New test for DO CONCURRENT with locality specs. * gfortran.dg/do_concurrent_multiple_reduce.f90: New test for multiple REDUCE clauses in DO CONCURRENT. * gfortran.dg/do_concurrent_nested.f90: New test for nested DO CONCURRENT loops. * gfortran.dg/do_concurrent_parser.f90: New test for DO CONCURRENT parser error handling. * gfortran.dg/do_concurrent_reduce_max.f90: New test for REDUCE with MAX operation in DO CONCURRENT. * gfortran.dg/do_concurrent_reduce_sum.f90: New test for REDUCE with sum operation in DO CONCURRENT. * gfortran.dg/do_concurrent_shared.f90: New test for SHARED clause in DO CONCURRENT. 
Signed-off-by: Anuj --- gcc/fortran/dump-parse-tree.cc| 113 +- gcc/fortran/frontend-passes.cc| 8 +- gcc/fortran/gfortran.h| 20 +- gcc/fortran/match.cc | 286 +- gcc/fortran/parse.cc | 2 +- gcc/fortran/resolve.cc| 354 +- gcc/fortran/st.cc | 5 +- gcc/fortran/trans-stmt.cc | 6 +- .../gfortran.dg/do_concurrent_10.f90 | 11 + .../gfortran.dg/do_concurrent_8_f2018.f90 | 19 + .../gfortran.dg/do_concurrent_8_f2023.f90 | 23 ++ gcc/testsuite/gfortran.dg/do_concurrent_9.f90 | 15 + .../gfortran.dg/do_concurrent_all_clauses.f90 | 26 ++ .../gfortran.dg/do_concurrent_basic.f90 | 11 + .../gfortran.dg/do_concurrent_constraints.f90 | 126 +++ .../gfortran.dg/do_concurrent_local_init.f90 | 11 + .../do_concurrent_locality_specs.f90 | 14 + .../do_concurrent_multiple_reduce.f90 | 17 + .../gfortran.dg/do_concurrent_nested.f90
Re: [PATCH v3 03/12] libgomp: runtime support for target_device selector
On Sunday, September 22, 2024, Sandra Loosemore wrote:
> […] I think the predicate of the more general case for
>
> target_device={device_num (NUM), kind(KIND), arch(ARCH), isa(ISA)}
>
> can be expressed (using GCC statement expression syntax) as
>
> ({
>    int matches;
>    #pragma omp target device (NUM)
>      matches = magic_cookie (KIND, ARCH, ISA);
>    matches;
> })
>
> where magic_cookie is either a built-in or new gimple code. I think the gimplifier is probably the right place to do the above transformation, and the magic_cookie expansion would happen during (or at least at the same point in compilation as) late metadirective resolution (IOW, in the offload compiler). That part can call targetm.omp.device_kind_arch_isa to resolve the whole works into a constant true/false, similar to how the "device" selector is handled in the offload compiler, rather than into any runtime routine.

I think that can work. I was (and am, to a much lesser extent) worrying a bit about the overhead of the target call, but as the spec only has one device (default or the one specified), that should be fine. (One can think of merging multiple target regions for multiple candidates or moving them out of a hot loop.) And for uid(xxx) it still needs a runtime call, but then calling __builtin_strcmp(xxx, omp_get_uid_from_device(...)) should be fine. There is the larger question of whether we should report the compile-time supported isa or the real one, but I think either works. There is also the question of whether to regard the isa as a feature set, which newer systems also support (done for x86(_64)), or as strictly that specific version (as done for nvptx), but that's independent of the way we implement it.

> Does this seem like a plausible way to continue?

At a glance, yes. Tobias
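The two runtime questions raised in this mail (comparing a uid() trait against a device's UID string, and treating isa() either as a feature set or as an exact version) can be sketched in plain C. This is only an illustration under stated assumptions: the function names are hypothetical and not part of GCC or libgomp, the device UID is assumed to come from something like omp_get_uid_from_device, and modelling an ISA as a single numeric level is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not a GCC or libgomp interface: decide whether a
   uid("...") trait matches the UID string reported for a device.  */
bool
uid_trait_matches (const char *wanted_uid, const char *device_uid)
{
  assert (wanted_uid != NULL);
  /* A device whose UID is unknown can never match.  */
  if (device_uid == NULL)
    return false;
  return strcmp (wanted_uid, device_uid) == 0;
}

/* Hypothetical helper illustrating the two ISA policies mentioned above.
   Assumption: an ISA is modelled as one increasing numeric level.
   feature_set == true:  a newer device also matches an older ISA request.
   feature_set == false: only exactly the requested version matches.  */
bool
isa_trait_matches (unsigned wanted_level, unsigned device_level,
		   bool feature_set)
{
  return feature_set ? device_level >= wanted_level
		     : device_level == wanted_level;
}
```

With the feature-set policy, a request for level 70 matches a level-80 device; with the strict policy it does not.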
[Patch] OpenMP: Add support for 'self_maps' to the 'require' directive
Add support of the 'self_maps' clause in 'omp requires', an OpenMP 6 feature but added here mostly as part of the on-going improvement of the unified-shared memory (USM) handling. Comments, remarks, concerns before I commit it? * * * Regarding USM, there is on one hand the hardware: - some hardware cannot access the host memory at all - other hardware can access it, but either only through an interconnect or via page migration on page fault - on the third type of hardware, host and device share the same memory controller For the latter, a 'map' never makes sense, but for the second case, it depends on the details whether it is better to do mapping or to access the memory directly (i.e. via interconnect or page migration). On the compile-time side, the user can demand: - no requirement - 'requires unified_shared_memory' (= memory has to be accessible but the implementation can still do mapping for explicit maps) - 'requires self_maps' - mapping is strictly not permitted. - other hints using compiler flags And for the runtime, what is done depends on the actual hardware, the compile-time wishes, and environment variables. * * * Currently, the runtime never maps with USM, i.e. both act the same. At least using an environment variable, I would consider enabling mapping - one could also consider having it always do mappings, except for self_maps. On the compile side, we need to handle implicit 'declare target' better - as it currently leads to separate memory. Using 'link', we could point to the host memory (at least for 'self_maps'). And before we can enable USM by default for integrated/APU devices, we need to solve some issues with 'link' (→ posted link) and for those, 'map' has to be honored. Those are 5.x follow-up tasks, but having 'self_maps' available completes the what-does-the-user-want part. Tobias PS: There is also the 'self' modifier to the map clause, working on a per-variable granularity. 
However, this like several other 6.0 items is completely out of scope of the current USM work. PPS: See also https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663209.html and the patch associated set, posted at https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655946.html OpenMP: Add support for 'self_maps' to the 'require' directive 'self_maps' implies 'unified_shared_memory', except that the latter also permits that explicit maps copy data to device memory while self_maps does not. In GCC, currently, both are handled identical. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_requires): Handle self_maps clause. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_requires): Handle self_maps clause. gcc/fortran/ChangeLog: * gfortran.h (enum gfc_omp_requires_kind): Add OMP_REQ_SELF_MAPS. (gfc_namespace): Enlarge omp_requires bitfield. * module.cc (enum ab_attribute, attr_bits): Add AB_OMP_REQ_SELF_MAPS. (mio_symbol_attribute): Handle it. * openmp.cc (gfc_check_omp_requires, gfc_match_omp_requires): Handle self_maps clause. * parse.cc (gfc_parse_file): Handle self_maps clause. gcc/ChangeLog: * lto-cgraph.cc (output_offload_tables, omp_requires_to_name): Handle self_maps clause. * omp-general.cc (struct omp_ts_info, omp_context_selector_matches): Likewise for the associated trait. * omp-general.h (enum omp_requires): Add OMP_REQUIRES_SELF_MAPS. * omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_IMPLEMENTATION_SELF_MAPS. include/ChangeLog: * gomp-constants.h (GOMP_REQUIRES_SELF_MAPS): #define. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Accept self_maps clause. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise. * libgomp.texi (TR13 Impl. Status): Set to 'Y'. * target.c (gomp_requires_to_name, GOMP_offload_register_ver, gomp_target_init): Handle self_maps clause. * testsuite/libgomp.fortran/self_maps.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/declare-variant-1.c: Add self_maps test. 
* c-c++-common/gomp/requires-4.c: Likewise. * gfortran.dg/gomp/declare-variant-3.f90: Likewise. * c-c++-common/gomp/requires-2.c: Update dg-error msg. * gfortran.dg/gomp/requires-2.f90: Likewie. gcc/c/c-parser.cc | 3 ++ gcc/cp/parser.cc | 3 ++ gcc/fortran/gfortran.h | 10 +++-- gcc/fortran/module.cc | 11 - gcc/fortran/openmp.cc | 30 - gcc/fortran/parse.cc | 3 ++ gcc/lto-cgraph.cc | 4 ++ gcc/omp-general.cc | 21 ++ gcc/omp-general.h | 1 + gcc/omp-selectors.h| 1 + .../c-c++-common/gomp/declare-variant-1.c | 6 +++ gcc/testsuite/c-c++-common/gomp/requires-2.c | 2 +- gcc/testsuite/c-c++-common/gomp/requires-4.c | 1 + .../gfort
OpenMP: Fix omp_get_device_from_uid, minor cleanup (was: Re: [Patch][v2] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines)
Hi Thomas, hello all, the attached follow-up patch does the following: * It fixes an issue (thinko) related to Fortran and \0-terminated strings, which fails at least for substrings. * It includes some minor fixes, e.g. ensuring the device is initialized in omp_get_uid_from_device, removing the superfluous 'omp_', or adding some inits to oacc-host.c. * Now the plugins return NULL instead of failing when the UID cannot be obtained; in that case, the fallback UID "OMP_DEV_%d" is used. Comments or remarks before I commit it? * * * Regarding the topic of caching in the plugin instead of in libgomp: If we want to change it, we either have to remove the fallback and require the existence and success of GOMP_OFFLOAD_get_uid. Otherwise, with host fallback support, we have to cache it at both locations, which is somehow not really sensible, either. Thoughts on this topic? * * * Longer reply to Thomas' comments: Thomas Schwinge wrote: + "omp_get_uid_from_device", ..., but here without 'omp_' prefix: 'get_uid_from_device' (and properly sorted). Oops! Should of course be without (as the 'omp_' prefix is checked before). Do we apparently not have test suite coverage for these things? We do *not* test all API routines. The check is, e.g., used in gfc_error ("%s cannot contain OpenMP API call in intervening code " or "OpenMP runtime API call %qD in a region with " "% clause", fndecl); And we have a few tests for each of them, but not a full set of all API routines. * * * + const char *uid; Caching this here, instead of acquiring via 'GOMP_OFFLOAD_get_uid' for each call, is a minor performance optimization? (Similar to other items cached here, I guess.) Yes, but it goes a bit beyond: As the pointer is returned to the user, it has to be allocated at some point - and cached to avoid allocating more memory when called repeatedly. As the fallback and host handling is also done in target.c, the caching is done here. 
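The caching and fallback behaviour described above can be illustrated with a small C sketch. It is not the libgomp implementation: 'struct device_entry', 'device_uid' and 'example_plugin_get_uid' are hypothetical names, and only the "OMP_DEV_%d" fallback format and the rule that a plugin returns NULL when it cannot obtain a UID are taken from the mail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-device record; only the cached UID is modelled.  */
struct device_entry
{
  int ord;	/* Device number.  */
  char *uid;	/* Cached UID; allocated on first request, then kept.  */
};

/* Hypothetical plugin hook type: returns NULL if no UID is available.  */
typedef const char *(*get_uid_fn) (int ord);

/* Hypothetical example plugin: knows a UID for device 0 only.  */
const char *
example_plugin_get_uid (int ord)
{
  return ord == 0 ? "GPU-abc" : NULL;
}

/* Return the UID for DEV.  The string is allocated once and cached, so
   repeated calls hand out the same pointer instead of allocating again.  */
const char *
device_uid (struct device_entry *dev, get_uid_fn plugin_get_uid)
{
  assert (dev != NULL);
  if (dev->uid != NULL)
    return dev->uid;

  const char *from_plugin
    = plugin_get_uid != NULL ? plugin_get_uid (dev->ord) : NULL;
  if (from_plugin != NULL)
    {
      size_t size = strlen (from_plugin) + 1;
      dev->uid = malloc (size);
      if (dev->uid != NULL)
	memcpy (dev->uid, from_plugin, size);
    }
  else
    {
      /* Fallback UID when the plugin is missing or returned NULL.  */
      int len = snprintf (NULL, 0, "OMP_DEV_%d", dev->ord);
      dev->uid = malloc ((size_t) len + 1);
      if (dev->uid != NULL)
	snprintf (dev->uid, (size_t) len + 1, "OMP_DEV_%d", dev->ord);
    }
  return dev->uid;
}
```

Because the pointer is handed to the user, the sketch never frees the cached string, which mirrors the "allocated at some point and cached" reasoning in the reply.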
(Besides the API routines, two env vars and one context selector for 'target_device' support the UID.) * * * Please also update 'libgomp/oacc-host.c:host_dispatch'. Done. + ! Note: In gfortran, strings are \0 termined + integer(c_int) function omp_get_device_from_uid(uid) bind(C) For my understanding: in general, Fortran strings are *not* NUL-terminated, right? So this is a specific property of 'gfortran' and/or this GCC/OpenMP interface, The Fortran standard leaves this to the implementation, but by construction, there is a length (however it is handled internally, e.g. via the declaration) and the actual data. - To aid debugging, gfortran NUL-terminates them. However, when thinking a bit more about it, a substring of a NUL-terminated string will not magically have a \0 at the boundary of the substring. - Thus, the simplified approach failed and a Fortran-specific function had to be added (→ fortran.c). * * * +interface omp_get_uid_from_device + ! Deviation from OpenMP 6.0: VALUE added. (..., which I suppose you've reported to OpenMP...) No - it is not really a bug in the standard. The OpenMP specification tries to provide a consistent API - but it is difficult to create an API without touching the ABI. For the caller side, the usage is the same independent of whether there is an 'intent(in)' or VALUE attribute, a Bind(C) with or without binding name. Or also a generic interface with multiple specific ones - which we do to handle -fdefault-integer-8. Obviously, the compiler needs to know those details, but users do not, unless they code the interface themselves instead of using omp.h / omp_lib.h / the omp_lib module. Thus, that's one of the few deviations from the OpenMP specification which does affect the ABI but not the API. 
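The substring problem mentioned above (a Fortran substring is pointer plus length and has no \0 at its end) can be sketched in C as follows. This is a minimal illustration, not the code added to fortran.c: both function names are hypothetical, and 'lookup_device_by_uid' is a stand-in for a lookup that expects a NUL-terminated UID.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lookup that expects a NUL-terminated UID;
   returns a device number, or -1 if the UID is unknown.  */
int
lookup_device_by_uid (const char *uid)
{
  return strcmp (uid, "OMP_DEV_1") == 0 ? 1 : -1;
}

/* Sketch of a Fortran-side wrapper: the UID arrives as pointer plus
   length and may be a substring, so there is no '\0' at uid[len].
   Copy it into a NUL-terminated buffer before doing the lookup.  */
int
lookup_device_by_uid_len (const char *uid, size_t len)
{
  assert (uid != NULL);
  char *copy = malloc (len + 1);
  if (copy == NULL)
    return -1;
  memcpy (copy, uid, len);
  copy[len] = '\0';
  int dev = lookup_device_by_uid (copy);
  free (copy);
  return dev;
}
```

Passing the middle of a longer buffer directly to the NUL-terminated variant reads past the intended end of the substring, which is the failure mode the follow-up patch addresses.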
* * * +GOMP_OFFLOAD_get_uid (int ord) +{ I guess I'd have just put this code into 'init_hsa_context', filling a new statically-sized 'uuid' field in 'hsa_context_info' (like 'driver_version_s'; and assuming that 'hsa_context_info' is the right abstraction for this), and then just return that 'uuid' from 'GOMP_OFFLOAD_get_uid'. That would be one option. Still, we have to decide whether we want to have strictly everything handled in the device code - including fallback handling (which could be a UID replacement or a fatal error). If we do part of the handling elsewhere, e.g. by permitting that the plugin can fail or does not provide the functions, we can handle it in target.c (as currently done) - but then we need to cache it there as well (or at least the fallbacks). * * * That way, you'd avoid the unclear semantics of who gets to 'free' the buffer returned from 'GOMP_OFFLOAD_get_uid' upon 'GOMP_OFFLOAD_fini_device' -- currently the memory is lost? Well, depends what you mean by lost. The 'devices' data structure in target.c is allocated early during device initialization and it is never deallocated. Hence, also the current "uint" member is never deallocated and remains until the end of the program acc
[Patch] gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h (was: [Patch, v3] gcn/mkoffload.cc: Use #embed for including the generated ELF file)
Hi Thomas, See attached patch for adding the include lines: + if (gcn_stack_size) +{ + fprintf (cfile, "#include <stdlib.h>\n"); + fprintf (cfile, "#include <stdbool.h>\n\n"); but contrary to previously there is no 'stdint.h' and they are also not unconditionally included. (The 'stdbool.h' is only used for a single 'true', but on the other hand it is only #included under this condition and 'stdbool.h' is a very simple file.) I intend to apply this patch as obvious, unless there are further comments. * * * Thomas Schwinge wrote: I've not verified, but I very much suspect that this change: […] gcn/mkoffload.cc: Use #embed for including the generated ELF file ... is responsible for: […] /tmp/ccHVeRbm.c:80:21: error: implicit declaration of function 'getenv' [-Wimplicit-function-declaration] […] Did you not see that happen in your testing? I vaguely remember some fails in this area, but after digging and re-testing, it did not show up, for whatever reason. As it only triggers with -mstack-size, it somehow must have fallen through the cracks. :-/ Tobias gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h In commit r15-3629-g508ef585243d4674d06b0737bfe8769fc18f824f, #embed was added and the no-longer-required fprintf '#include' lines were removed, somehow missing that with -mstack-size=, the generated configure_stack_size uses 'setenv' and 'true'. gcc/ChangeLog: * config/gcn/mkoffload.cc (process_asm): (Re)add the fprintf lines for stdlib.h/stdbool.h inclusion if gcn_stack_size is used. 
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 1f6337719e9..1a524ced653 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -613,6 +613,12 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   struct oaccdims *dims = XOBFINISH (&dims_os, struct oaccdims *);
   struct regcount *regcounts = XOBFINISH (&regcounts_os, struct regcount *);
 
+  if (gcn_stack_size)
+    {
+      fprintf (cfile, "#include <stdlib.h>\n");
+      fprintf (cfile, "#include <stdbool.h>\n\n");
+    }
+
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
   fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count);
[wwwdocs][Patch] gcc-15: mention wider offloading arch combination support (e.g. aarch64 + nvptx)
This is supposed to document that GCC now supports offloading, e.g., from an ARM CPU to an Nvidia GPU (i.e. Grace<->Hopper) or, e.g., x86-64 to RISC-V. See https://gcc.gnu.org/PR96265 and https://gcc.gnu.org/PR111937 for the associated PRs. I think it is important enough to get it into the release notes. However, I am not sure about the wording. Thoughts or suggestions? Tobias gcc-15: mention wider offloading arch support (e.g. aarch64 + nvptx) diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index 7c372688..e923ede4 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.html @@ -36,6 +36,14 @@ a work-in-progress. General Improvements + + +For offloading, issues preventing some host-device architecture +combinations have been resolved. In particular, offloading from an +aarch64 host to a nvptx device is now supported. + + + New Languages and Language specific improvements
[wwwdocs][Patch] gcc-15: Update OpenMP section for constr/destr on devices + UID routines
A minor update for a bug fix / impl.-quality feature and a proper new feature. Any comments before I apply it? Tobias gcc-15: Update OpenMP section for constr/destr on devices + UID routines diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index 7c372688..14514131 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.html @@ -55,11 +55,17 @@ a work-in-progress. GPUs, writing to the terminal from OpenMP target regions (but not from OpenACC compute regions) is now also supported in Fortran; in C/C++ and on AMD GPUs this was already supported before with both OpenMP and OpenACC. + Constructors and destructors on the device side for declare target + static aggregates are now handled. OpenMP 5.1: The unroll and tile loop-transformation constructs are now supported. + + OpenMP 6.0: The get_device_from_uid and + omp_get_uid_from_device API routines have been added. +
[wwwdocs][Patch] gcc-15: Fortran - mention -funsigned + PowerPC Darwin IEEE module support
Hi all, I thought it makes sense to have a look at what went into GCC 15 to update the Fortran section. However, while several bugs were fixed (and some features were extended a tiny bit) [hooray!], I did not really see many newsworthy features. Comments on, remarks to, or approval of the attached wwwdocs patch? Tobias PS: Anuj, the GSoC student, has nearly finished his do-concurrent patch, which will add the local/local_init/shared/default(none) of F2018 and the reduce of F2023. Still no fancy parallelization, but it is the first step and useful, as it will permit compiling such code, which then works as serially run code. gcc-15: Fortran - mention -funsigned + PowerPC Darwin IEEE module support diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index 7c372688..3a275d8c 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.html @@ -111,6 +111,12 @@ a work-in-progress. Fortran 2023: The selected_logical_kind intrinsic function and, in the ISO_FORTRAN_ENV module, the named constants logical{8,16,32,64} and real16 were added. + Experimental support for unsigned integers; enabled by the + -funsigned, see https://gcc.gnu.org/onlinedocs/gfortran/Experimental-features-for-Fortran-202Y.html + gfortran documentation for details. This feature has been proposed + (https://j3-fortran.org/doc/year/24/24-116.txt J3/24-116) + for inclusion in the next Fortran standard. @@ -214,6 +220,11 @@ a work-in-progress. +PowerPC Darwin + + Fortran's IEEE modules are now supported on Darwin PowerPC. + +
Re: [Patch][v2] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines
Now applied as r15-3730-gbf4a5efa80ef84 / https://gcc.gnu.org/r15-3730-gbf4a5efa80ef84 (with a few minor trailing whitespace/indentation issues fixed). Post-commit comments are still highly welcome. By tomorrow, you will find the documentation at https://gcc.gnu.org/onlinedocs/libgomp/ (routine + nvptx/gcn offload specific), which makes it easier to read. Thanks, Tobias

Tobias Burnus wrote: Minor update – addressing the issues that Andre raised (thanks!): 'Add.' → 'New functions.' in the ChangeLog for 'fortran.c' and otherwise libgomp.texi changes, only: A bunch of typo fixes (preexisting and in the new text). I also added a made-up example UUID for the GPUs, which should help to reduce confusion. Any additional comments or suggestions? Tobias

Tobias Burnus wrote: in order to know and potentially re-use a specific offload device (reproducibility, affinity wise close to a CPU (socket), …) a mapping between a (universal?) unique identifier and the OpenMP device number is useful. Thus, TR13 added support for it. This is a collateral patch caused by looking at the API routines for other reasons and looking at that part of the spec during the OpenMP F2F. Besides the added API routines, the UID will be used elsewhere: * In context selectors: 'target_device' supports 'uid()'. * In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars. @Sandra: Besides the usual .texi part, for the 'target_device' trait set: if you add a new GOMP routine for kind/arch/isa - can you also add a UID argument so that we don't have to update the API when it is needed in the not so distant future. @Andrew + @Thomas: Any comment? Especially to the nvptx/gcn side (plugin + .texi)? @Jakub or anyone else: any comments, suggestions, remarks? [The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU and seems to work fine.]
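
To illustrate the kind of mapping the two routines provide, here is a minimal sketch, assuming a made-up static device table. The names example_uids, example_uid_from_device and example_device_from_uid are hypothetical, and returning NULL or -1 for "no such device" is an assumption of this sketch; it is not how libgomp implements or reports this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device table; the UID strings are made up for illustration
   and do not come from a real system.  */
static const char *const example_uids[] = {
  "GPU-abcdef01-2345-6789-abcdef0123456789",
  "GPU-0123456789abcdef"
};
enum { EXAMPLE_NUM_DEVICES = sizeof example_uids / sizeof example_uids[0] };

/* Return the UID string of DEVICE_NUM, or NULL if the number is invalid
   (assumption of this sketch).  */
static const char *
example_uid_from_device (int device_num)
{
  if (device_num < 0 || device_num >= EXAMPLE_NUM_DEVICES)
    return NULL;
  return example_uids[device_num];
}

/* Return the device number whose UID equals UID, or -1 if there is none
   (assumption of this sketch).  */
static int
example_device_from_uid (const char *uid)
{
  if (uid == NULL)
    return -1;
  for (int i = 0; i < EXAMPLE_NUM_DEVICES; i++)
    if (strcmp (example_uids[i], uid) == 0)
      return i;
  return -1;
}
```

The point of the round trip is reproducibility: a program can record the UID string of the device it used and, in a later run, look up the device number for that string even if the numbering has changed.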
[Patch][v2] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines
Minor update – addressing the issues that Andre raised (thanks!): 'Add.' → 'New functions.' in the ChangeLog for 'fortran.c' and otherwise libgomp.texi changes, only: A bunch of typo fixes (preexisting and in the new text). I also added a made-up example UUID for the GPUs, which should help to reduce confusion. Any additional comments or suggestions? Tobias

Tobias Burnus wrote: in order to know and potentially re-use a specific offload device (reproducibility, affinity wise close to a CPU (socket), …) a mapping between a (universal?) unique identifier and the OpenMP device number is useful. Thus, TR13 added support for it. This is a collateral patch caused by looking at the API routines for other reasons and looking at that part of the spec during the OpenMP F2F. Besides the added API routines, the UID will be used elsewhere: * In context selectors: 'target_device' supports 'uid()'. * In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars. @Sandra: Besides the usual .texi part, for the 'target_device' trait set: if you add a new GOMP routine for kind/arch/isa - can you also add a UID argument so that we don't have to update the API when it is needed in the not so distant future. @Andrew + @Thomas: Any comment? Especially to the nvptx/gcn side (plugin + .texi)? @Jakub or anyone else: any comments, suggestions, remarks? [The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU and seems to work fine.]

OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines Those TR13/OpenMP 6.0 routines permit a reproducible offloading to a specific device by mapping an OpenMP device number to a unique ID (UID). The GPU device UIDs should be universally unique, the one for the host is not. gcc/ChangeLog: * omp-general.cc (omp_runtime_api_procname): Add get_device_from_uid and omp_get_uid_from_device routines. include/ChangeLog: * cuda/cuda.h (cuDeviceGetUuid): Declare. (cuDeviceGetUuid_v2): Add prototype. 
libgomp/ChangeLog: * config/gcn/target.c (omp_get_uid_from_device, omp_get_device_from_uid): Add stub implementation. * config/nvptx/target.c (omp_get_uid_from_device, omp_get_device_from_uid): Likewise. * fortran.c (omp_get_uid_from_device_, omp_get_uid_from_device_8_): New functions. * libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype. * libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'. * libgomp.map (GOMP_6.0): New, including the new UID routines. * libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'. (Device Information Routines): Document new UID routines. (Offload-Target Specifics): Document UID format. * omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device): New prototype. * omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device): New interface. * omp_lib.h.in: Likewise. * plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New. * target.c (str_omp_initial_device): New static var. (STR_OMP_DEV_PREFIX): Define. (gomp_get_uid_for_device, omp_get_uid_from_device, omp_get_device_from_uid): New. (gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'. (gomp_target_init): Set the device's 'uid' field to NULL. * testsuite/libgomp.c/device_uid.c: New test. * testsuite/libgomp.fortran/device_uid.f90: New test. 
gcc/omp-general.cc | 4 +- include/cuda/cuda.h | 7 ++ libgomp/config/gcn/target.c | 14 libgomp/config/nvptx/target.c| 14 libgomp/fortran.c| 15 libgomp/libgomp-plugin.h | 1 + libgomp/libgomp.h| 2 + libgomp/libgomp.map | 8 +++ libgomp/libgomp.texi | 89 ++-- libgomp/omp.h.in | 3 + libgomp/omp_lib.f90.in | 23 ++ libgomp/omp_lib.h.in | 23 ++ libgomp/plugin/cuda-lib.def | 2 + libgomp/plugin/plugin-gcn.c | 16 + libgomp/plugin/plugin-nvptx.c| 34 + libgomp/target.c | 56 +++ libgomp/testsuite/libgomp.c/device_uid.c | 38 ++ libgomp/testsuite/libgomp.fortran/device_uid.f90 | 42 +++ 18 files changed, 384 insertions(+), 7 deletions(-) diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc index de91ba8a4a7..12788ad0249 100644 --- a/gcc/omp-general.cc +++ b/gcc/omp-general.cc @@ -3260,6 +3260,7 @@ omp_runtime_api_procname (const char *name) "alloc", "calloc",
Re: [Patch] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines
Hi Andre, thanks for reading the patch + commenting.

Andre Vehreschild wrote: in the changelog of libgomp: * fortran.c (omp_get_uid_from_device_, omp_get_uid_from_device_8_): Add. "Add." what? Can you be more specific, i.e. is it just a dummy or prototype?

Neither. It is a full implementation (that is, a wrapper to the target.c function, which is directly called by C/C++). The prototype used by fortran.c is in 'omp.h.in' (i.e. the C/C++ header file, also used by user code) and for Fortran code of users, it is the module generated from 'omp_lib.f90.in' and the (deprecated) include file 'omp_lib.h.in'. The purpose of fortran.c in general – and also for the added code – is to be a wrapper between the Fortran API/ABI and the C ABI. In the current case, there are two reasons for the two functions: (a) The result type is 'character(:), pointer' – but the C function just returns a '\0' terminated const char*. Hence, the wrapper function contains a '*result_len = strlen (*result);' besides the '*result = ' (b) The argument is an 'integer'. As we want to be compatible with -fdefault-integer-8, previously somewhat fashionable, we have an 'int32_t' and an 'int64_t' version of the function – which needs a second wrapper function. As for the other API routine, as a BIND(C) makes it call the C function, no wrapper is needed. * * * [Typo: missing 'a' – noted + will fix.] * * *

+@item The unique identifier (UID), used with OpenMP's API UID routine, consists + of the @samp{GPU-} prefix followed by the 16-bytes UUID as returned by + the CUDA runtime library. This UUID is output in grouped lower-case + hex digits; the grouping of those 32 digits is: 8 digits, hyphen, + 4 digits, hyphen, 4 digits, hyphen, 16 digits. The output matches the + format used by @code{nvidia-smi}. @end itemize

Do I get this right, that for CUDA this is, e.g. GPU-0123456789abdcef ? Then why is the "normal" UUID display format described here? This confuses me. (Just curiosity.)

For AMD, it is the following type of string, which contains an 8-byte/16-hex-digit UUID part: 'GPU-abcef0123456789'. For Nvidia, it is 'GPU-abcdef12-1234-1234-01234567890abcd', consisting of a 16-byte/32-hex-digit UUID. For AMD, we directly get the string, matching what "rocminfo" shows as UUID. For Nvidia, we don't get a string but a 'char bytes[16]' array filled with the values, each of which we print with '%02x' as two hex digits. For the output, additionally, a "GPU-" prefix and a few hyphens are added. That's to mimic what 'nvidia-smi -a' outputs. I admit it is slightly confusing – and when reading the .texi, it is also easy to miss that one part talks about AMD ("GCN") GPUs and the other about Nvidia GPUs. → https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html (In terms of OpenMP, it is only a unique identifier; it does not need to be universally unique [and also isn't for the host]; AMD and Nvidia call it UUID and it looks rather unique for the GPU; rocminfo also outputs a "UUID" for the CPU but that's just "CPU-XX" (twice for a dual socket system, i.e. not even unique), but we don't use this output.)

Er, and when I read further on, I find the nvptx implementation and that contradicts the description. There a "normal" UUID is added to the GPU- id. Now I am confused.

What description contradicts which one? Tobias
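
The Nvidia formatting described above (16 raw bytes, each printed with '%02x', a "GPU-" prefix, and hyphens after 8, 12 and 16 hex digits) can be sketched as follows. This is only an illustration under the grouping stated in the quoted .texi text; format_gpu_uid and UID_BUF_SIZE are hypothetical names and this is not the plugin's actual code.

```c
#include <stdio.h>
#include <stddef.h>

/* "GPU-" prefix (4) + 32 hex digits + 3 hyphens + terminating NUL.  */
#define UID_BUF_SIZE (4 + 32 + 3 + 1)

/* Hypothetical helper: format the 16-byte UUID in BYTES as "GPU-" followed
   by lower-case hex digits grouped 8-4-4-16.  Returns 0 on success and -1
   if the buffer of SIZE bytes is too small.  */
static int
format_gpu_uid (char *buf, size_t size, const unsigned char bytes[16])
{
  if (size < UID_BUF_SIZE)
    return -1;
  size_t pos = (size_t) snprintf (buf, size, "GPU-");
  for (int i = 0; i < 16; i++)
    {
      /* A hyphen goes before byte 4, 6 and 8, i.e. after 8, 12 and 16
	 hex digits.  */
      if (i == 4 || i == 6 || i == 8)
	buf[pos++] = '-';
      pos += (size_t) snprintf (buf + pos, size - pos, "%02x", bytes[i]);
    }
  return 0;
}
```

Using 'unsigned char' for the bytes avoids the sign extension that '%02x' would show for negative 'char' values.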
[Patch] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines
Hi all, in order to know and potentially re-use a specific offload device (reproducibility, affinity wise close to a CPU (socket), …) a mapping between a (universal?) unique identifier and the OpenMP device number is useful. Thus, TR13 added support for it. This is a collateral patch caused by looking at the API routines for other reasons and looking at that part of the spec during the OpenMP F2F. Besides the added API routines, the UID will be used elsewhere: * In context selectors: 'target_device' supports 'uid()'. * In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars. @Sandra: Besides the usual .texi part, for the 'target_device' trait set: if you add a new GOMP routine for kind/arch/isa - can you also add a UID argument so that we don't have to update the API when it is needed in the not so distant future. @Andrew + @Thomas: Any comment? Especially to the nvptx/gcn side (plugin + .texi)? @Jakub or anyone else: any comments, suggestions, remarks? [The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU and seems to work fine.] Tobias OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines Those TR13/OpenMP 6.0 routines permit a reproducible offloading to a specific device by mapping an OpenMP device number to a unique ID (UID). The GPU device UIDs should be universally unique, the one for the host is not. gcc/ChangeLog: * omp-general.cc (omp_runtime_api_procname): Add get_device_from_uid and omp_get_uid_from_device routines. include/ChangeLog: * cuda/cuda.h (cuDeviceGetUuid): Declare. (cuDeviceGetUuid_v2): Add prototype. libgomp/ChangeLog: * config/gcn/target.c (omp_get_uid_from_device, omp_get_device_from_uid): Add stub implementation. * config/nvptx/target.c (omp_get_uid_from_device, omp_get_device_from_uid): Likewise. * fortran.c (omp_get_uid_from_device_, omp_get_uid_from_device_8_): Add. * libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype. * libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'. 
* libgomp.map (GOMP_6.0): New, including the new UID routines. * libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'. (Device Information Routines): Document new UID routines. (Offload-Target Specifics): Document UID format. * omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device): New prototype. * omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device): New interface. * omp_lib.h.in: Likewise. * plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New. * target.c (str_omp_initial_device): New static var. (STR_OMP_DEV_PREFIX): Define. (gomp_get_uid_for_device, omp_get_uid_from_device, omp_get_device_from_uid): New. (gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'. (gomp_target_init): Set the device's 'uid' field to NULL. * testsuite/libgomp.c/device_uid.c: New test. * testsuite/libgomp.fortran/device_uid.f90: New test. 
gcc/omp-general.cc | 4 +- include/cuda/cuda.h | 7 ++ libgomp/config/gcn/target.c | 14 libgomp/config/nvptx/target.c| 14 libgomp/fortran.c| 15 + libgomp/libgomp-plugin.h | 1 + libgomp/libgomp.h| 2 + libgomp/libgomp.map | 8 +++ libgomp/libgomp.texi | 81 +++- libgomp/omp.h.in | 3 + libgomp/omp_lib.f90.in | 23 +++ libgomp/omp_lib.h.in | 23 +++ libgomp/plugin/cuda-lib.def | 2 + libgomp/plugin/plugin-gcn.c | 16 + libgomp/plugin/plugin-nvptx.c| 34 ++ libgomp/target.c | 56 libgomp/testsuite/libgomp.c/device_uid.c | 38 +++ libgomp/testsuite/libgomp.fortran/device_uid.f90 | 42 18 files changed, 379 insertions(+), 4 deletions(-) diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc index de91ba8a4a7..12788ad0249 100644 --- a/gcc/omp-general.cc +++ b/gcc/omp-general.cc @@ -3260,6 +3260,7 @@ omp_runtime_api_procname (const char *name) "alloc", "calloc", "free", + "get_device_from_uid", "get_interop_int", "get_interop_ptr", "get_mapped_ptr", @@ -3338,12 +3339,13 @@ omp_runtime_api_procname (const char *name) as DECL_NAME only omp_* and omp_*_8 appear. */ "display_env", "get_ancestor_thread_num", - "init_allocator", + "omp_get_uid_from_device", "get_partition_place_nums", "get_place_num_procs", "get_place_proc_ids", "get_schedule", "get_team_size", + "init_allocator",
Re: [PATCH v2 1/8] libgomp: Disentangle shared memory from managed
Hi Andrew, → https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655947.html

On June 28, 2024, Andrew Stubbs wrote: Some GPU compute systems allow the GPU to access host memory without much prior setup, but that's not necessarily the fast way to do it. For shared memory APUs this is almost certainly the correct choice, but for AMD there is the difference between "fine-grained" and "coarse-grained" memory, and for NVidia Cuda generally runs better if it knows the status of the memory you access.

In my understanding, the USM implementation that migrates pages on page fault works rather well in general, both with AMD and Nvidia GPUs. It obviously has issues, e.g. when the same page is accessed frequently and (semi-)concurrently by the host and the device, as it then keeps migrating back and forth. Or when you access a large array, where mapping it in one go is faster than repeatedly hitting a page boundary. And if the data is handled through an interconnect like with NVlink on PowerPC Volta, there is a long latency. The issue with the page migrating back and forth can be solved by placing the data in pinned memory (best in one provided by the GPU runtime). And the page-boundary issue can be fixed by using large pages for large data. Therefore, I think for USM, switching to no mapping by default is not a bad idea. (However, see env var idea below.) * * *

Not a review, but some first comments (glancing also at my local WIP patch + my personal to-do list):

* I need to finish my patch that still does mapping with 'declare target enter(…)' variables - otherwise, automatically turning on GOMP_OFFLOAD_CAP_SHARED_MEM will give tons of fails on those systems. (Generic issue: It should also be fixed for 'requires unified_shared_memory', but typical smaller USM code is less likely to hit this issue.) For 'declare target link', I have already posted a patch, but that has still to be committed.

* I think having a per-device property would be useful. 
In principle, it would be nice that - when two GPUs exist on a system but only one has shared-memory support - that USM GPU would be selected with 'requires unified_shared_memory'. Currently, all GPUs are then excluded. Example: Richi's gfx1030 and gfx1036, where only gfx1036 supports USM. Currently, HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED is false when both of his GPUs are enabled as that's a system property and not a per-device property. (For nvidia, we have 'CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS'; for this feature, we had to exclude them in the count in GOMP_OFFLOAD_get_num_devices but also in the later GOMP_OFFLOAD_init_device they need to be skipped.)

* For AMD APUs, I wonder whether we could instead use the following: hsa_agent_get_info (agent, HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES, &memory_properties) plus hsa_flag_isset64 (memory_properties, HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU) (this needs a newer 'hsa_ext_amd.h' than included in GCC or additional defs in our copy of the .h file). [I don't know in which ROCm version this feature got added.] Talking about this API, I wonder whether we also want to use: HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU (returning a uint32_t). I do see it in our hsa_ext_amd.h but it is not used elsewhere. In our documentation, we claim, https://gcc.gnu.org/onlinedocs/libgomp/AMD-Radeon.html "The hardware permits maximally 40 workgroups/CU" but if I run that check on gfx90a, I get '32' and not '40' as result. (On gfx908 it is 40.) On the other hand, 40 only shows up in the comment to parse_target_attributes - while 32 is used in gcn_exec.

* Regarding USM vs. MAPPING: OpenMP leaves it open whether with 'requires unified_shared_memory', explicit mapping is honored or not. With (OpenMP 6.0's) 'requires self_maps', no mapping is permitted. Your patch changes the current handling: With non-APUs, it will not set the GOMP_OFFLOAD_CAP_SHARED_MEM for 'requires unified_shared_memory', which means that 'map' clauses are not ignored. 
I think that's okay(ish) at least for explicit map clauses, but I am not sure whether it is for implicit maps. In any case, if we do so, we probably must update omp_target_is_accessible - otherwise we return false even when USM has been required, if the capabilities do not include GOMP_OFFLOAD_CAP_SHARED_MEM. In any case, similar to CRAY, I think it makes sense to have an env var to toggle between mapping vs. not mapping for USM-supporting devices that aren't GPUs. * Compiler side: - I think we need to turn all auto 'declare target enter(...)' variable to 'link' when 'requires unified_shared_memory' is used. (At least that's how I read the TR13/6.0 spec.) - For 'self_map', we have to do likewise for all declare-target variables. - I was thinking of adding a commandline flag to force-change 'enter' to 'link' - applicable to both 'usm' and no requires line. (For 'self_map' it would be always on and for 'usm' it would still be on for automatic 'declare target
Re: libgomp: with USM, init 'link' variables with host address
Hi Thomas, short version: I think the patch as posted is fine and no action beyond is needed for this one issue. See below for the long version. Possible modifications (now or as a follow-up): - using memcpy or letting the plugin do it - not adding link variables to the splay tree with USM.

Thomas Schwinge wrote: Tested on x86-64-gnu-linux and nvptx offloading (that supports USM). (I yet have to set up such a USM configuration...)

You already used a USM config, e.g., when running gfx90a (likewise: gfx90c), except that on mainline USM currently only works if you explicitly set 'export HSA_XNACK=1'. For Nvptx, you need a post-Volta GPU with the open-kernels driver, which is for newer driver versions the default. * * *

Do I understand correctly that even if 'GOMP_REQUIRES_UNIFIED_SHARED_MEMORY', we cannot just skip all the 'mem_map' setup in 'gomp_load_image_to_device' etc., because we're not (yet?) setting 'GOMP_OFFLOAD_CAP_SHARED_MEM'?

We actually do set GOMP_OFFLOAD_CAP_SHARED_MEM with 'requires unified_shared_memory'. But, indeed, we cannot skip the memory mapping parts – due to the way we handle static variables. * * *

+ + if (is_link_var + && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)) + gomp_copy_host2dev (devicep, NULL, (void *) target_var->start, + &k->host_start, sizeof (void *), false, NULL); } Calling 'gomp_copy_host2dev' looks a bit funny given we've just determined USM (..., but I'm not asking for plain 'memcpy').

I guess a plain memcpy would do as well. [Assuming that the device's static variable is host accessible, which it probably is and should be.] I will add it to my to-do list for USM-related tasks to change this; possibly moving it to the plugin side has some advantages? Possibly not adding it to the splay tree if not needed. (Cf. below for env var discussion.) Regarding the unload: For 'declare target link(A)', we have, e.g., 'static int *A' on the device side. 
Thus, we could do 'A = NULL' – and rather should do 'A = {clobber}', but that's rather pointless in general and especially when unloading the image.

What's the advantage/rationale of doing this here vs. in 'gomp_map_vars_internal' for 'REFCOUNT_LINK'? (May be worth a source code comment?)

(A, B, C refers to the following example.) We don't see 'A' (or 'B') in the GOMP_target_ext call and thus not in gomp_map_vars_internal. Besides: We only want to do the initialization once and not every time gomp_map_vars_internal is called. I think the following program may help to understand the issue and the patch better. Note: While A, B, C are 'int …[3]' on the host, on the device we only have 'int B[3]' while for A it's 'int *A' and C only exists on the host. * * *

  #pragma omp requires unified_shared_memory
  static int A[3], B[3], C[3];
  #pragma omp declare target link(A) enter(B)
  #pragma omp begin declare target
  void f(int *p) {
    A[2] += B[2] + p[2]; // p points to the host's C variable
  }
  #pragma omp end declare target
  void foo(int dev) {
    int *ptr = C;
    #pragma omp target firstprivate(ptr) device(dev)
      f (ptr);
  }

* * * Here, 'ptr' (and thus 'p') point to the host 'C' variable, both before the target region and inside the target region. 'B' points to the device local version of the variable. And 'A' on a non-host device is likely to be NULL ('static int *A' + .BSS) before this patch. Or pointing to the host's 'A' with this patch. * * *

With A pointing to the host version (and likewise 'p' pointing to the host C), host fallback and device version yield identical results for 'A' and for 'C' (via ptr/p). However, 'B' on the host and on a non-host device have nothing in common. While that might be fine, in general it is not. Hence, in order to get the same result on host and device for a .BSS valued 'B', we need, e.g.

  #pragma omp target data map(always: B) device(dev)
    foo (dev);

to call 'foo' to ensure that the two 'B' are in sync. 
* * * Code wise, this means that with GOMP_OFFLOAD_CAP_SHARED_MEM, we still have to apply the map for 'declare target enter(…)' variables, except if host and device share the same code – but that should only be the case for host fallback (= initial device) and, possibly, GOMP_OFFLOAD_CAP_NATIVE_EXEC. * * * NOTE: OpenMP still permits honoring an explicit 'map' with 'requires unified_shared_memory'; only with 'self' maps is copying the data in 'map' explicitly disallowed. * * * This patch + honoring 'map' for static (non-'link'?) variables even with GOMP_OFFLOAD_CAP_SHARED_MEM were the main items for the USM follow-up patches that I meant by "More USM cleanup/fixes/extensions to make it _more_ useful" on slide 16 of https://gcc.gnu.org/wiki/cauldron2024#cauldron2024talks.openmp_openacc_and_offloading_in_gcc Plus, to go a bit beyond: - offering a flag to change 'declare target enter(…)' to 'link(…)' [RFC: enable it by default for 'requires unified_shared_memory'?] - switching to
libgomp: with USM, init 'link' variables with host address
The idea of link variables is to replace the full device variable by a pointer, permitting to map only parts of the variable to the device, saving memory. However, with (unified) shared memory, having a pointer also permits pointing to the host variable. That's what this patch does: instead of having a dangling pointer, upon loading the image, the device-side pointers are updated to point to the host. With the current patch, this is only done when explicitly requesting unified-shared memory. Tested on x86-64-gnu-linux and nvptx offloading (that supports USM). Remarks/comments/suggestions before I commit it? Tobias PS: I intend to do some additional changes for improved USM handling. Once done, I intend to look into (a) giving the user a bit more power on mapping vs. not mapping and (b) using USM by default for APUs, even without 'requires unified_shared_memory'. libgomp: with USM, init 'link' variables with host address If requires unified_shared_memory is set, make 'declare target link' variables point initially to the host pointer. libgomp/ChangeLog: * target.c (gomp_load_image_to_device): For requires unified_shared_memory, update 'link' vars to point to the host var. * testsuite/libgomp.c-c++-common/target-link-3.c: New test. libgomp/target.c | 5 +++ .../testsuite/libgomp.c-c++-common/target-link-3.c | 52 ++ 2 files changed, 57 insertions(+) diff --git a/libgomp/target.c b/libgomp/target.c index 47ec36928a6..66b54fd2ab8 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -2451,6 +2451,11 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, array->right = NULL; splay_tree_insert (&devicep->mem_map, array); array++; + + if (is_link_var + && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)) + gomp_copy_host2dev (devicep, NULL, (void *) target_var->start, + &k->host_start, sizeof (void *), false, NULL); } /* Last entry is for the ICV struct variable; if absent, start = end = 0. 
*/
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
new file mode 100644
index 000..c707b38b7d4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+
+#include <stdint.h>
+#include <omp.h>
+
+#pragma omp requires unified_shared_memory
+
+int A[3] = {-3,-4,-5};
+static int q = -401;
+#pragma omp declare target link(A, q)
+
+#pragma omp begin declare target
+void
+f (uintptr_t *pA, uintptr_t *pq)
+{
+  if (A[0] != 1 || A[1] != 2 || A[2] != 3 || q != 42)
+    __builtin_abort ();
+  A[0] = 13;
+  A[1] = 14;
+  A[2] = 15;
+  q = 23;
+  *pA = (uintptr_t) &A[0];
+  *pq = (uintptr_t) &q;
+}
+#pragma omp end declare target
+
+int
+main ()
+{
+  uintptr_t hpA = (uintptr_t) &A[0];
+  uintptr_t hpq = (uintptr_t) &q;
+  uintptr_t dpA, dpq;
+
+  A[0] = 1;
+  A[1] = 2;
+  A[2] = 3;
+  q = 42;
+
+  for (int i = 0; i <= omp_get_num_devices (); ++i)
+    {
+      #pragma omp target device(device_num: i) map(dpA, dpq)
+	f (&dpA, &dpq);
+      if (hpA != dpA || hpq != dpq)
+	__builtin_abort ();
+      if (A[0] != 13 || A[1] != 14 || A[2] != 15 || q != 23)
+	__builtin_abort ();
+      A[0] = 1;
+      A[1] = 2;
+      A[2] = 3;
+      q = 42;
+    }
+}
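
The idea behind the patch can be modeled on the host alone. The following is a minimal sketch, assuming that a 'declare target link' variable is represented on the device by a pointer that starts out as NULL; host_A, device_link_A, example_load_image and example_device_sum are hypothetical names for this illustration and do not exist in libgomp.

```c
#include <stddef.h>

/* The host variable, as in 'int A[3]' with 'declare target link(A)'.  */
static int host_A[3] = { 1, 2, 3 };

/* Stand-in for the device-side 'link' pointer; like a zero-initialized
   static variable, it is NULL until the image has been loaded.  */
static int *device_link_A = NULL;

/* Models the one-time initialization at image-load time: only if unified
   shared memory was required, let the link pointer refer to the host
   variable.  */
static void
example_load_image (int usm_required)
{
  if (usm_required)
    device_link_A = host_A;
}

/* Models device code that accesses 'A' through the link pointer.
   Returns -1 if the pointer has not been set up.  */
static int
example_device_sum (void)
{
  if (device_link_A == NULL)
    return -1;
  return device_link_A[0] + device_link_A[1] + device_link_A[2];
}
```

Doing this once at load time, rather than per target region, matches the reasoning in the thread: the variable does not show up in the per-region mapping call, so there is no later point at which the pointer would be set.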
Re: [Patch, v3] gcn/mkoffload.cc: Use #embed for including the generated ELF file
On July 19, 2024 Tobias Burnus wrote: Updated patch attached. As #embed is now supported by GCC (thanks!), I could commit this patch :-) Committed as r15-3629-g508ef585243d46 → https://gcc.gnu.org/r15-3629-g508ef585243d46 Unless I missed something, we need to wait for a few pending patches before there is a real speed up. However, first, that will come then automatically to GCN compilations and, secondly, the generated code is already much nicer thanks to #embed + seems to be a tiny tiny bit faster already. Tobias commit 508ef585243d4674d06b0737bfe8769fc18f824f Author: Tobias Burnus Date: Fri Sep 13 16:18:46 2024 +0200 gcn/mkoffload.cc: Use #embed for including the generated ELF file gcc/ChangeLog: * config/gcn/mkoffload.cc (read_file): Remove. (process_asm): Do not add '#include' to generated C file. (process_obj): Generate C file that uses #embed and use __SIZE_TYPE__ and __UINTPTR_TYPE__ instead the #include-defined size_t and uintptr. (main): Update call to it; remove no longer needed file I/O. diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc index 345bbf7709c..1f6337719e9 100644 --- a/gcc/config/gcn/mkoffload.cc +++ b/gcc/config/gcn/mkoffload.cc @@ -182,44 +182,6 @@ xputenv (const char *string) putenv (CONST_CAST (char *, string)); } -/* Read the whole input file. It will be NUL terminated (but - remember, there could be a NUL in the file itself. */ - -static const char * -read_file (FILE *stream, size_t *plen) -{ - size_t alloc = 16384; - size_t base = 0; - char *buffer; - - if (!fseek (stream, 0, SEEK_END)) -{ - /* Get the file size. 
*/ - long s = ftell (stream); - if (s >= 0) - alloc = s + 100; - fseek (stream, 0, SEEK_SET); -} - buffer = XNEWVEC (char, alloc); - - for (;;) -{ - size_t n = fread (buffer + base, 1, alloc - base - 1, stream); - - if (!n) - break; - base += n; - if (base + 1 == alloc) - { - alloc *= 2; - buffer = XRESIZEVEC (char, buffer, alloc); - } -} - buffer[base] = 0; - *plen = base; - return buffer; -} - /* Parse STR, saving found tokens into PVALUES and return their number. Tokens are assumed to be delimited by ':'. */ @@ -651,10 +613,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile) struct oaccdims *dims = XOBFINISH (&dims_os, struct oaccdims *); struct regcount *regcounts = XOBFINISH (&regcounts_os, struct regcount *); - fprintf (cfile, "#include \n"); - fprintf (cfile, "#include \n"); - fprintf (cfile, "#include \n\n"); - fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count); fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count); @@ -719,35 +677,28 @@ process_asm (FILE *in, FILE *out, FILE *cfile) /* Embed an object file into a C source file. */ static void -process_obj (FILE *in, FILE *cfile, uint32_t omp_requires) +process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires) { - size_t len = 0; - const char *input = read_file (in, &len); - /* Dump out an array containing the binary. - FIXME: do this with objcopy. */ - fprintf (cfile, "static unsigned char gcn_code[] = {"); - for (size_t i = 0; i < len; i += 17) -{ - fprintf (cfile, "\n\t"); - for (size_t j = i; j < i + 17 && j < len; j++) - fprintf (cfile, "%3u,", (unsigned char) input[j]); -} - fprintf (cfile, "\n};\n\n"); + If the file is empty, a parse error is shown as the argument to is_empty + is an undeclared identifier. 
*/ + fprintf (cfile, + "static unsigned char gcn_code[] = {\n" + "#embed \"%s\" if_empty (error_file_is_empty)\n" + "};\n\n", fname_in); fprintf (cfile, "static const struct gcn_image {\n" - " size_t size;\n" + " __SIZE_TYPE__ size;\n" " void *image;\n" "} gcn_image = {\n" - " %zu,\n" + " sizeof(gcn_code),\n" " gcn_code\n" - "};\n\n", - len); + "};\n\n"); fprintf (cfile, "static const struct gcn_data {\n" - " uintptr_t omp_requires_mask;\n" + " __UINTPTR_TYPE__ omp_requires_mask;\n" " const struct gcn_image *gcn_image;\n" " unsigned kernel_count;\n" " const struct hsa_kernel_description *kernel_infos;\n" @@ -1305,13 +1256,7 @@ main (int argc, char **argv) fork_execute (ld_argv[0], CONST_CAST (char **, ld_argv), true, ".ld_args"); obstack_free (&ld_argv_obstack, NULL); - in = fopen (gcn_o_name, "r"); - if (!in) - fatal_error (input_location, "cannot open intermediate gcn obj file"); - - process_obj (in, cfile, omp_requires); - - fclose (in); + process_obj (gcn_o_name, cfile, omp_requires); xputenv (concat ("GCC_EXEC_PREFIX=", execpath, NULL)); xputenv (concat ("COMPILER_PATH=", cpath, NULL));
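The core of the change above is that mkoffload no longer dumps every byte of the object file as a decimal literal but lets the compiler pull the file in via #embed and compute the size with sizeof. A minimal sketch of such a generator follows; the function name emit_embed_stub and the buffer-based interface are hypothetical (the real process_obj writes to a FILE *), and only the emitted text mirrors the diff.

```c
#include <stdio.h>

/* Hypothetical helper, not the mkoffload.cc code: write a C fragment into
   BUF that includes the file FNAME via #embed and records its size with
   sizeof.  Returns what snprintf returns, i.e. the number of characters
   that the complete fragment needs (excluding the terminating NUL).  */
int
emit_embed_stub (char *buf, size_t buflen, const char *fname)
{
  return snprintf (buf, buflen,
                   "static unsigned char gcn_code[] = {\n"
                   "#embed \"%s\" if_empty (error_file_is_empty)\n"
                   "};\n\n"
                   "static const struct gcn_image {\n"
                   "  __SIZE_TYPE__ size;\n"
                   "  void *image;\n"
                   "} gcn_image = {\n"
                   "  sizeof (gcn_code),\n"
                   "  gcn_code\n"
                   "};\n",
                   fname);
}
```

Because the size is now a sizeof expression in the generated file, the generator never has to open or read the object file itself, which is why read_file could be removed. Compiling the generated text requires a compiler that supports #embed.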
[Patch] Fortran: Fixes to OpenMP 'interop' directive parsing support
This patch fixes a couple of issues, such as missing whitespace gobbling after matching an expression. It also reorganizes some code to handle 'identifier_"string"' vs. 'identifier' better, as there were some diagnostic issues. (OpenMP requires for 'fr' that the argument is either an identifier (that is, a scalar integer parameter) or a string; for the older syntax, it can be any constant integer expression.)

However, the two main changes are:

* 'fr' and 'attr' actually support a list of arguments. While I believe 'attr("x", "y")' and 'attr("x"),attr("y")' are semantically identical, supporting more than one (or zero) values for 'fr' required a different encoding.

* Jakub additionally suggested that for 'fr', which supports constant integers and string literals, we could pass on integer values – and do some checking.

That's what this patch does: Known string values are converted to their associated integer values, others to 0. And if the integer/string value is unknown, a warning is printed [-Wopenmp].

Known values are those in the "OpenMP API Additional Definitions" document, https://www.openmp.org/specifications/ – with the addition of hsa / 7, which has been voted at spec level (no idea about ARB level) but not yet published.

Note that the warning is based on what is defined there, i.e. for 'level_zero' there is no warning, even though GCC does not support it. Obviously, if OpenMP adds another value next year, GCC 15 will not know it and will warn, even if the code is perfectly valid. But I guess we can live with a warning in that case.

Comments, remarks, suggestions? Especially regarding the internal representation?

Tobias

PS: Next step will be to get the C/C++ parsing working, which also implies encoding this representation into 'tree'. (Then doing the tree conversion for Fortran.) Once satisfied with that, the middle end + libgomp part that links those bits will come next.
And the question whether there should be one call per 'interop' directive or might be multiple (e.g. one per interop object in 'init'/'use'/'destroy'). Fortran: Fixes to OpenMP 'interop' directive parsing support Handle lists as argument to 'fr' and 'attr'; fix parsing corner cases. Additionally, 'fr' values are now internally stored as integer, permitting the diagnoses (warning) for values not defined in the OpenMP additional definitions document. PR fortran/116661 gcc/fortran/ChangeLog: * gfortran.h (gfc_omp_namelist): Rename 'init' members for clarity. * match.cc (gfc_free_omp_namelist): Handle renaming. * dump-parse-tree.cc (show_omp_namelist): Update for new format and features. * openmp.cc (gfc_match_omp_prefer_type): Parse list to 'fr' and 'attr'; store 'fr' values as integer. (gfc_match_omp_init): Rename variable names. gcc/ChangeLog: * omp-api.h (omp_get_fr_id_from_name, omp_get_name_from_fr_id): New prototypes. * omp-general.cc (omp_get_fr_id_from_name, omp_get_name_from_fr_id): New. include/ChangeLog: * gomp-constants.h (GOMP_INTEROP_IFR_LAST, GOMP_INTEROP_IFR_SEPARATOR, GOMP_INTEROP_IFR_NONE): New. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/interop-1.f90: Extend, update dg-*. * gfortran.dg/gomp/interop-2.f90: Update dg-error. * gfortran.dg/gomp/interop-3.f90: Add dg-warning. gcc/fortran/dump-parse-tree.cc | 84 +--- gcc/fortran/gfortran.h | 4 +- gcc/fortran/match.cc | 10 +- gcc/fortran/openmp.cc| 305 --- gcc/omp-api.h| 3 + gcc/omp-general.cc | 29 +++ gcc/testsuite/gfortran.dg/gomp/interop-1.f90 | 32 ++- gcc/testsuite/gfortran.dg/gomp/interop-2.f90 | 2 +- gcc/testsuite/gfortran.dg/gomp/interop-3.f90 | 2 +- include/gomp-constants.h | 5 + 10 files changed, 314 insertions(+), 162 deletions(-) diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc index 8fc6141611c..3547d7f8aca 100644 --- a/gcc/fortran/dump-parse-tree.cc +++ b/gcc/fortran/dump-parse-tree.cc @@ -37,6 +37,8 @@ along with GCC; see the file COPYING3. 
If not see #include "constructor.h" #include "version.h" #include "parse.h" /* For gfc_ascii_statement. */ +#include "omp-api.h" /* For omp_get_name_from_fr_id. */ +#include "gomp-constants.h" /* For GOMP_INTEROP_IFR_SEPARATOR. */ /* Keep track of indentation for symbol tree dumps. */ static int show_level = 0; @@ -1537,35 +1539,69 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n) } else if (list_type == OMP_LIST_INIT) { - int i = 0; if (n->u.init.target) fputs ("target,", dumpfile); if (n->u.init.targetsync) fputs ("targetsync,", dumpfile); - char *prefer_type = n->u.init.str; - if (n->u.init.len) - fputs ("prefer_type(", dumpfile); - if (n->u.init.len) - while (*prefer_type) - { - fputc ('{', dumpfile); - if (n->u2.interop_int
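The name-to-integer conversion for 'fr' described in this mail can be sketched as a plain table lookup. This is an illustration only, not the omp_get_fr_id_from_name / omp_get_name_from_fr_id code from omp-general.cc; the function names below are hypothetical, and the numeric values 1 to 6 are as I recall them from the OpenMP additional-definitions document (only 'hsa' = 7 is stated in the mail), so check them against that document.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative table; index == foreign-runtime id, 0 == unknown.
   Values 1..6 are an assumption taken from the OpenMP "Additional
   Definitions" document; 7 (hsa) is the value mentioned in the mail.  */
static const char *const fr_names[] = {
  NULL, "cuda", "cuda_driver", "opencl", "sycl", "hip", "level_zero", "hsa"
};

#define FR_COUNT (sizeof fr_names / sizeof fr_names[0])

/* Return the integer id for NAME, or 0 if NAME is not a known runtime;
   a caller would print the -Wopenmp warning in the latter case.  */
int
fr_id_from_name (const char *name)
{
  for (size_t i = 1; i < FR_COUNT; i++)
    if (strcmp (name, fr_names[i]) == 0)
      return (int) i;
  return 0;
}

/* Inverse mapping, e.g. for dumps; returns NULL for unknown ids.  */
const char *
name_from_fr_id (int id)
{
  if (id < 1 || id >= (int) FR_COUNT)
    return NULL;
  return fr_names[id];
}
```

Storing the integer rather than the string means later stages only compare numbers; the price is that an unknown string collapses to 0 and its spelling is lost after parsing.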
[committed] fortran/openmp.cc: Fix var init and locus use to avoid uninit values [PR fortran/116661]
This patch fixes an issue with uninitialized variables causing random ICEs.

Committed as r15-3581-g4e9265a474def9

* * *

However, follow-up work is needed as there are multiple issues:

* The check whether something is an identifier (integer parameter) and not just a constant expression failed in some corner cases. → This now reliably causes a testsuite FAIL.
* Some checks are also not quite right.
* After gfc_match_expr, whitespace gobbling is missing.
* I missed that 'fr(…)' and 'attr(…)' accept a list of values (*).
* The latter requires a different internal representation.

I have a partial fix for this, but the last two items require some more work; hence, I defer this to the next patch.

Tobias

(*) It also looks as if there will be post-TR13 spec changes, but it is not clear whether those just change the wording or more.

commit 4e9265a474def98cb6cdb59c15fbcb7630ba330e
Author: Tobias Burnus
Date: Wed Sep 11 09:25:47 2024 +0200

fortran/openmp.cc: Fix var init and locus use to avoid uninit values [PR fortran/116661]

gcc/fortran/ChangeLog:

PR fortran/116661
* openmp.cc (gfc_match_omp_prefer_type): NULL init a gfc_expr variable and use right locus in gfc_error.

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index c04d8b0f528..1145e2ff890 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -1860,6 +1860,7 @@ gfc_match_omp_prefer_type (char **pref_str, int *pref_str_len, int **pref_int_ar
   }
   fr_found = true;
   gfc_symbol *sym = NULL;
+  e = NULL;
   locus loc = gfc_current_locus;
   if (gfc_match_symbol (&sym, 0) != MATCH_YES
       || gfc_match (" _") == MATCH_YES)
@@ -1881,7 +1882,7 @@ gfc_match_omp_prefer_type (char **pref_str, int *pref_str_len, int **pref_int_ar
   {
     gfc_error ("Expected constant integer identifier or "
                "non-empty default-kind character literal at %L",
-               &e->where);
+               &loc);
     gfc_free_expr (e);
     return MATCH_ERROR;
   }
Re: [committed] OpenMP: Add interop routines to omp_runtime_api_procname
Now with attached patch …

Tobias Burnus wrote:

I realized that the attached change (committed as r15-3582-g6291f25631500c) was missing from what I committed in r15-3249-g0beac1db38855e

libgomp: Add interop types and routines to OpenMP's headers and module

I also checked the last 5 or so commits to omp.h.in, but for those routines, we seem to have remembered to update the API routine check.

Tobias

commit 6291f25631500c2d1c2328f919aa4405c3837f02
Author: Tobias Burnus
Date: Wed Sep 11 12:02:24 2024 +0200

OpenMP: Add interop routines to omp_runtime_api_procname

gcc/
* omp-general.cc (omp_runtime_api_procname): Add omp_get_interop_{int,name,ptr,rc_desc,str,type_desc} and omp_get_num_interop_properties.

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 0b61335dba4..aaa179afe13 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3260,7 +3260,10 @@ omp_runtime_api_procname (const char *name)
   "alloc",
   "calloc",
   "free",
+  "get_interop_int",
+  "get_interop_ptr",
   "get_mapped_ptr",
+  "get_num_interop_properties",
   "realloc",
   "target_alloc",
   "target_associate_ptr",
@@ -3289,6 +3292,10 @@ omp_runtime_api_procname (const char *name)
   "get_device_num",
   "get_dynamic",
   "get_initial_device",
+  "get_interop_name",
+  "get_interop_rc_desc",
+  "get_interop_str",
+  "get_interop_type_desc",
   "get_level",
   "get_max_active_levels",
   "get_max_task_priority",
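To illustrate why a routine missing from these string tables goes unnoticed by the compiler, here is a much simplified sketch of a table-driven check in the spirit of omp_runtime_api_procname. It is not the GCC function: the name is_runtime_api_name, the reduction to two groups, and in particular the rule that only the second group also accepts a trailing underscore are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the first LEN characters of S equal an entry of the
   NULL-terminated LIST.  */
static bool
in_list (const char *const *list, const char *s, size_t len)
{
  for (; *list; list++)
    if (strlen (*list) == len && strncmp (*list, s, len) == 0)
      return true;
  return false;
}

/* Simplified sketch: NAME is a runtime API name if it is "omp_" followed
   by a table entry.  Assumption for illustration: entries of the second
   table are also accepted with one trailing '_'.  */
bool
is_runtime_api_name (const char *name)
{
  static const char *const plain[] = {
    "get_interop_int", "get_interop_ptr", "get_num_interop_properties", NULL
  };
  static const char *const with_suffix[] = {
    "get_interop_name", "get_interop_rc_desc", "get_interop_str",
    "get_interop_type_desc", NULL
  };

  if (strncmp (name, "omp_", 4) != 0)
    return false;
  name += 4;
  size_t len = strlen (name);
  if (in_list (plain, name, len) || in_list (with_suffix, name, len))
    return true;
  return len > 0 && name[len - 1] == '_'
         && in_list (with_suffix, name, len - 1);
}
```

With such a scheme, a routine added to omp.h.in but not to the table is simply treated like any user function, which is the omission this commit repairs.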
[committed] OpenMP: Add interop routines to omp_runtime_api_procname
I realized that the attached change (committed as r15-3582-g6291f25631500c) was missing from what I committed in r15-3249-g0beac1db38855e

libgomp: Add interop types and routines to OpenMP's headers and module

I also checked the last 5 or so commits to omp.h.in, but for those routines, we seem to have remembered to update the API routine check.

Tobias
Re: [Patch][RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components
Hi Jakub,

Jakub Jelinek wrote:

On Tue, Sep 10, 2024 at 12:19:33PM +0200, Tobias Burnus wrote:

Background: OpenMP states that for 'map(var)', all allocatable components of 'var' will automatically also be mapped ('deep mapping').

Not a review, just a comment. This kind of recursive mapping is also what needs to happen for declare mapper, so wonder if that shouldn't be solved together; and some way to merge mappings of one field after another with the same way if consecutive fields (with possibly some padding bits in between) are mapped the same way.

In the case of mapping Fortran allocatable components, I do not see the padding part. For 'map(var)' all of var is mapped, including all array descriptors. We then need to map the allocated memory (fully; if an array: all array elements) + do a pointer attach. And we need to handle unallocated components.

That's different from 'mapper', which is more flexible on one hand, but also really explicit. There is no hidden 'only if allocated do', possibly except for zero-sized array sections or iterator steps.

The Fortran part also handles polymorphic variables, where it is only known at runtime which components exist, which means that the whole tree of mappings to do is unknown at compile time. For 'mapper' that part is known. [Granted, TR13 now explicitly does not permit mapping of polymorphic variables as there are too many corner cases. But for 6.x it is planned to re-add it.]

In any case, the Fortran allocatable-component mapping also needs to be applied to the mapper (+ iterator) generated code, and it needs to come last, after all implicit mappings and remove-mapping optimizations. It could also be done as part of the mapper expansion.

* * *

Having said this, there might well be a useful common approach that covers Fortran deep mapping, 'mapper' and 'iterator'. But the current approaches don't use one. Namely, we have:

* The current Fortran deep mapper (as just posted) was ready in March 2022, https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591075.html

* The mapper patch (latest version) is at https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629363.html – albeit first bits date back to https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591983.html

* There is also an 'iterator' patch at https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662138.html – albeit it lacks the 'mapper' part, which is WIP and, for mainline, needs the 'mapper' patch of the previous bullet.

* * *

If we have a clear plan to implement things, I am somewhat willing to revise patches, if it makes sense. But for that, a clear design is needed.

And, in any case, it would be good if we could get all of the features above into GCC 15: Fortran deep mapping, 'mapper' (+ target_update with strides), 'iterator' [and some other backlog].

Tobias
[Patch][RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components
Background: OpenMP states that for 'map(var)', all allocatable components of 'var' will automatically also be mapped ('deep mapping').

Thus, for

  type(t), allocatable :: var(:)

this leads to pseudo code like:

  map(var, storage_size(var))
  do i = lbound(var), ubound(var)
    if (allocated(var(i)%comp1)) &
      map(var(i)%comp1, storage_size(var(i)%comp1))
  end do

and it can get more complicated, e.g. var(1204)%comp1(395)%str might be an allocatable scalar. Or var is a recursive type, e.g. it has 'type(t), allocatable :: self' as component such that var%self%self%self%self ... might exist (and 'self' could also be an array …).

* * *

Approach: The idea is to handle it in lower_omp_target as follows (semi-pseudocode):

  /* Obtain number of additional mappings, in the example above,
     it would be size(var) * 2 for map + attach of 'comp1', assuming
     all 'var(:)%comp1' are allocated and no other alloc comp. exist. */
  tree cnt = lang_hooks.decls.omp_deep_mapping_cnt (...)
  if (cnt) deep_map_cnt *= cnt;

  if (cnt)
    → switch to pointer type + dynamically allocate addrs, kinds, sizes
    → add 'uintptr_t s[]' as trailing member to addr struct.
  (Thus, all automatically mapped items are added to the end.)

In the big map loop, call additionally:

  lang_hooks.decls.omp_deep_mapping

Additionally, in some cases, the only question that needs to be answered is: Does the decl have an allocatable component or not? In that case, lang_hooks.decls.omp_deep_mapping_p is sufficient.

* * *

RFC: Does this approach sound sensible? Does the attached patch (middle-end part) look reasonable?

One downside of the current approach is that for map(var) when 'var' is present we still attempt to map all allocatable components instead of stopping directly after finding 'var' in the splay table. This can be fixed by passing more attributes to libgomp, but as the items come last in the list, it might not be straightforward. (Maybe starts-here + ends-here flags, where the attach next to the starts-here flag could be used to do the lookup?)
This might also lead to cases where an allocatable variable is mapped that otherwise would not be mapped. Albeit as 'map(var%comp)' of a later allocated 'comp' is only guaranteed to work with the 'always' modifier, having it automapped for 'map(var)' should at least not affect the values that were mapped. * * * The full patch has been applied to OG14 (= devel/omp/gcc-14) branch. The interesting bit are the hook entry points gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_cnt, and gfc_omp_deep_mapping → https://github.com/gcc-mirror/gcc/blob/devel/omp/gcc-14/gcc/fortran/trans-openmp.cc#L3068-L3209 * * * I have attached the middle-end patch, only, of the patch: https://gcc.gnu.org/g:92c3af3d4f8 Fortran/OpenMP: Support mapping of DT with allocatable components to focus on that part. Tobias PS: In TR13 and also after TR13, a couple of mapping features were added that permit shallow mapping, unmapping of allocatable components etc. I have not tried to analyze whether this affects this patch, but I think it remains largely as is. Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components gcc/ChangeLog: * langhooks-def.h (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New. (LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT, LANG_HOOKS_OMP_DEEP_MAPPING): Define. (LANG_HOOKS_DECLS): Use it. * langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New stubs. * langhooks.h (struct lang_hooks_for_decls): Add new hooks * omp-expand.cc (expand_omp_target): Handle dynamic-size addr/sizes/kinds arrays. * omp-low.cc (build_sender_ref, fixup_child_record_type, scan_sharing_clauses, lower_omp_target): Update to handle new hooks and dynamic-size addr/sizes/kinds arrays. 
--- gcc/langhooks-def.h | 10 +++ gcc/langhooks.cc| 24 ++ gcc/langhooks.h | 15 gcc/omp-expand.cc | 18 - gcc/omp-low.cc | 224 ++-- 5 files changed, 265 insertions(+), 26 deletions(-) diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h index f5c67b6823c..756714558e5 100644 --- a/gcc/langhooks-def.h +++ b/gcc/langhooks-def.h @@ -86,6 +86,10 @@ extern enum omp_clause_defaultmap_kind lhd_omp_predetermined_mapping (tree); extern tree lhd_omp_assignment (tree, tree, tree); extern void lhd_omp_finish_clause (tree, gimple_seq *, bool); extern tree lhd_omp_array_size (tree, gimple_seq *); +extern bool lhd_omp_deep_mapping_p (const gimple *, tree); +extern tree lhd_omp_deep_mapping_cnt (const gimple *, tree, gimple_seq *); +extern void lhd_omp_deep_mapping (const gimple *, tree, unsigned HOST_WIDE_INT, + tree, tree, tree, tree, tree, gimple_seq *); struct gimplify_omp_ctx; extern void lhd_omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *, tree); @@ -272,6 +276,9 @@ extern tree lhd_unit_size_without_reusable_padding (tree)
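The runtime count of additional mappings described under "Approach" can be illustrated with a small C analogue of the Fortran example: each element whose component is allocated contributes one map and one attach entry. This is a sketch under stated assumptions, not the gfc_omp_deep_mapping_cnt hook; struct item and the function name are hypothetical, and an allocatable component is modelled as a pointer that is NULL when unallocated.

```c
#include <stddef.h>

/* C stand-in for 'type t' with one allocatable array component.  */
struct item
{
  double *comp1;     /* NULL models an unallocated component.  */
  size_t comp1_len;  /* Number of elements if allocated.  */
};

/* Number of extra map entries needed for var[0..n-1]: two per allocated
   component (one for the data, one for the pointer attach).  */
size_t
deep_mapping_cnt (const struct item *var, size_t n)
{
  size_t cnt = 0;
  for (size_t i = 0; i < n; i++)
    if (var[i].comp1 != NULL)
      cnt += 2;
  return cnt;
}
```

Since the result depends on which components are allocated at run time, the addrs/sizes/kinds arrays can no longer have a compile-time size, which is why the mail switches them to dynamically allocated storage whenever the count is nonzero.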
Re: [PATCH v3 03/12] libgomp: runtime support for target_device selector
Hi all,

Jakub Jelinek wrote:

On Sat, Jul 20, 2024 at 02:42:22PM -0600, Sandra Loosemore wrote:

This patch implements the libgomp runtime support for the dynamic target_device selector via the GOMP_evaluate_target_device function.

[…]

Now for kind, isa and arch traits in the target_device set this patch decides based on compiler flags used to compile some routine in libgomp.so or libgomp.a. While this can work in the (very unfortunate) GCN state of things where only exact isa match is possible (I really hope we can one day generalize it by being able to compile for a set of isas by supporting lowest denominator and patching the EM_* in the ELF header or something similar, perhaps with runtime decisions on what to do for different CPUs),

I think that can only work to some extent. LLVM has "gfx11-generic" which is compatible with gfx110{0,1,2,3} and gfx115{0,1,2}, which at least helps a bit. For gfx10, it has gfx10-1-generic for gfx101{0,1,2,3} and gfx10-3-generic for gfx103[0-6] and gfx9-generic for gfx90{0,2,4,6,9,c}. Thus, we could have versions which support a common subset, but we still need multiple libraries. And it needs to be implemented … This sounds like a task for the GCN maintainer …

* * *

deciding what to do based on how libgomp.a or libgomp.so.1 has been compiled for the rest is IMHO wrong.

I wonder whether we should do something like the following. [The following is a mix between compiler code and generated code, for illustrative purposes.]

Inside the compiler do:

#ifndef ACCEL_COMPILER
  int r = 0;
  if (targetm.omp.device_kind_arch_isa != NULL)
    r = targetm.omp.device_kind_arch_isa (omp_device_{kind,arch,isa}, val);
  if (dev_num && TREE_CODE (dev_num) == INTEGER_CST)
    {
      if (dev_num < -1 /* INVALID_DEVICE or nonconforming */)
        → 0
      if (dev_num == initial_device)
        → r
    }
  /* The '? :' condition is a compile time condition. */
  d = ? : omp_get_default_device ();
  if (d < -1)
    → 0
  else if (d == -1 || d == omp_get_initial_device ())
    → r
  else
    → GOMP_get_device_kind_arch_isa (d, kind, arch, isa)
#else
  /* VARIANT 1: Assume that neither reverse offload nor nested target occurs. */
  → targetm.omp.device_kind_arch_isa (kind, arch, isa)
  /* VARIANT 2 */
  d = ? : omp_get_default_device ();
  if (d == omp_get_device_num ())
    → targetm.omp.device_kind_arch_isa (kind, arch, isa)
  else
    /* Cannot really do anything here - and as no nested target is permitted, use 'false'. */
    → 0
#endif

* * *

And on the libgomp side: GOMP_get_device_kind_arch_isa → plugin code. And there:

(A) GCN: kind and arch are clear. For ISA: agent->device_isa + use the existing isa_hsa_name() function (or likewise).

(B) Nvptx: cuDeviceGetAttribute + CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75 and CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76. Example: sm_89 = (major) 8 and (minor) 9.

* * *

Does this sound sensible?

Tobias

PS: For the current host-offload GSoC task, we might eventually think of using cpuid on x86-64, i.e. gcc/config/i386/cpuid.h.

PS: RFC remains: Should 'sm_80' be true if the hardware/compilation is 'sm_89' or not? Namely: Does 'sm_80' denote the capability or the specific hardware? Regarding this topic, see also https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662059.html
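The host-side decision from the pseudo code, and the way an nvptx plugin could build the ISA name from the two compute-capability attributes, can be sketched in plain C. Both functions are hypothetical illustrations: the enum and the names classify_device and format_sm_isa do not exist in libgomp, and the major/minor numbers are passed in as plain integers instead of being queried with cuDeviceGetAttribute.

```c
#include <stdio.h>

enum dev_class
{
  DEV_INVALID,    /* selector evaluates to false ('→ 0')  */
  DEV_HOST,       /* use the host result ('→ r')  */
  DEV_ASK_PLUGIN  /* query the offload plugin  */
};

/* Mirror of the 'if (d < -1) ... else if ... else' chain in the mail;
   INITIAL_DEVICE stands for what omp_get_initial_device () returns.  */
enum dev_class
classify_device (int d, int initial_device)
{
  if (d < -1)
    return DEV_INVALID;
  if (d == -1 || d == initial_device)
    return DEV_HOST;
  return DEV_ASK_PLUGIN;
}

/* Build "sm_<major><minor>", e.g. major 8 and minor 9 give "sm_89".
   Returns the snprintf result.  */
int
format_sm_isa (char *buf, size_t buflen, int major, int minor)
{
  return snprintf (buf, buflen, "sm_%d%d", major, minor);
}
```

Only the last case needs a call into libgomp at run time; the first two can be folded by the compiler whenever the device number is a known constant.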
[patch][v2] Fortran: Add OpenMP 'interop' directive parsing support
Now also supports the following (note the variable name): 'init(targetsync, target)' – and I fixed an ICE when the variable parsing failed.

Comments before I commit it?

Tobias

Tobias Burnus wrote:

This patch adds Fortran parsing support for OpenMP's 'interop' directive (which stops with a 'sorry' in trans-openmp.cc as the middle end support is still missing).

Tested on x86-64-gnu-linux.

Comments, suggestions, remarks?

* * *

Background: 'interop' makes it easier to call, e.g., a CUDA-BLAS function directly as it permits mapping an OpenMP device number (→ "target" modifier required) to the "foreign runtime" device number or getting a stream object directly (→ if the "targetsync" modifier is specified) with dependency tracking.

Just calling '!$omp interop init(obj)' works, but that leaves the decision which type of object should be returned to the run time. Using 'prefer_type', the user can ask for a specific type. Permitted is a string such as "hip" or an integer constant such as omp_ifr_cuda_driver – and the old-style syntax is 'prefer_type(<integer expr|literal string> [ , ...])'. [Note that a constant integer expression is permitted.]

The new syntax permits additional attributes, like for 'sycl' requesting an 'in-order' queue (instead of the default 'out-of-order' queue) when obtaining a stream. The new syntax is 'prefer_type( {...} [, {...} ... ] )' where '{ ... }' is a list of either 'attr("ompx_...")' (i.e. 'attr(...)' with a literal string arg that starts with ompx_ and does not contain a ',') or 'fr()' where the identifier is an integer constant. 'fr' can be present or not, but only once per {...}, while multiple 'attr' may be used. [Note that as non-string only an identifier is permitted (i.e. an integer parameter).]

I decided on the used way to encode the string – but I am open to other representations as well. In my WIP/RFC patch it is used as shown in plugin-*.c in the patch https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html

The available foreign runtimes and the values that can be returned are hidden in that patch and more readable in the documentation patch at https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661365.html

If someone wants to delve into the details of the 'interop' feature: Have a look at OpenMP 5.1 (5.2) *and* TR13 and the additional definition document at https://www.openmp.org/specifications/ ('hsa': publishing pending).

* * *

Tobias

PS: In the dump, I am a bit lazy and add a spurious trailing ','. As it is only a dump, I decided that adding a bunch of checks to ensure that a ',' only gets printed if needed is not really required. If you think otherwise, I can surely add a bunch of 'if' and only print it conditionally.

PPS: In order to use 'interop', mainly the part in the middle is missing, i.e. some middle-end gimplification with a call into libgomp – and the libgomp function. A stub version of the latter and some (loosely) tested plugin handling does exist as WIP/RFC patch, see patch link above. - Besides gimplify and the libgomp function, a bunch of tests and, obviously, the C and C++ FE counterpart to this patch have to be implemented.

Fortran: Add OpenMP 'interop' directive parsing support

Parse OpenMP's 'interop' directive but stop with a 'sorry, unimplemented' after resolving. Additionally, it moves some clause dumping away from the end directive as that led to 'nowait' not being printed when it should as some cases were missed.

gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_namelist): Handle OMP_LIST_INIT. (show_omp_clauses): Handle OMP_LIST_{INIT,USE,DESTROY}; move 'nowait' from end-directive to the directive dump. (show_omp_node, show_code_node): Handle EXEC_OMP_INTEROP. * gfortran.h (enum gfc_statement): Add ST_OMP_INTEROP. (OMP_LIST_INIT, OMP_LIST_USE, OMP_LIST_DESTROY): Add. (enum gfc_exec_op): Add EXEC_OMP_INTEROP.
(struct gfc_omp_namelist): Add interop items to union. (gfc_free_omp_namelist): Add boolean arg. * match.cc (gfc_free_omp_namelist): Update to free interop union members. * match.h (gfc_match_omp_interop): New. * openmp.cc (gfc_omp_directives): Uncomment 'interop' entry. (gfc_free_omp_clauses, gfc_match_omp_allocate, gfc_match_omp_flush, gfc_match_omp_clause_reduction): Update call. (enum omp_mask2): Add OMP_CLAUSE_{INIT,USE,DESTROY}. (OMP_INTEROP_CLAUSES): Use it. (gfc_match_omp_clauses): Match those clauses. (gfc_match_omp_prefer_type, gfc_match_omp_init, gfc_match_omp_interop): New. (resolve_omp_clauses): Handle interop clauses. (omp_code_to_statement): Add ST_OMP_INTEROP. (gfc_resolve_omp_d
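The constraint on 'attr' arguments stated in the background section (a literal string that starts with ompx_ and contains no ',') is simple enough to show as a predicate. This is only a sketch of the rule as described in the mail; the function name valid_attr_string is hypothetical and the actual check in gfc_match_omp_prefer_type may differ in details such as diagnostics.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if S is acceptable as an 'attr' string according to the
   rule quoted in the mail: it must start with "ompx_" and must not
   contain a comma.  */
bool
valid_attr_string (const char *s)
{
  return strncmp (s, "ompx_", 5) == 0 && strchr (s, ',') == NULL;
}
```

Ruling out ',' inside the string presumably keeps a comma usable as a separator when several strings are stored in one encoded buffer; that motivation is my reading, not something the mail states.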
[patch] config/nvptx: Handle downward compat for OpenMP context selector
For x86-64, the context selector matching is currently based on features. That's obvious for 'SSE2', where any system offering SSE2 matches, but that is also the case for, e.g., a selector asking for 'i486' – which matches when compiling for 'i486', 'i586' and 'i686'.

That has pros and cons. Assume compiling for 'i686': If there is a context selector asking for ISA 'i486', we want to use it as i686 supports it – and not, e.g., the generic fallback. On the other hand, if there are two variants, one for 'i686' and one for 'i486', we want to use the 'i686' variant if the hardware supports it. [I am not sure how to handle this best.]

* * *

The attached patch now does likewise for nvptx, where the compute capabilities are downward compatible with one exception → https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ptx-module-directives-target

"In general, generations of SM architectures follow an onion layer model, where each generation adds new features and retains all features of previous generations. The onion layer model allows the PTX code generated for a given target to be run on later generation devices. Target architectures with suffix “a”, such as sm_90a, include architecture-accelerated features that are supported on the specified architecture only, hence such targets do not follow the onion layer model. Therefore, PTX code generated for such targets cannot be run on later generation devices. Architecture-accelerated features can only be used with targets that support these features."

* * *

The patch additionally updates the documentation.

Comments, suggestions, approval, disapproval?

Tobias

PS: I wonder whether it wouldn't make sense to permit all sm_ values with -march=, even if some produce the same binaries (at least for now) vs. supporting only some with -march= and using -march-map= to handle all values. But that's independent of this RFC patch.
config/nvptx: Handle downward compat for OpenMP context selector

Nvptx's compute capabilities (SM_XX) are downward compatible, i.e. SM_80 supports all features of SM_30, SM_70 etc. Additionally, GCC's -march= currently only supports those values that actually change the generated code - and offers -march-map=... to map higher values to the next lower supported version.

Update libgomp.texi to document the downward compatibility and case sensitivity of the context selectors.

gcc/ChangeLog:

* config/nvptx/nvptx-sm.def (NVPTX_SM_COMPAT): Add compute capabilities supported by -march-map= lower than sm_80 (= highest supported -march=).
* config/nvptx/gen-omp-device-properties.sh: Handle it.
* config/nvptx/gen-h.sh: Ignore it.
* config/nvptx/gen-multilib-matches.sh: Likewise.
* config/nvptx/gen-opt.sh: Likewise.
* config/nvptx/nvptx.cc (sm_version_to_number): New.
(nvptx_omp_device_kind_arch_isa): Match when requested ISA (sm_XX) version is lower than actual ISA version.

libgomp/ChangeLog:

* libgomp.texi (OpenMP Context Selectors): Add note about case sensitivity and downward compatibility.
* testsuite/libgomp.c/declare-variant-3.h: Extend to check for downward compatibility.
* testsuite/libgomp.c/declare-variant-3-sm30.c: Update.
* testsuite/libgomp.c/declare-variant-3-sm35.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm53.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm70.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm75.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm80.c: Likewise.
* testsuite/libgomp.c/declare-variant-3.c: Likewise.
gcc/config/nvptx/gen-h.sh | 2 +- gcc/config/nvptx/gen-multilib-matches.sh | 2 +- gcc/config/nvptx/gen-omp-device-properties.sh | 2 +- gcc/config/nvptx/gen-opt.sh| 2 +- gcc/config/nvptx/nvptx-sm.def | 22 +++ gcc/config/nvptx/nvptx.cc | 33 -- .../testsuite/libgomp.c/declare-variant-3-sm30.c | 3 +- .../testsuite/libgomp.c/declare-variant-3-sm35.c | 3 +- .../testsuite/libgomp.c/declare-variant-3-sm53.c | 3 +- .../testsuite/libgomp.c/declare-variant-3-sm70.c | 3 +- .../testsuite/libgomp.c/declare-variant-3-sm75.c | 3 +- .../testsuite/libgomp.c/declare-variant-3-sm80.c | 1 + libgomp/testsuite/libgomp.c/declare-variant-3.c| 8 ++- libgomp/testsuite/libgomp.c/declare-variant-3.h| 75 -- 14 files changed, 140 insertions(+), 22 deletions(-) diff --git a/gcc/config/nvptx/gen-h.sh b/gcc/config/nvptx/gen-h.sh index ea75e127cde..592dd8bebc8 100644 --- a/gcc/config/nvptx/gen-h.sh +++ b/gcc/config/nvptx/gen-h.sh @@ -21,7 +21,7 @@ nvptx_sm_def="$1/nvptx-sm.def" gen_copyright_sh="$1/gen-copyright.sh" -sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//') +sms=$(grep '^NVPTX_SM[^_]' $nvptx_sm_def | sed 's/.*(//;s/,.*//') cat <= v) +__builtin_abort (); + __built
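The matching rule from the ChangeLog (a requested sm_XX matches when its version is not higher than the actual one) can be sketched as follows. This is not the sm_version_to_number / nvptx_omp_device_kind_arch_isa code from nvptx.cc; the function names are hypothetical, and rejecting everything that is not exactly lowercase "sm_" plus digits (which also excludes the 'a'-suffixed targets from the PTX quote) is an assumption of this sketch.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return NN for a string of the form "sm_NN", or -1 for anything else,
   including suffixed names such as "sm_90a" and uppercase "SM_".  */
int
sm_to_number (const char *s)
{
  if (strncmp (s, "sm_", 3) != 0 || !isdigit ((unsigned char) s[3]))
    return -1;
  char *end;
  long v = strtol (s + 3, &end, 10);
  if (*end != '\0' || v > 10000)
    return -1;
  return (int) v;
}

/* Downward-compatible match: code written for REQUESTED can be used when
   compiling for ACTUAL if REQUESTED is not newer than ACTUAL.  */
bool
sm_isa_matches (const char *requested, const char *actual)
{
  int r = sm_to_number (requested);
  int a = sm_to_number (actual);
  return r >= 0 && a >= 0 && r <= a;
}
```

For example, a variant selected with isa("sm_53") would be eligible when compiling for sm_80 but not the other way round; which of several eligible variants wins is the scoring question raised for x86-64 at the start of the mail and is outside this sketch.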
[patch][v2] LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]
Hi Richard,

On 02.09.24 at 13:58, Richard Biener wrote:

Hmm, I can't really follow how and where it's currently decided whether to output offload tables for the LTRANS units

Before the patch, output_offload_tables is called unconditionally, but guarded by the check whether there is anything to output at all.

Call trees: When outputting the .o files, the call is done via ipa_passes → ipa_write_summaries → ipa_write_summaries_1. This calls ipa_write_summaries twice: once for the offload/for-device LTO section and once for the host LTO section – and both calls are needed.

For the LTO (lto1, ltrans) step, the call tree starts with: do_whole_program_analysis → lto_wpa_write_files → stream_out_partitions → stream_out_partitions_1 → stream_out → ipa_write_optimization_summaries. Here, stream_out_partitions potentially forks the 'stream_out_partitions_1' calls. And each stream_out_partitions_1 calls 'stream_out' in a loop for each of its share of the partitions.

With either code path, the ipa_write... function then calls: write_lto → lto_output → output_offload_tables.

but instead of an odd global variable would it be possible to pass that down as a flag or, alternatively encode that flag in the representation for the LTRANS partition? I suppose that's the out_decl_state?

Actually, I tried to follow your initial suggestion in the PR, but have now moved to the somewhat clearer out_decl_state.

Tobias

LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming sufficient partitions, e.g., via -flto-partition=max), output_offload_tables wrote the output tables once per fork.

PR lto/116535 gcc/ChangeLog: * lto-cgraph.cc (output_offload_tables): Remove offload_ frees. * lto-streamer-out.cc (lto_output): Make call to it depend on lto_get_out_decl_state ()->output_offload_tables_p. * lto-streamer.h (struct lto_out_decl_state): Add output_offload_tables_p field.
* tree-pass.h (ipa_write_optimization_summaries): Add bool argument. * passes.cc (ipa_write_summaries_1): Add bool output_offload_tables_p arg. (ipa_write_summaries): Update call. (ipa_write_optimization_summaries): Accept output_offload_tables_p. gcc/lto/ChangeLog: * lto.cc (stream_out): Update call to ipa_write_optimization_summaries to pass true for first partition. gcc/lto-cgraph.cc | 10 -- gcc/lto-streamer-out.cc | 3 ++- gcc/lto-streamer.h | 3 +++ gcc/lto/lto.cc | 2 +- gcc/passes.cc | 11 --- gcc/tree-pass.h | 3 ++- 6 files changed, 16 insertions(+), 16 deletions(-) diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc index 6395033ab9d..1492409427c 100644 --- a/gcc/lto-cgraph.cc +++ b/gcc/lto-cgraph.cc @@ -1139,16 +1139,6 @@ output_offload_tables (void) streamer_write_uhwi_stream (ob->main_stream, 0); lto_destroy_simple_output_block (ob); - - /* In WHOPR mode during the WPA stage the joint offload tables need to be - streamed to one partition only. That's why we free offload_funcs and - offload_vars after the first call of output_offload_tables. */ - if (flag_wpa) -{ - vec_free (offload_funcs); - vec_free (offload_vars); - vec_free (offload_ind_funcs); -} } /* Verify the partitioning of NODE. */ diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc index 523d6dad221..a4b171358d4 100644 --- a/gcc/lto-streamer-out.cc +++ b/gcc/lto-streamer-out.cc @@ -2829,7 +2829,8 @@ lto_output (void) statements using the statement UIDs. */ output_symtab (); - output_offload_tables (); + if (lto_get_out_decl_state ()->output_offload_tables_p) +output_offload_tables (); if (flag_checking) { diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h index 79c44d2cae7..4da1a3efe03 100644 --- a/gcc/lto-streamer.h +++ b/gcc/lto-streamer.h @@ -531,6 +531,9 @@ struct lto_out_decl_state /* True if decl state is compressed. */ bool compressed; + + /* True if offload tables should be output. 
*/ + bool output_offload_tables_p; }; typedef struct lto_out_decl_state *lto_out_decl_state_ptr; diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc index 52dd436fd9a..1ee215d8f1d 100644 --- a/gcc/lto/lto.cc +++ b/gcc/lto/lto.cc @@ -178,7 +178,7 @@ stream_out (char *temp_filename, lto_symtab_encoder_t encoder, int part) gcc_assert (!dump_file); streamer_dump_file = dump_begin (TDI_lto_stream_out, NULL, part); - ipa_write_optimization_summaries (encoder); + ipa_write_optimization_summaries (encoder, part == 0); free (CONST_CAST (char *, file->filename)); diff --git a/gcc/passes.cc b/gcc/passes.cc index d73f8ba97b6..057850f4dec 100644 --- a/gcc/passes.cc +++ b/gcc/passes.cc @@ -2829,11 +2829,13 @@ ipa_write_summaries_2 (opt_pass *pass, struct lto_out_decl_state *state) summaries. SET is the set of nodes to be written. */ static void -ipa_write_summaries_1 (lto_symtab_encoder_t encoder) +ipa_writ
[patch] LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]
The attached patch tries to fix the issue exposed by the PR:

The main ingredient is partitioning of the LTO work, e.g. by using -flto-partition=max. With -flto=2 (or higher, or when a jobserver has been detected), not only the LTO part is run in parallel but also the creation of the ltrans files itself, i.e. gcc/lto/lto.cc's stream_out_partitions forks multiple processes to write those files concurrently (here: -flto=2 means two processes, each writing about half of the partitions).

For each partition, output_offload_tables is called, which in principle would add the offload tables to each file. To prevent this, in flag_wpa mode, the tables were freed after the first call. That solves the WPA problem, but only if all partitions are written by a single process (e.g. -flto=1). If not, each fork has its own copy of the data, and freeing the tables only modifies the copy belonging to that fork.

This patch moves the logic to gcc/lto/lto.cc and sets a global variable to ensure that the tables are only output for the first partition, independently of whether one or several processes write the ltrans files, trying to follow what Richard proposed in the PR.

The patch has been tested on x86-64-gnu-linux with nvptx offloading, but I should do a full bootstrap+regtest next.

Comments, suggestions, remarks, approval?

Tobias

LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming sufficient partitions, e.g., via -flto-partition=max), output_offload_tables wrote the output tables once per fork.

        PR lto/116535

gcc/ChangeLog:

        * omp-offload.h (offload_output_tables_p): New extern bool var.
        * omp-offload.cc (offload_output_tables_p): Define it with value true.
        * lto-cgraph.cc (output_offload_tables): Only output tables when
        offload_output_tables_p is true.

gcc/lto/ChangeLog:

        * lto.cc (stream_out_partitions_1): Set offload_output_tables_p
        to false except for the first partition.

 gcc/lto-cgraph.cc  | 16
 gcc/lto/lto.cc     |  3 +++
 gcc/omp-offload.cc |  2 ++
 gcc/omp-offload.h  |  1 +
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 6395033ab9d..19ac252e1b4 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -1081,8 +1081,10 @@ output_offload_tables (void)
 {
   bool output_requires = (flag_openmp
                           && (omp_requires_mask & OMP_REQUIRES_TARGET_USED) != 0);
-  if (vec_safe_is_empty (offload_funcs) && vec_safe_is_empty (offload_vars)
-      && !output_requires)
+  if (!offload_output_tables_p
+      || (vec_safe_is_empty (offload_funcs)
+          && vec_safe_is_empty (offload_vars)
+          && !output_requires))
     return;

   struct lto_simple_output_block *ob
@@ -1139,16 +1141,6 @@ output_offload_tables (void)

   streamer_write_uhwi_stream (ob->main_stream, 0);
   lto_destroy_simple_output_block (ob);
-
-  /* In WHOPR mode during the WPA stage the joint offload tables need to be
-     streamed to one partition only.  That's why we free offload_funcs and
-     offload_vars after the first call of output_offload_tables.  */
-  if (flag_wpa)
-    {
-      vec_free (offload_funcs);
-      vec_free (offload_vars);
-      vec_free (offload_ind_funcs);
-    }
 }

 /* Verify the partitioning of NODE.  */
diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
index 52dd436fd9a..69c7527d399 100644
--- a/gcc/lto/lto.cc
+++ b/gcc/lto/lto.cc
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "lto-common.h"
 #include "opts-jobserver.h"
+#include "omp-offload.h"

 /* Number of parallel tasks to run.  */
 static int lto_parallelism;
@@ -226,12 +227,14 @@ wait_for_child ()
 static void
 stream_out_partitions_1 (char *temp_filename, int blen, int min, int max)
 {
+  offload_output_tables_p = (min == 0);
   /* Write all the nodes in SET.  */
   for (int p = min; p < max; p ++)
     {
       sprintf (temp_filename + blen, "%u.o", p);
       stream_out (temp_filename, ltrans_partitions[p]->encoder, p);
       ltrans_partitions[p]->encoder = NULL;
+      offload_output_tables_p = false;
     }
 }

diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 934fbd80bdd..76bfda94217 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -88,6 +88,8 @@ struct oacc_loop
 /* Holds offload tables with decls.  */
 vec<tree, va_gc> *offload_funcs, *offload_vars, *offload_ind_funcs;

+bool offload_output_tables_p = true;
+
 /* Return level at which oacc routine may spawn a partitioned loop, or
    -1 if it is not a routine (i.e. is an offload fn).  */

diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index d972bb7eafd..2d1d173016c 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -29,6 +29,7 @@ extern int oacc_fn_attrib_level (tree attr);
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
 extern GTY(()) vec<tree, va_gc> *offload_ind_funcs;
+extern bool offload_output_tables_p;

 extern void omp_finish_file (void);
 extern void omp_di
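To illustrate why freeing the tables after the first call stops working once the writers are forked, here is a minimal sketch in C. It is not GCC code: tables_available, write_partition_old, write_partition_new and count_table_writes are hypothetical stand-ins, and the child exit status is only used to report how often each child "wrote" the tables.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for offload_funcs/offload_vars:
   nonzero while the tables still exist in this process.  */
static int tables_available = 1;

/* Old scheme: write the tables whenever they still exist, then free them.  */
static int
write_partition_old (int part)
{
  (void) part;
  if (!tables_available)
    return 0;
  tables_available = 0;  /* Like vec_free: only visible in this process.  */
  return 1;
}

/* New scheme: the decision depends only on the partition number.  */
static int
write_partition_new (int part)
{
  return part == 0;
}

/* Fork NWORKERS children that split NPARTS partitions between them and
   return how often the tables were written in total.  */
static int
count_table_writes (int nparts, int nworkers, int use_new_scheme)
{
  assert (nparts > 0 && nworkers > 0);
  int per_worker = (nparts + nworkers - 1) / nworkers;
  int forked = 0;
  for (int w = 0; w < nworkers; w++)
    {
      int min = w * per_worker;
      int max = min + per_worker < nparts ? min + per_worker : nparts;
      pid_t pid = fork ();
      if (pid < 0)
        break;
      if (pid == 0)
        {
          int writes = 0;
          for (int p = min; p < max; p++)
            writes += (use_new_scheme
                       ? write_partition_new (p)
                       : write_partition_old (p));
          _exit (writes);  /* Report the count through the exit status.  */
        }
      forked++;
    }

  int total = 0;
  for (int i = 0; i < forked; i++)
    {
      int status;
      if (wait (&status) > 0 && WIFEXITED (status))
        total += WEXITSTATUS (status);
    }
  return total;
}
```

With four partitions split over two forked writers, count_table_writes (4, 2, 0) yields two writes, since each child only clears its own copy of the flag, while count_table_writes (4, 2, 1) yields one, as only the writer that handles partition 0 emits the tables.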
[patch] lto/lto.cc: Fix build with not HAVE_WORKING_FORK
With HAVE_WORKING_FORK unset, I get an 'unused but set' compile error. That's fixed with the attached patch.

Tobias

PS: And if someone wonders what I am doing, see https://gcc.gnu.org/PR116535

lto/lto.cc: Fix build with not HAVE_WORKING_FORK

gcc/lto/ChangeLog:

        * lto.cc: Add missing HAVE_WORKING_FORK.

diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
index 58ff0c45f57..66d9f136ae1 100644
--- a/gcc/lto/lto.cc
+++ b/gcc/lto/lto.cc
@@ -62,8 +62,10 @@ along with GCC; see the file COPYING3.  If not see
 /* Number of parallel tasks to run.  */
 static int lto_parallelism;

+#ifdef HAVE_WORKING_FORK
 /* Number of active WPA streaming processes.  */
 static int nruns = 0;
+#endif

 /* GNU make's jobserver info.  */
 static jobserver_info *jinfo = NULL;
[patch] lto-wrapper: Honor -save-temps for ltrans' makefile
Noticed that -save-temps is ignored for parallel LTO. With this patch, the result is now:

  make -f ./a.ltrans.mk -j2 all
  [Leaving LTRANS ./a.ltrans.mk]

instead of

  make -f /tmp/ccXgtcjJ.mk -j2 all
  [Leaving LTRANS /tmp/ccXgtcjJ.mk]

OK for mainline?

Tobias

lto-wrapper: Honor -save-temps for ltrans' makefile

gcc/ChangeLog:

        * lto-wrapper.cc (run_gcc): Honor -save-temps for makefile name.

diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 6bfc96590a5..c07765b37a2 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -1994,7 +1994,10 @@ cont:

       if (parallel)
         {
-          makefile = make_temp_file (".mk");
+          if (save_temps)
+            makefile = concat (dumppfx, "ltrans.mk", NULL);
+          else
+            makefile = make_temp_file (".mk");
           mstream = fopen (makefile, "w");
           qsort (ltrans_priorities, nr, sizeof (int) * 2, cmp_priority);
         }
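The naming rule in this patch can be tried outside of GCC with the following sketch. It assumes nothing from lto-wrapper.cc: ltrans_makefile_name is a hypothetical helper, snprintf stands in for libiberty's concat, and mkstemp stands in for make_temp_file (so the temporary name lacks the ".mk" suffix).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not the lto-wrapper code: return a malloc'ed name
   for the ltrans makefile, or NULL on failure.  */
static char *
ltrans_makefile_name (int save_temps, const char *dumppfx)
{
  if (save_temps)
    {
      /* Predictable name next to the other dump files.  */
      assert (dumppfx != NULL);
      size_t len = strlen (dumppfx) + strlen ("ltrans.mk") + 1;
      char *name = malloc (len);
      if (name != NULL)
        snprintf (name, len, "%sltrans.mk", dumppfx);
      return name;
    }

  /* Otherwise fall back to a unique temporary file.  Simplification:
     mkstemp cannot append the ".mk" suffix that make_temp_file adds.  */
  char templ[] = "/tmp/ccXXXXXX";
  int fd = mkstemp (templ);
  if (fd < 0)
    return NULL;
  close (fd);
  return strdup (templ);
}
```

Calling ltrans_makefile_name (1, "./a.") gives "./a.ltrans.mk", which matches the 'make -f ./a.ltrans.mk' line above if the dump prefix is "./a.".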
[patch] Fortran: Add OpenMP 'interop' directive parsing support
This patch adds Fortran parsing support for OpenMP's 'interop' directive (which stops with a 'sorry' in trans-openmp.cc as the middle end support is still missing).

Tested on x86-64-gnu-linux.

Comments, suggestions, remarks?

* * *

Background: 'interop' makes it easier to call, e.g., a CUDA-BLAS function directly, as it permits mapping an OpenMP device number (→ "target" modifier required) to the "foreign runtime" device number or obtaining a stream object directly (→ if the "targetsync" modifier is specified) with dependency tracking.

Just calling '!$omp interop init(obj)' works, but that leaves the decision which type of object should be returned to the run time. Using 'prefer_type', the user can ask for a specific type. Permitted is a string such as "hip" or an integer constant such as omp_ifr_cuda_driver; in the old-style syntax, 'prefer_type(...)' takes a comma-separated list of such items. [Note that a constant integer expression is permitted.]

The new syntax permits additional attributes, like for 'sycl' requesting an 'in-order' queue (instead of the default 'out-of-order' queue) when obtaining a stream. The new syntax is 'prefer_type( {...} [, {...} ...] )' where '{ ... }' is a list of either 'attr("ompx_...")' (i.e. 'attr(...)' with a literal string argument that starts with ompx_ and does not contain a ',') or 'fr(...)', whose argument is a string or an identifier naming an integer constant. 'fr' can be present or not, but only once per {...}, while multiple 'attr' may be used. [Note that as non-string only an identifier is permitted (i.e. an integer parameter).]

I decided for the used way to encode the string, but I am open to other representations as well. In my WIP/RFC patch, it is used as shown in plugin-*.c in the patch https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html

The available foreign runtimes and the values that can be returned are hidden in that patch and more readable in the documentation patch at https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661365.html

If someone wants to delve into the details of the 'interop' feature: Have a look at OpenMP 5.1 (5.2) *and* TR13 and the additional definition document at https://www.openmp.org/specifications/ ('hsa': publishing pending).

* * *

Tobias

PS: In the dump, I am a bit lazy and add a spurious trailing ','. As it is only a dump, I decided that adding a bunch of checks to ensure that a ',' only gets printed if needed is not really required. If you think otherwise, I can surely add a bunch of 'if' and only print it conditionally.

PPS: In order to use 'interop', mainly the part in the middle is missing, i.e. some middle-end gimplification with a call into libgomp, and the libgomp function. A stub version of the latter and some (loosely) tested plugin handling does exist as WIP/RFC patch, see patch link above. Besides gimplify and the libgomp function, a bunch of tests and, obviously, the C and C++ FE counterpart to this patch have to be implemented.

Fortran: Add OpenMP 'interop' directive parsing support

Parse OpenMP's 'interop' directive but stop with a 'sorry, unimplemented' after resolving.

Additionally, it moves some clause dumping away from the end directive as that led to 'nowait' not being printed when it should as some cases were missed.

gcc/fortran/ChangeLog:

        * dump-parse-tree.cc (show_omp_namelist): Handle OMP_LIST_INIT.
        (show_omp_clauses): Handle OMP_LIST_{INIT,USE,DESTROY}; move 'nowait'
        from end-directive to the directive dump.
        (show_omp_node, show_code_node): Handle EXEC_OMP_INTEROP.
        * gfortran.h (enum gfc_statement): Add ST_OMP_INTEROP.
        (OMP_LIST_INIT, OMP_LIST_USE, OMP_LIST_DESTROY): Add.
        (enum gfc_exec_op): Add EXEC_OMP_INTEROP.
        (struct gfc_omp_namelist): Add interop items to union.
        (gfc_free_omp_namelist): Add boolean arg.
        * match.cc (gfc_free_omp_namelist): Update to free interop union
        members.
        * match.h (gfc_match_omp_interop): New.
        * openmp.cc (gfc_omp_directives): Uncomment 'interop' entry.
        (gfc_free_omp_clauses, gfc_match_omp_allocate,
        gfc_match_omp_flush, gfc_match_omp_clause_reduction): Update call.
        (enum omp_mask2): Add OMP_CLAUSE_{INIT,USE,DESTROY}.
        (OMP_INTEROP_CLAUSES): Use it.
        (gfc_match_omp_clauses): Match those clauses.
        (gfc_match_omp_prefer_type, gfc_match_omp_init,
        gfc_match_omp_interop): New.
        (resolve_omp_clauses): Handle interop clauses.
        (omp_code_to_statement): Add ST_OMP_INTEROP.
        (gfc_resolve_omp_directive): Add EXEC_OMP_INTEROP.
        * parse.cc (decode_omp_directive): Parse 'interop' directive.
        (next_statement, gfc_ascii_statement): Handle ST_OMP_INTEROP.
        * st.cc (gfc_free_statement): Likewise.
        * resolve.cc (gfc_resolve_code): Handle EXEC_OMP_INTEROP.
        * trans.cc (trans_code): Likewise.
        * trans-openmp.cc (gfc_trans_omp_directive): Print 'sorry' for
        EXEC_OMP_INTEROP.

gcc/testsuite/ChangeLog:

        * gfortran.dg/gomp/interop-1.f90: New test.
        * gfortran.dg/gomp/interop-2.f90: New test.
        * gfortran.dg/gomp/interop-3.f90: New test.

 gcc/fortran/dump-parse-tree.cc | 61 +
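The restriction on the 'attr' argument described above (a literal string that starts with ompx_ and contains no ',') can be written down as a small check. This is only a sketch in C for illustration, not the code in openmp.cc; valid_interop_attr is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check, not the gfortran implementation: an 'attr' string
   must start with "ompx_" and must not contain a ','.  */
static bool
valid_interop_attr (const char *s)
{
  assert (s != NULL);
  return strncmp (s, "ompx_", 5) == 0 && strchr (s, ',') == NULL;
}
```

For example, "ompx_in_order" passes, while "in_order" (missing prefix) and "ompx_a,ompx_b" (contains a comma) are rejected; two attributes would be given as two separate 'attr' items inside the same '{ ... }'.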
Re: [patch][v2] libgomp.texi: Document OpenMP's Interoperability Routines
Hi Sandra,

thanks for your comments.

Sandra Loosemore wrote:
> Stepping back to consider this from a higher-level perspective,
> shouldn't the interface documented in the GCC manual reflect what GCC
> implements, rather than what the spec says that is explicitly *not* what
> is implemented?  Or is the way you have documented this consistent with
> the way other libgomp features that don't strictly conform to the spec
> have already been documented?

The idea of the implementation is to be 100% compatible with the OpenMP specification in terms of usage, but to deviate in terms of the implied ABI.

The issue is really that the specification is more explicit than it should be, but it is clear why that is: It is much easier and more readable to write 'subroutine f(x); integer :: x' instead of stating that "subroutine f" exists and takes "x" as argument which accepts default-kind integers.

But the first version automatically implies that it is not "subroutine f(x) BIND(C)" and not "integer, VALUE :: x". However, I want to use bind(C) and value in GCC. For a user that includes the omp_lib module (or omp_lib.h header), the difference is not visible.

* * *

I personally dislike it a lot if vendor documentation of a specific standard function deviates in declaration, semantics or accepted arguments without telling me that it is modified. That's the reason I mentioned it in the previously attached patch.

However, as it only affects the ABI and does not affect users (unless they really care about the internal decl), maybe just not mentioning the differences is better?

[RFC] Thus, the question is whether it should be stated in the manual or not? (Removed completely or kept commented out?)

As I wrote in the original email: "PS: I am not 100% sure whether adding the implementation detail makes sense or not."

* * *

In the attached patch, I commented it out in the .texi but left it there.

* * *

>> +the name matches the name of the named constant with the @code{omp_ipr_}
>> +prefix removed.
> That should be @samp{omp_ipr_}, not @code markup.

Hmm, I thought that non-white-space strings and in particular [A-Za-z0-9_]+ would be permissible for @code and only when going beyond one would need something else.

>> +@samp{N/A} if this property is not available for the given foreign runtime.
> @code{"N/A"}, I think.  (It's a string literal, right?)

Well, the result of the function call is a pointer to the string N, /, A, \0 and not to ", N, /, A, ", \0. And while the code indeed uses "N/A", it could also do res[0] = 'N'; res[1] = '/', …

Thus, I think @code{"N/A"} (with quotation marks) is slightly misleading. I am happy to use @code{N/A} instead of @samp{N/A}, if that seems to be more appropriate, but I am not so happy about @code{"N/A"}.

* * *

> I know the libgomp manual uses different formatting conventions than the
> GCC manual or other Texinfo manuals.  Have you inspected the formatted
> output to make sure it's what you expect and consistent with the rest of
> the document?

It looked okay when glancing over the result in info, PDF and HTML format, right now and also when I posted the previous patch.

New is that I don't explicitly break lines in the interface. Doing so led before to odd very short lines in the HTML version and possible double breaks in the 'info' file if the 'info' line was a bit shorter than what was anticipated in the .texi file. As the result of the automatic line breaks looks reasonable, I used it as such.

Remarks:

* '@code{ abc}' leads to an indentation in some of the output formats but not in all; thus, I have not used it. Some but not all existing code uses it. Using '@ @ @code' would work, but is ugly and not really needed, either.

* I think we could consider updating the style eventually to be consistent with GCC's style (and move there slowly and stepwise).

* Regarding 'abc -- def' vs. 'abc---def', that's a Europeanism. To quote the "Oxford Guide to Style": "OUP [Oxford University Press] and most US publishers use the unspaced (non-touching) em rule as a parenthetical dash; other British publishers use the en rule with space either side."

Tobias

PS: I have partially updated the patch + attached it, but it is not yet fully updated; also because we have not yet settled on the items above.

libgomp.texi: Document OpenMP's Interoperability Routines

libgomp/ChangeLog:

        * libgomp.texi (Interoperability Routines): Add.
        (omp_target_memcpy_async, omp_target_memcpy_rect_async): Document
        that depobj_list may be omitted in C++ and Fortran.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index fe25d879788..5605d522216 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -656,7 +656,7 @@ specification in version 5.2.
 * Lock Routines::
 * Timing Routines::
 * Event Routine::
-@c * Interoperability Routines::
+* Interoperability Routines::
 * Memory Management Routines::
 @c * Tool Control Routine::
 * Envi
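The point about "N/A" can be checked with a few lines of C. This is only an illustration with made-up helper names (na_from_literal, na_by_hand, same_result), not libgomp code: the caller sees the same four bytes whether or not the implementation happens to use a string literal.

```c
#include <assert.h>
#include <string.h>

/* Returns a pointer to a string literal.  */
static const char *
na_from_literal (void)
{
  return "N/A";
}

/* Builds the same four bytes by hand, as sketched in the mail.  */
static const char *
na_by_hand (void)
{
  static char res[4];
  res[0] = 'N';
  res[1] = '/';
  res[2] = 'A';
  res[3] = '\0';
  return res;
}

/* Nonzero if both results are the three characters N, /, A
   without any quotation marks.  */
static int
same_result (void)
{
  const char *a = na_from_literal ();
  const char *b = na_by_hand ();
  assert (a != NULL && b != NULL);
  return strlen (a) == 3 && strcmp (a, b) == 0 && a[0] != '"';
}
```

Both functions return a string of length three; the quotation marks are part of the C source notation only, which is the argument for documenting the value without them.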
Re: [PATCH] Libquadmath: update doc for some constants
Hi FX,

FX Coudert wrote:
> Given it's a doc patch, I think it might fall under the obvious rule, and
> will commit in a week if there is no objection.

The patch clearly fixes a bug in the current specification and is fine, I just wonder …

> * libquadmath.texi (M_LOG2Eq, M_LOG10Eq, M_2_PIq): Fix description
> of these constants.
>
> diff --git a/libquadmath/libquadmath.texi b/libquadmath/libquadmath.texi
> index dc2a9ff374b..ce4accf6421 100644
> --- a/libquadmath/libquadmath.texi
> +++ b/libquadmath/libquadmath.texi
> …
>  @item @code{M_PI_2q}: pi divided by two
>  @item @code{M_PI_4q}: pi divided by four
>  @item @code{M_1_PIq}: one over pi
> -@item @code{M_2_PIq}: one over two pi
> +@item @code{M_2_PIq}: two over pi
>  @item @code{M_2_SQRTPIq}: two over square root of pi
>  @item @code{M_SQRT2q}: square root of 2
>  @item @code{M_SQRT1_2q}: one over square root of 2

... whether we should change the "over", which somehow sounds odd. "two divided by pi" sounds better to me than "two over pi".

I do note, however, that the following documentation uses a slightly different wording:

"M_2_PI - Two times the reciprocal of pi."
https://www.gnu.org/software/libc/manual/html_node/Mathematical-Constants.html

Hence, while I am fine with the change, I think we should replace the "over" wording (multiple times) and move either to "divided by" or to "(… times) the reciprocal of".

Tobias
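As a quick numerical cross-check of the corrected description, the following C sketch uses the double-precision M_2_PI and M_PI from <math.h> as stand-ins for libquadmath's M_2_PIq and M_PIq (an assumption made to avoid linking libquadmath); the helper names are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* glibc: make M_PI and M_2_PI visible.  */
#endif
#include <assert.h>
#include <math.h>

/* Absolute difference without needing libm.  */
static double
abs_diff (double a, double b)
{
  return a > b ? a - b : b - a;
}

/* Nonzero if M_2_PI equals two divided by pi (up to rounding).  */
static int
m_2_pi_is_two_over_pi (void)
{
  return abs_diff (M_2_PI, 2.0 / M_PI) < 1e-15;
}

/* Nonzero if M_2_PI clearly differs from one over two pi, the old
   (wrong) description.  */
static int
m_2_pi_is_not_one_over_two_pi (void)
{
  return abs_diff (M_2_PI, 1.0 / (2.0 * M_PI)) > 0.1;
}
```

M_2_PI is about 0.6366, whereas one over two pi is about 0.1592, so either "two divided by pi" or "two times the reciprocal of pi" describes the constant correctly.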
Re: [patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin
Hi Andrew,

Andrew Stubbs:
> On 22/08/2024 19:26, Tobias Burnus wrote:
>> (A) Any comments, suggestions regarding the patch in general and in
>> particular the plugin/ related parts?
> The code all looks pretty reasonable to me.  The header file conditional
> includes worry me though: it is adding complexity in a way that hurts
> maintainability, and looks like it might break somebody's hypothetical
> out-of-tree plugin.  Is it not better for a plugin that supports interop
> to include omp.h itself?

I do note that libgomp.h explicitly includes 'omp.h.in' and later includes 'libgomp-plugin.h', and not omp.h. But I don't know why. It could be some build-related issue or because it already replaces the locking definition by its own? (Albeit it could still use 'omp.h' together with the current '#ifdef' protection.)

Assuming that omp.h.in is only included because of the locking-type dance and not because of an actual build issue: I will check whether just including 'omp.h' in plugin/plugin-*.c and libgomp-plugin.c before libgomp-plugin.h works. For libgomp.h, it is already included (and then used by target.c).

* * *

>> (B) RFC: The *stream* *creation* (hsa_queue_t, cudaStream_t/hipStream_t)
>> functions have tons of options. Thus:
>> ...
>> (ii) Should the user be able to tweak the values? I mean, the user could say:
>> 'prefer_type({fr("cuda"), attr("ompx_priority:-2,ompx_non_blocking")},{fr("hsa"),attr("ompx_queue_size:64")})'.
>> Do we want to permit this? If yes, which of the values should be changeable?
> Is there any prior art for this?  It looks like it could be added in
> future, without breaking backward compatibility, so I say "no" (at least
> for now).

There is no real prior art as the 'attr' is a very new feature (voted in about two months ago); I think it was mainly proposed for 'sycl' to specify an 'in-order' queue, which is what is commonly needed, but the default in sycl is an 'out-of-order' queue. In any case, it seems as if they intend to provide either type of queue.

Still, if there is a sensible attribute to set, I think it makes sense to actually add it, and 'ompx_gnu_' should avoid interoperability issues. But as the feature is supported code-wise, adding an attribute only requires changing two files: the plugin-*.c and libgomp.texi, i.e. that's simple and quick.

Tobias
[patch] libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn
This patch comes on top of
"[patch][v2] libgomp.texi: Document OpenMP's Interoperability Routines",
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661314.html

But it documents the code added at
"[patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin",
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html

As remarked there: While the code in the plugin should handle the advertised foreign runtimes (cuda, cuda_driver, hip, hsa) correctly, it has not been extensively tested and it only becomes really available once the 'interop' directive has been implemented in the compiler itself.

Tobias

libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn

libgomp/ChangeLog:

        * libgomp.texi (omp_get_interop_int, omp_get_interop_str,
        omp_get_interop_ptr, omp_get_interop_type_desc): Add @ref to
        Offload-Target Specifics.
        (Offload-Target Specifics): Document the supported OpenMP
        interop types.

 libgomp/libgomp.texi | 118 +--
 1 file changed, 114 insertions(+), 4 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index b36b58b6d10..9d76948812a 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -2980,7 +2980,7 @@ not affect the usage of the function when GCC's @code{omp_lib} module or

 @item @emph{See also}:
 @ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}
-@c @ref{Offload-Target Specifics}
+@ref{Offload-Target Specifics}

 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.2,
@@ -3026,7 +3026,7 @@ not affect the usage of the function when GCC's @code{omp_lib} module or

 @item @emph{See also}:
 @ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}
-@c @ref{Offload-Target Specifics}
+@ref{Offload-Target Specifics}

 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.3,
@@ -3071,7 +3071,7 @@ affect the usage of the function when GCC's @code{omp_lib} module or

 @item @emph{See also}:
 @ref{omp_get_interop_int}, @ref{omp_get_interop_ptr}, @ref{omp_get_interop_rc_desc}
-@c @ref{Offload-Target Specifics}
+@ref{Offload-Target Specifics}

 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.4,
@@ -3155,7 +3155,7 @@ affect the usage of the function when GCC's @code{omp_lib} module or

 @item @emph{See also}:
 @ref{omp_get_num_interop_properties}, @ref{omp_get_interop_name}
-@c @ref{Offload-Target Specifics}
+@ref{Offload-Target Specifics}

 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.6,
@@ -6747,6 +6747,10 @@ The following sections present notes on the offload-target specifics
 @node AMD Radeon
 @section AMD Radeon (GCN)

+@menu
+* Foreign-runtime support for AMD GPUs::
+@end menu
+
 On the hardware side, there is the hierarchy (fine to coarse):
 @itemize
 @item work item (thread)
@@ -6816,11 +6820,58 @@ The implementation remark:
       pool is exhausted.
 @end itemize

+@node Foreign-runtime support for AMD GPUs
+@subsection OpenMP @code{interop} -- Foreign-Runtime Support for AMD GPUs
+
+An interoperability object of OpenMP @code{interop} type can be obtained using
+the @code{interop} directive; supported as foreign runtimes are HIP
+(C++ Heterogeneous-Compute Interface for Portability) and HSA (Heterogeneous
+System Architecture).  If no @code{prefer_type} argument has been specified,
+HIP is used.
+
+The following properties can then be extracted using the @ref{Interoperability
+Routines}.  Each listed property name has an associated named constant,
+consisting of @code{omp_ipr_} followed by the property name.  The following
+table uses ``@emph{int}'', ``@emph{str}'' and ``@emph{ptr}'' to denote the
+routine to be used to obtain the property value.
+
+Available properties for an HIP interop object:
+@multitable @columnfractions .30 .30 .30
+@headitem Property @tab data type@tab value (if constant)
+@item @code{fr_id} @tab @samp{omp_interop_fr_t} @emph{(int)} @tab @samp{omp_fr_hip}
+@item @code{fr_name}@tab @samp{const char *} @emph{(str)} @tab @samp{hip}
+@item @code{vendor} @tab @samp{int} @emph{(int)} @tab @samp{1}
+@item @code{vendor_name}@tab @samp{const char *} @emph{(str)} @tab @samp{amd}
+@item @code{device_num} @tab @samp{int} @emph{(int)} @tab
+@item @code{platform} @tab N/A @tab
+@item @code{device} @tab @samp{hipDevice_t} @emph{(int)} @tab
+@item @code{device_context} @tab @samp{hipCtx_t} @emph{(ptr)} @tab
+@item @code{targetsync} @tab @samp{hipStream_t} @emph{(ptr)} @tab
+@end multitable
+
+Available properties for an HSA interop object:
+@multitable @columnfractions .30 .30 .30
+@headitem Property @tab data type
[patch][v2] libgomp.texi: Document OpenMP's Interoperability Routines
Minor update, mainly because of the 'optional' changes in v3 of the patch
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661313.html

The 'optional' change affects omp_get_interop_{int,ptr,str}, but omp_target_memcpy_async and omp_target_memcpy_rect_async also got a few words.

Additionally, the returned string of omp_get_interop_type_desc is now better described (in GCC it is the C/C++ type decl as string or "N/A" or NULL). And a couple of notes about calling the routines from inside a non-host target region were added.

Tobias Burnus:
> Add documentation for OpenMP's interoperability routines.
>
> This obviously depends on the actual implementation patch, posted at:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661035.html
> (albeit I will post a v2 in a moment).
>
> I am sure there will be comments, suggestions and remarks :-)
>
> Tobias
>
> PS: I am not 100% sure whether adding the implementation detail makes
> sense or not.

Tobias

libgomp.texi: Document OpenMP's Interoperability Routines

libgomp/ChangeLog:

        * libgomp.texi (Interoperability Routines): Add.
        (omp_target_memcpy_async, omp_target_memcpy_rect_async): Document
        that depobj_list may be omitted in C++ and Fortran.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index fe25d879788..b36b58b6d10 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -656,7 +656,7 @@ specification in version 5.2.
 * Lock Routines::
 * Timing Routines::
 * Event Routine::
-@c * Interoperability Routines::
+* Interoperability Routines::
 * Memory Management Routines::
 @c * Tool Control Routine::
 * Environment Display Routine::
@@ -2134,8 +2134,9 @@ to the destination device's @var{dst} address shifted by @var{dst_offset}.
 Task dependence is expressed by passing an array of depend objects to
 @var{depobj_list}, where the number of array elements is passed as
 @var{depobj_count}; if the count is zero, the @var{depobj_list} argument is
-ignored.  The routine returns zero if the copying process has successfully
-been started and non-zero otherwise.
+ignored.  In C++ and Fortran, the @var{depobj_list} argument can also be
+omitted in that case.  The routine returns zero if the copying process has
+successfully been started and non-zero otherwise.

 Running this routine in a @code{target} region except on the initial device
 is not supported.
@@ -2255,7 +2256,8 @@ respectively.  The offset per dimension to the first element to be copied is
 given by the @var{dst_offset} and @var{src_offset} arguments.  Task dependence
 is expressed by passing an array of depend objects to @var{depobj_list}, where
 the number of array elements is passed as @var{depobj_count}; if the count is
-zero, the @var{depobj_list} argument is ignored.  The routine
+zero, the @var{depobj_list} argument is ignored.  In C++ and Fortran, the
+@var{depobj_list} argument can also be omitted in that case.  The routine
 returns zero on success and non-zero otherwise.

 The OpenMP specification only requires that @var{num_dims} up to three is
@@ -2884,21 +2886,315 @@ event handle that has already been fulfilled is also undefined.



-@c @node Interoperability Routines
-@c @section Interoperability Routines
-@c
-@c Routines to obtain properties from an @code{omp_interop_t} object.
-@c They have C linkage and do not throw exceptions.
-@c
-@c @menu
-@c * omp_get_num_interop_properties::
-@c * omp_get_interop_int::
-@c * omp_get_interop_ptr::
-@c * omp_get_interop_str::
-@c * omp_get_interop_name::
-@c * omp_get_interop_type_desc::
-@c * omp_get_interop_rc_desc::
-@c @end menu
+@node Interoperability Routines
+@section Interoperability Routines
+
+Routines to obtain properties from an object of OpenMP interop type.
+They have C linkage and do not throw exceptions.
+
+@menu
+* omp_get_num_interop_properties:: Get the number of implementation-specific properties
+* omp_get_interop_int:: Obtain integer-valued interoperability property
+* omp_get_interop_ptr:: Obtain pointer-valued interoperability property
+* omp_get_interop_str:: Obtain string-valued interoperability property
+* omp_get_interop_name:: Obtain the name of an interop_property value as string
+* omp_get_interop_type_desc:: Obtain type and description to an interop_property
+* omp_get_interop_rc_desc:: Obtain error string to an interop_rc error code
+@end menu
+
+
+
+@node omp_get_num_interop_properties
+@subsection @code{omp_get_num_interop_properties} -- Get the number of implementation-specific properties
+@table @asis
+@item @emph{Description}:
+The @code{omp_get_num_interop_properties} function returns the number of
+implementation-defined interoperability properties available for the passed
+@var{interop}, extending the OpenMP-defined properties.  The available OpenMP
+interop_property-type values range from @code{omp_ipr_first} to the value
+returned by @code{omp_get_num_interop_properties} minus one.
+
+No implementation-defined properties are currently def
Re: [patch][v3] libgomp: Add interop types and routines to OpenMP's headers and module
v3: Changes:

(A) The 'ret_code' arguments of omp_get_interop_{int,ptr,str} are actually 'optional'. That got lost at some point between OpenMP 5.2 and TR13 (I filed OpenMP spec Issue #4165 for it). When adding it, I noticed that two '…_async' functions lacked the '= NULL' for C++, which permits omitting the argument. For my C and Fortran testcases, I added a test with NULL for C and omitted the argument for Fortran. I also changed the C code such that it also compiles with C++ and added a check that the omitted argument is handled correctly.

(B) Fixed a few libgomp/target.c issues, which sneaked in due to the WIP libgomp plugin patch, posted at https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html (among others, it also contained some spurious spaces).

Built and regtested on x86-64-gnu-linux (w/o offloading configured).

Any additional comments, suggestions, remarks?

Andre Vehreschild wrote: […]

First, thanks for your comments. However, regarding:

+omp_intptr_t
Do I get this correct, that omp_intptr_t is a pointer to an integer?

No, 'intptr_t' is a (signed) integer type which has (at least) the size of a pointer; in Fortran, that's 'integer(c_intptr_t)'. And 'omp_intptr_t' is just a typedef for 'intptr_t'. [BTW: I don't know why 'intptr_t' was used and not, e.g., int64_t or just 'int'.]

Tobias

libgomp: Add interop types and routines to OpenMP's headers and module

This commit adds OpenMP 5.1+'s interop enumeration, type and routine declarations to the C/C++ header file and, new in OpenMP TR13, also to the Fortran module and omp_lib.h header file. While a stub implementation is provided, only with foreign runtime support by the libgomp GPU plugins and with the 'interop' directive, this becomes really useful.

libgomp/ChangeLog:

* fortran.c (omp_get_interop_str_, omp_get_interop_name_, omp_get_interop_type_desc_, omp_get_interop_rc_desc_): Add.
* libgomp.map (GOMP_5.1.3): New; add interop routines.
* omp.h.in: Add interop typedefs, enum and prototypes. (__GOMP_DEFAULT_NULL): Define. (omp_target_memcpy_async, omp_target_memcpy_rect_async): Use it for the optional depend argument. * omp_lib.f90.in: Add paramters and interfaces for interop. * omp_lib.h.in: Likewise; move F90 '&' to column 81 for -ffree-length-80. * target.c (omp_get_num_interop_properties, omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name, omp_get_interop_type_desc, omp_get_interop_rc_desc): Add. * config/gcn/target.c (omp_get_num_interop_properties, omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name, omp_get_interop_type_desc, omp_get_interop_rc_desc): Add. * config/nvptx/target.c (omp_get_num_interop_properties, omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name, omp_get_interop_type_desc, omp_get_interop_rc_desc): Add. * testsuite/libgomp.c-c++-common/interop-routines-1.c: New test. * testsuite/libgomp.c-c++-common/interop-routines-2.c: New test. * testsuite/libgomp.fortran/interop-routines-1.F90: New test. * testsuite/libgomp.fortran/interop-routines-2.F90: New test. * testsuite/libgomp.fortran/interop-routines-3.F: New test. * testsuite/libgomp.fortran/interop-routines-4.F: New test. * testsuite/libgomp.fortran/interop-routines-5.F: New test. * testsuite/libgomp.fortran/interop-routines-6.F: New test. * testsuite/libgomp.fortran/interop-routines-7.F90: New test. 
libgomp/config/gcn/target.c| 105 ++ libgomp/config/nvptx/target.c | 105 ++ libgomp/fortran.c | 41 +++ libgomp/libgomp.map| 15 + libgomp/omp.h.in | 78 - libgomp/omp_lib.f90.in | 99 ++ libgomp/omp_lib.h.in | 170 -- libgomp/target.c | 110 +++ .../libgomp.c-c++-common/interop-routines-1.c | 287 + .../libgomp.c-c++-common/interop-routines-2.c | 354 + .../libgomp.fortran/interop-routines-1.F90 | 236 ++ .../libgomp.fortran/interop-routines-2.F90 | 3 + .../testsuite/libgomp.fortran/interop-routines-3.F | 2 + .../testsuite/libgomp.fortran/interop-routines-4.F | 4 + .../testsuite/libgomp.fortran/interop-routines-5.F | 4 + .../testsuite/libgomp.fortran/interop-routines-6.F | 4 + .../libgomp.fortran/interop-routines-7.F90 | 290 + 17 files changed, 1883 insertions(+), 24 deletions(-) diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c index 9cafea4e2cc..f7fa6aa6396 100644 --- a/libgomp/config/gcn/target.c +++ b/libgomp/config/gcn/target.c @@ -185,3 +185,108 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs, (void) depend; __builtin_unreachable (); } + +
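Regarding the 'omp_intptr_t' question in the reply above, here is a minimal C sketch of why an 'intptr_t'-based typedef is an integer type and not a pointer to an integer. It assumes only the standard headers; 'my_intptr_t' and 'pointer_roundtrip_ok' are hypothetical names used for illustration and are not part of omp.h.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for omp_intptr_t: a plain typedef of intptr_t,
   i.e. a signed integer type, not a pointer type.  */
typedef intptr_t my_intptr_t;

/* The property mentioned in the mail: the integer type is at least as
   wide as a data pointer on the platforms this sketch targets.  */
static_assert (sizeof (my_intptr_t) >= sizeof (void *),
               "my_intptr_t must be at least pointer-sized");

/* Convert a pointer to the integer type and back; returns 1 if the
   original pointer value is recovered.  */
static int
pointer_roundtrip_ok (void *p)
{
  my_intptr_t as_integer = (my_intptr_t) p;
  void *back = (void *) as_integer;
  return back == p;
}
```

Calling 'pointer_roundtrip_ok (&some_variable)' should return 1, which is presumably why such a type is convenient for returning either plain integers or handle-like values from one routine.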
[patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin
This patch adds OpenMP's interop support to the libgomp plugins (nvptx: cuda, cuda_driver, hip; gcn: hip, hsa).*

[The idea is that the user can ask OpenMP to return a foreign-runtime handle (CUdevice, hipCtx_t, …) for a specified OpenMP device number – and to create a stream (CUstream, hipStream_t, cudaStream_t, hsa_queue_t), where OpenMP can take care of dependencies, e.g., via the 'depend' clause.]

The attached patch comes on top of the interop routine patch, https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661118.html (and the associated .texi patch, https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661072.html ).

The patch is more a WIP/RFC patch than a final patch as it is currently not wired up: while 'GOMP_interop' can be called manually, the proper way will be OpenMP's 'interop' directive, which is currently unimplemented. Hence, this patch is not extensively tested, does not include testcases, and target.c's GOMP_interop will surely change to handle all clauses.

But except that target.c's GOMP_interop will change, the rest of the patch should be rather solid – and could in principle be applied.

Therefore:

(A) Any comments or suggestions regarding the patch in general and in particular the plugin/ related parts?

(B) RFC: The *stream* *creation* (hsa_queue_t, cudaStream_t/hipStream_t) functions have tons of options. Thus:

(i) Do the chosen size/flags arguments for the stream/queue generation for GCN/HIP/CUDA make sense? – Or are there other values that are more sensible?

(ii) Should the user be able to tweak the values? I mean, the user could say:** 'prefer_type({fr("cuda"), attr("ompx_priority:-2,ompx_non_blocking")},{fr("hsa"),attr("ompx_queue_size:64")})'. Do we want to permit this? If yes, which of the values should be changeable?

Tobias

(*) For Nvidia, HIP is just a thin wrapper of defines, typedefs and inline functions around CUDA. Thus, hip, cuda and cuda_driver are effectively all the same.
/ The HSA is a new proposal that is currently added additional-definition document. (OpenMP spec Issue #4023.) (**) The used syntax and in particular 'attr' are new in OpenMP 6.0 (new in TR13). Note that attr only takes string literals [while 'fr' takes strings and (6.0) identifiers ["omp_ifr_cuda"] or constant integer expressions (5.1)]. libgomp: Add OpenMP interop support to nvptx + gcn plugin FIXME/NOTE: target.c's GOMP_interop is a stub, sufficient for some initial testing, but not sufficient to implemement 'omp interop'. However, the plugin side should be feature complete, except for possible extensions. This adds interop support to the libgomp plugins; to the gcn one, it adds HSA and HIP and, to the nvptx one, it adds CUDA, CUDA_DRIVER and HIP. libgomp/ChangeLog: * libgomp-plugin.h: Include 'omp.h.in' if _LIBGOMP_PLUGIN_INCLUDE is set; define the following only if _LIBGOMP_OMP_LOCK_DEFINED is set (either via libgomp.h or when _LIBGOMP_PLUGIN_INCLUDE is set). (struct interop_obj_t): New. (GOMP_OFFLOAD_get_interop, GOMP_OFFLOAD_get_interop_int, GOMP_OFFLOAD_get_interop_ptr, GOMP_OFFLOAD_get_interop_str, GOMP_OFFLOAD_get_interop_type_desc): Add prototype. * libgomp.h: Move 'omp.h.in' inclusion to the top. (struct gomp_device_descr): Add function pointers for interop. * libgomp.map (GOMP_5.1.3): Add GOMP_interop. * libgomp_g.h (GOMP_interop): Add prototype. * target.c (GOMP_get_interop): New. (omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str omp_get_interop_type_desc): Add calls into the plugin. (gomp_load_plugin_for_device): DLSYM_OPT the new plugin functions. * plugin/plugin-gcn.c (_LIBGOMP_PLUGIN_INCLUDE): (hipError_t, hipCtx_t, hipStream): Add stub typedefs. (struct hip_runtime_fn_info): New. (struct agent_info): Add hsa_device_num. (hip_fns, hip_runtime_lib): New global vars. (init_environment_variables): Init hip_runtime_lib. (struct agent_id_data_t): New. (assign_agent_ids): Use it to set hsa_device_num. (init_hsa_context): Update call. 
(init_hip_runtime_functions, GOMP_OFFLOAD_interop, GOMP_OFFLOAD_get_interop_int, GOMP_OFFLOAD_get_interop_ptr, GOMP_OFFLOAD_get_interop_str, GOMP_OFFLOAD_get_interop_type_desc): New. * plugin/plugin-nvptx.c: Define _LIBGOMP_PLUGIN_INCLUDE before including libgomp-plugin.h. (GOMP_OFFLOAD_interop, GOMP_OFFLOAD_get_interop_int, GOMP_OFFLOAD_get_interop_ptr, GOMP_OFFLOAD_get_interop_str, GOMP_OFFLOAD_get_interop_type_desc): New. libgomp/libgomp-plugin.h | 37 libgomp/libgomp.h | 17 +- libgomp/libgomp.map | 1 + libgomp/libgomp_g.h | 2 + libgomp/plugin/plugin-gcn.c | 415 +- libgomp/plugin/plugin-nvptx.c | 282 libgomp/target.c | 134 +++--- 7 files changed, 848 insertions(+), 40 deletions(-) diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index 0c9c28c65cf..ce1a83bc51e 100644 --- a/libgomp/libgomp-plugin.h +++ b/lib
[patch][v2a] libgomp: Add interop types and routines to OpenMP's headers and module
This is nearly identical to v2, except that I presumably used 'git add testsuite' when intending to use 'git add -u testsuite' in a last-minute change, as v2 contained a bunch of unrelated test files …

The only other change besides removing unrelated files is that for the generic part of omp_get_interop_type_desc, the data types ('int' for fr_id, vendor, device_num; 'const char*' for fr_name, vendor_name) are now returned in target.c while the specific types (for device, device_context, targetsync, platform) will eventually be handled by the plugin function.

Tobias

On 21.08.24 at 20:27, Tobias Burnus wrote:

Nearly identical to v1, except that I realized that OpenMP also permits calling those functions from target regions. Hence, the device-side target.c files also got those functions, including a use of omp_irc_other to make clear why it will fail … In addition, two (nonhost) target-region test files were added.

Comments, remarks, suggestions before I commit it?

Otherwise, the following still applies:

This patch adds 'interop' to C/C++'s omp.h and Fortran's omp_lib.h and omp_lib module. The implementation should match OpenMP 5.1 (which added interop) and also TR13; the Fortran routine support is new in TR13. It also adds 'hsa' as foreign object enum/parameter, which is currently being added to the additional-definitions document.

* * *

The routine interface does not exactly match the OpenMP spec as some VALUE and BIND(C) and one c_int have been used to reduce pointless differences between OpenMP and C/C++. This shouldn't affect the usage as almost no user worries about the API used for a procedure reference. But if a user defines the routine interface him-/herself, this will fail. (But why should (s)he? There is 'omp_lib.h' and the 'omp_lib' module, after all – and several items in those files are implementation defined.)

On the C/C++ side, there are also some differences (at least with TR13) with regards to unsigned vs. signed and to enum (of size __UINTPTR_T__) vs.
'typdef (u)intptr_t', but they shouldn't matter either (effectively same API) – and, again, there is a omp.h, which any sensible user should use. * * * While there is a stub implementation for the routines, to make them really useful, two things are missing: Support for the 'interop' directive in the compiler itself (+ a libgomp function for it) and supporting some foreign run time types in the libgomp plugin. Also missing is the documentation of the added routines in libgomp.texi. All of which will be added in later patches. Build + tested on x86-64-gnu-linux (with offloading enabled but that's not yet relevant). Cheers, Tobiaslibgomp: Add interop types and routines to OpenMP's headers and module This commit adds OpenMP 5.1+'s interop enumeration, type and routine declarations to the C/C++ header file and, new in OpenMP TR13, also to the Fortran module and omp_lib.h header file. While a stub implementation is provided, only with foreign runtime support by the libgomp GPU plugins and with the 'interop' directive, this becomes really useful. libgomp/ChangeLog: * fortran.c (omp_get_interop_str_, omp_get_interop_name_, omp_get_interop_type_desc_, omp_get_interop_rc_desc_): Add. * libgomp.map (GOMP_5.1.3): New; add interop routines. * omp.h.in: Add interop typedefs, enum and prototypes. * omp_lib.f90.in: Add paramters and interfaces for interop. * omp_lib.h.in: Likewise; move F90 '&' to column 81 for -ffree-length-80. * target.c (omp_get_num_interop_properties, omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name, omp_get_interop_type_desc, omp_get_interop_rc_desc): Add. * config/gcn/target.c (omp_get_num_interop_properties, omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name, omp_get_interop_type_desc, omp_get_interop_rc_desc): Add. 
* config/nvptx/target.c (omp_get_num_interop_properties, omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name, omp_get_interop_type_desc, omp_get_interop_rc_desc): Add. * testsuite/libgomp.c/interop-routines-1.c: New test. * testsuite/libgomp.c/interop-routines-2.c: New test. * testsuite/libgomp.fortran/interop-routines-1.F90: New test. * testsuite/libgomp.fortran/interop-routines-2.F90: New test. * testsuite/libgomp.fortran/interop-routines-3.F: New test. * testsuite/libgomp.fortran/interop-routines-4.F: New test. * testsuite/libgomp.fortran/interop-routines-5.F: New test. * testsuite/libgomp.fortran/interop-routines-6.F: New test. * testsuite/libgomp.fortran/interop-routines-7.F90: New test. libgomp/config/gcn/target.c| 99 +++ libgomp/config/nvptx/target.c | 99 +++ libgomp/fortran.c |
[patch] libgomp.texi: Document OpenMP's Interoperability Routines
Add documentation for OpenMP's interoperability routines. This obviously, depends on the actual implementation patch, posted at: https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661035.html (albeit I will post a v2 in a moment). I am sure there will be comments, suggestions and remarks :-) Tobias PS: I am not 100% sure whether adding the implementation detail makes sense or not. libgomp.texi: Document OpenMP's Interoperability Routines libgomp/ChangeLog: * libgomp.texi (Interoperability Routines): Add. diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index fe25d879788..ecc60882d72 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -656,7 +656,7 @@ specification in version 5.2. * Lock Routines:: * Timing Routines:: * Event Routine:: -@c * Interoperability Routines:: +* Interoperability Routines:: * Memory Management Routines:: @c * Tool Control Routine:: * Environment Display Routine:: @@ -2884,21 +2884,294 @@ event handle that has already been fulfilled is also undefined. -@c @node Interoperability Routines -@c @section Interoperability Routines -@c -@c Routines to obtain properties from an @code{omp_interop_t} object. -@c They have C linkage and do not throw exceptions. -@c -@c @menu -@c * omp_get_num_interop_properties:: -@c * omp_get_interop_int:: -@c * omp_get_interop_ptr:: -@c * omp_get_interop_str:: -@c * omp_get_interop_name:: -@c * omp_get_interop_type_desc:: -@c * omp_get_interop_rc_desc:: -@c @end menu +@node Interoperability Routines +@section Interoperability Routines + +Routines to obtain properties from an object of OpenMP interop type. +They have C linkage and do not throw exceptions. 
+ +@menu +* omp_get_num_interop_properties:: Get the number of implementation-specific properties +* omp_get_interop_int:: Obtain integer-valued interoperability property +* omp_get_interop_ptr:: Obtain pointer-valued interoperability property +* omp_get_interop_str:: Obtain string-valued interoperability property +* omp_get_interop_name:: Obtain the name of an interop_property value as string +* omp_get_interop_type_desc:: Obtain type and description to an interop_property +* omp_get_interop_rc_desc:: Obtain error string to an interop_rc error code +@end menu + + + +@node omp_get_num_interop_properties +@subsection @code{omp_get_num_interop_properties} -- Get the number of implementation-specific properties +@table @asis +@item @emph{Description}: +The @code{omp_get_num_interop_properties} function returns the number of +implementation-defined interoperability properties available for the passed +@var{interop}, extending the OpenMP-defined properties. The available OpenMP +interop_property-type values range from @code{omp_ipr_first} to the value +returned by @code{omp_get_num_interop_properties} minus one. + +No implementation-defined properties are currently defined in GCC. + +Implementation remark: In GCC, the Fortran interface differs from the one shown +below: the function has C binding, @var{interop} is passed by value and an +integer of @code{c_int} kind is returnd, permitting to have the same ABI as the +C function. This does not affect the usage of the function when GCC's +@code{omp_lib} module or @code{omp_lib.h} header is used. 
+ +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{int omp_get_num_interop_properties(const omp_interop_t interop)} +@end multitable + +@item @emph{Fortran}: +@multitable @columnfractions .20 .80 +@item @emph{Interface}: @tab @code{integer function omp_get_num_interop_properties(interop)} +@item @tab @code{integer(omp_interop_kind), intent(in) :: interop} +@end multitable + +@item @emph{See also}: +@ref{omp_get_interop_name}, @ref{omp_get_interop_type_desc} + +@item @emph{Reference}: +@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.1, +@uref{https://www.openmp.org, OpenMP specification TR13}, Section 26.1 +@end table + + + +@node omp_get_interop_int +@subsection @code{omp_get_interop_int} -- Obtain integer-valued interoperability property +@table @asis +@item @emph{Description}: +The @code{omp_get_interop_int} function returns the integer value associated +with the @var{property_id} interoperability property of the passed @var{interop} +object. If successful, @var{ret_code} is set to @code{omp_irc_success}. + +Implementation remark: In GCC, the Fortran interface differs from the one shown +below: the function has C binding and @var{interop} and @var{property_id} are +passed by value, permitting to have the same ABI as the C function. This does +not affect the usage of the function when GCC's @code{omp_lib} module or +@code{omp_lib.h} header is used. + +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{omp_intptr_t omp_get_interop_int(const omp_interop_t interop, + omp_interop_property_t property_id, int *ret_code)} +@end multitable + +@item @emph{Fortran
[patch] libgomp: Add interop types and routines to OpenMP's headers and module
This patch adds 'interop' to C/C++'s omp.h and Fortran's omp_lib.h and omp_lib module. The implementation should match OpenMP 5.1 (which added interop) and also TR13; the Fortran routine support is new in TR13. It also adds 'hsa' as foreign object enum/parameter, which is currently being added to the additional-definitions document.

* * *

The routine interface does not exactly match the OpenMP spec as some VALUE and BIND(C) and one c_int have been used to reduce pointless differences between OpenMP and C/C++. This shouldn't affect the usage as almost no user worries about the API used for a procedure reference. But if a user defines the routine interface him-/herself, this will fail. (But why should (s)he? There is 'omp_lib.h' and the 'omp_lib' module, after all – and several items in those files are implementation defined.)

On the C/C++ side, there are also some differences (at least with TR13) with regards to unsigned vs. signed and to enum (of size __UINTPTR_T__) vs. 'typedef (u)intptr_t', but they shouldn't matter either (effectively same API) – and, again, there is an omp.h, which any sensible user should use.

* * *

While there is a stub implementation for the routines, to make them really useful, two things are missing: support for the 'interop' directive in the compiler itself (+ a libgomp function for it) and support for some foreign runtime types in the libgomp plugin. Also missing is the documentation of the added routines in libgomp.texi. All of which will be added in later patches.

Built + tested on x86-64-gnu-linux (with offloading enabled but that's not yet relevant).

Comments, remarks, suggestions before I commit it?

Tobias

libgomp: Add interop types and routines to OpenMP's headers and module

This commit adds OpenMP 5.1+'s interop enumeration, type and routine declarations to the C/C++ header file and, new in OpenMP TR13, also to the Fortran module and omp_lib.h header file.
While a stub implementation is provided, only with foreign runtime support by the libgomp GPU plugins and with the 'interop' directive, this becomes really useful. libgomp/ChangeLog: * fortran.c (omp_get_interop_str_, omp_get_interop_name_, omp_get_interop_type_desc_, omp_get_interop_rc_desc_): Add. * libgomp.map (GOMP_5.1.3): New; add interop routines. * omp.h.in: Add interop typedefs, enum and prototypes. * omp_lib.f90.in: Add paramters and interfaces for interop. * omp_lib.h.in: Likewise; move F90 '&' to column 81 for -ffree-length-80. * target.c (omp_get_num_interop_properties, omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name, omp_get_interop_type_desc, omp_get_interop_rc_desc): Add. * testsuite/libgomp.c/interop-routines-1.c: New test. * testsuite/libgomp.fortran/interop-routines-1.F90: New test. * testsuite/libgomp.fortran/interop-routines-2.F90: New test. * testsuite/libgomp.fortran/interop-routines-3.F: New test. * testsuite/libgomp.fortran/interop-routines-4.F: New test. * testsuite/libgomp.fortran/interop-routines-5.F: New test. * testsuite/libgomp.fortran/interop-routines-6.F: New test. 
libgomp/fortran.c | 41 libgomp/libgomp.map| 15 ++ libgomp/omp.h.in | 69 ++ libgomp/omp_lib.f90.in | 99 + libgomp/omp_lib.h.in | 167 -- libgomp/target.c | 91 libgomp/testsuite/libgomp.c/interop-routines-1.c | 246 + .../libgomp.fortran/interop-routines-1.F90 | 222 +++ .../libgomp.fortran/interop-routines-2.F90 | 3 + .../testsuite/libgomp.fortran/interop-routines-3.F | 2 + .../testsuite/libgomp.fortran/interop-routines-4.F | 4 + .../testsuite/libgomp.fortran/interop-routines-5.F | 4 + .../testsuite/libgomp.fortran/interop-routines-6.F | 4 + 13 files changed, 945 insertions(+), 22 deletions(-) diff --git a/libgomp/fortran.c b/libgomp/fortran.c index cfbea32b022..b62a3f29916 100644 --- a/libgomp/fortran.c +++ b/libgomp/fortran.c @@ -102,6 +102,10 @@ ialias_redirect (omp_set_default_allocator) ialias_redirect (omp_get_default_allocator) ialias_redirect (omp_display_env) ialias_redirect (omp_fulfill_event) +ialias_redirect (omp_get_interop_str) +ialias_redirect (omp_get_interop_name) +ialias_redirect (omp_get_interop_type_desc) +ialias_redirect (omp_get_interop_rc_desc) #endif #ifndef LIBGOMP_GNU_SYMBOL_VERSIONING @@ -807,4 +811,41 @@ omp_display_env_8_ (const int64_t *verbose) omp_display_env (!!*verbose); } +void +omp_get_interop_str_ (const char **res, size_t *res_len, + const omp_interop_t interop, + omp_interop_property_t property_id, + omp_interop_rc_t *ret_code) +{ + *res = omp_get_interop_str (interop, property_id, ret_code); + *res_len = *res ? strlen (*res) : 0; +} + +void +omp_get_inter
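The Fortran-facing wrapper in the hunk above hands back a string as a pointer plus a length and maps a NULL result to length 0. Below is a minimal, self-contained C sketch of that pattern. 'lookup_str' and 'lookup_str_wrapper' are hypothetical stand-ins invented for illustration; they are not the libgomp functions, and the returned string is made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a C routine that returns either a string
   or NULL (here: a string for id 0 only).  */
static const char *
lookup_str (int id)
{
  return id == 0 ? "cuda" : NULL;
}

/* Same shape as the wrapper in the diff: store the pointer in *res and
   its length in *res_len, using length 0 when no string is available,
   so that the caller never calls strlen on NULL.  */
static void
lookup_str_wrapper (const char **res, size_t *res_len, int id)
{
  *res = lookup_str (id);
  *res_len = *res ? strlen (*res) : 0;
}
```

A caller on the Fortran side can then presumably build a character pointer of the given length without needing to scan for the terminating NUL itself.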
Re: [PATCH v3 2/7] OpenMP: middle-end support for dispatch + adjust_args
Paul-Antoine Arras wrote:

This patch adds middle-end support for the `dispatch` construct and the `adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and `gimplify_call_expr` respectively. For `adjust_args`, this mostly consists in emitting a call to `gomp_get_mapped_ptr` for the adequate device.

...

* gimplify.h (omp_has_novariants): Declare.
(omp_has_nocontext): Declare.

As those two functions are only used in gimplify.cc, please make them 'static' and remove them from gimplify.h.

* * *

I have a testcase which is rejected with the bogus:

   17 | !$omp end dispatch
      | 1
Error: Unclassifiable OpenMP directive at (1)

That's at least valid in OpenMP 6.0 previews as those have: "For a dispatch directive, the paired 'end' directive is optional." In 5.2, it is implied via "3.1 Directive Format" and that 'dispatch' has "Association: block (function dispatch structured block)".

Note: That 'nowait' is an 'end-clause' and may also appear as '!$omp end dispatch nowait'. (But either at 'dispatch' or at 'end dispatch'; the current code should be able to handle this.)

* * *

But the main reason that I created the testcase was a comment which looked wrong in gimplify_omp_dispatch – and indeed, the attached testcase gives an ICE:

internal compiler error: in gimplify_omp_dispatch, at gimplify.cc:18064

See attached Fortran testcase + comment below at gimplify_omp_dispatch.

* * *

--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
...
@@ -4052,6 +4053,7 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)
   /* Gimplify the function arguments. */
   if (nargs > 0)
     {
+    tree device_num = NULL_TREE;

Indentation issue: Indented by 4 instead of 6 spaces.

@@ -4062,8 +4064,111 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)
...
+  if (flag_openmp && EXPR_P (CALL_EXPR_FN (*expr_p))
+      && DECL_P (TREE_OPERAND (CALL_EXPR_FN (*expr_p), 0))
+      && (adjust_args_list = lookup_attribute (
+          "omp declare variant variant adjust_args",
+          DECL_ATTRIBUTES (
+            TREE_OPERAND (CALL_EXPR_FN (*expr_p), 0
+          != NULL_TREE)
+    {
...
+      if (gimplify_omp_ctxp != NULL
+          && gimplify_omp_ctxp->code == OMP_DISPATCH)
+        {

The OpenMP spec only supports append_args/adjust_args "when a specified function variant is selected for replacement in the context of a function *dispatch* structured block." Thus, IMHO, you can merge the two if conditions.

+          for (tree c = gimplify_omp_ctxp->clauses; c;
+               c = TREE_CHAIN (c))
+            {
+              if (OMP_CLAUSE_CODE (c)
+                  == OMP_CLAUSE_IS_DEVICE_PTR)
+                {
+                  tree decl1 = DECL_NAME (OMP_CLAUSE_DECL (c));
+                  tree decl2
+                    = tree_strip_nop_conversions (*arg_p);
+                  if (TREE_CODE (decl2) == ADDR_EXPR)
+                    decl2 = TREE_OPERAND (decl2, 0);
+                  gcc_assert (TREE_CODE (decl2) == VAR_DECL
+                              || TREE_CODE (decl2)
+                                 == PARM_DECL);
+                  decl2 = DECL_NAME (decl2);
+                  if (decl1 == decl2)
+                    {
+                      is_device_ptr = true;
+                      break;
+                    }
+                }
+              else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE)
+                device_num = OMP_CLAUSE_OPERAND (c, 0);
+            }

Assume(*) you have:

#pragma omp dispatch is_device_ptr(p) device_num(6)
foo(p);

If I read the code correctly, this will use the default device as the "break" will prevent finding the device clause.

(* Or the other way round, if new clauses are internally added at the beginning of the list.)

+          if (build_int_cst (integer_type_node, i)
+              == TREE_VALUE (arg))

I think

if (wi::eq_p (i, tree_strip_any_location_wrapper (TREE_VALUE (arg))))

is better and avoids creating new tree values that might end up being unused. (I am assuming that TREE_CODE (TREE_VALUE (arg)) == INTEGER_CST; if not, some additional checks might be needed.) (The tree_strip_any_location_wrapper call I have taken from integer_nonzerop (etc.), and it might not be needed.)

+          if (need_device_ptr && !is_device_ptr)
+            {
+              if (device_nu
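To illustrate the remark about the early "break" in the clause loop, here is a small stand-alone C sketch. It does not use GCC's tree API; 'struct clause', 'scan_with_break', 'scan_all' and 'DEFAULT_DEVICE' are hypothetical names that only model a clause chain in which a matching 'is_device_ptr' entry precedes the 'device' entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a clause chain.  */
enum clause_kind { CLAUSE_IS_DEVICE_PTR, CLAUSE_DEVICE };

struct clause
{
  enum clause_kind kind;
  const char *name;           /* Variable name, for is_device_ptr.  */
  int device;                 /* Device number, for device.  */
  const struct clause *next;  /* Next clause in the chain.  */
};

struct scan_result
{
  int is_device_ptr;
  int device_num;
};

#define DEFAULT_DEVICE (-1)   /* Hypothetical "no device clause seen".  */

/* Same control flow as the reviewed loop: stop at the first matching
   is_device_ptr clause, so a later device clause is never visited.  */
static struct scan_result
scan_with_break (const struct clause *list, const char *arg)
{
  struct scan_result r = { 0, DEFAULT_DEVICE };
  for (const struct clause *c = list; c; c = c->next)
    {
      if (c->kind == CLAUSE_IS_DEVICE_PTR)
        {
          if (strcmp (c->name, arg) == 0)
            {
              r.is_device_ptr = 1;
              break;
            }
        }
      else if (c->kind == CLAUSE_DEVICE)
        r.device_num = c->device;
    }
  return r;
}

/* Variant without the early exit: every clause is visited, so both
   pieces of information are found regardless of clause order.  */
static struct scan_result
scan_all (const struct clause *list, const char *arg)
{
  struct scan_result r = { 0, DEFAULT_DEVICE };
  for (const struct clause *c = list; c; c = c->next)
    {
      if (c->kind == CLAUSE_IS_DEVICE_PTR && strcmp (c->name, arg) == 0)
        r.is_device_ptr = 1;
      else if (c->kind == CLAUSE_DEVICE)
        r.device_num = c->device;
    }
  return r;
}
```

With a chain "is_device_ptr(p)" followed by a device clause for device 6, 'scan_with_break' leaves 'device_num' at 'DEFAULT_DEVICE' while 'scan_all' reports 6. Whether the real code is affected depends on the order in which the clauses end up in the chain, as noted in the footnote above.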
Re: [PATCH v3 1/7] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces
Paul-Antoine Arras wrote: This patch introduces the OMP_DISPATCH tree node, as well as two new clauses `nocontext` and `novariants`. It defines/exposes interfaces that will be used in subsequent patches that add front-end and middle-end support, but nothing generates these nodes yet. LGTM - thanks! Tobias gcc/ChangeLog: * builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New. * omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. (dump_generic_node): Handle OMP_DISPATCH. * tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. (omp_clause_code_name): Add "novariants" and "nocontext". * tree.def (OMP_DISPATCH): New. * tree.h (OMP_DISPATCH_BODY): New macro. (OMP_DISPATCH_CLAUSES): New macro. (OMP_CLAUSE_NOVARIANTS_EXPR): New macro. (OMP_CLAUSE_NOCONTEXT_EXPR): New macro. gcc/fortran/ChangeLog: * types.def (BT_FN_PTR_CONST_PTR_INT): Declare. 
--- gcc/builtin-types.def| 1 + gcc/fortran/types.def| 1 + gcc/omp-selectors.h | 1 + gcc/tree-core.h | 7 +++ gcc/tree-pretty-print.cc | 21 + gcc/tree.cc | 4 gcc/tree.def | 5 + gcc/tree.h | 7 +++ 8 files changed, 47 insertions(+) diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index c97d6bad1de..ef7aaf67d13 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -677,6 +677,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_INT_FEXCEPT_T_PTR_INT, BT_INT, BT_FEXCEPT_T_PTR, DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_FEXCEPT_T_PTR_INT, BT_INT, BT_CONST_FEXCEPT_T_PTR, BT_INT) DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_UINT8, BT_PTR, BT_CONST_PTR, BT_UINT8) +DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT) DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR) diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def index 390cc9542f7..5047c8f816a 100644 --- a/gcc/fortran/types.def +++ b/gcc/fortran/types.def @@ -120,6 +120,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_INT_BOOL, BT_BOOL, BT_INT, BT_BOOL) DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_PTRMODE, BT_VOID, BT_PTR, BT_PTRMODE) DEF_FUNCTION_TYPE_2 (BT_FN_VOID_CONST_PTR_SIZE, BT_VOID, BT_CONST_PTR, BT_SIZE) +DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT) DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR) diff --git a/gcc/omp-selectors.h b/gcc/omp-selectors.h index c61808ec0ad..ef3ce9a449a 100644 --- a/gcc/omp-selectors.h +++ b/gcc/omp-selectors.h @@ -55,6 +55,7 @@ enum omp_ts_code { OMP_TRAIT_CONSTRUCT_PARALLEL, OMP_TRAIT_CONSTRUCT_FOR, OMP_TRAIT_CONSTRUCT_SIMD, + OMP_TRAIT_CONSTRUCT_DISPATCH, OMP_TRAIT_LAST, OMP_TRAIT_INVALID = -1 }; diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 27c569c7702..508f5c580d4 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -542,6 +542,13 @@ enum omp_clause_code { /* OpenACC clause: nohost. */ OMP_CLAUSE_NOHOST, + + /* OpenMP clause: novariants (scalar-expression). 
*/ + OMP_CLAUSE_NOVARIANTS, + + /* OpenMP clause: nocontext (scalar-expression). */ + OMP_CLAUSE_NOCONTEXT, + }; #undef DEFTREESTRUCT diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc index 4bb946bb0e8..752a402e0d0 100644 --- a/gcc/tree-pretty-print.cc +++ b/gcc/tree-pretty-print.cc @@ -506,6 +506,22 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags) case OMP_CLAUSE_EXCLUSIVE: name = "exclusive"; goto print_remap; +case OMP_CLAUSE_NOVARIANTS: + pp_string (pp, "novariants"); + pp_left_paren (pp); + gcc_assert (OMP_CLAUSE_NOVARIANTS_EXPR (clause)); + dump_generic_node (pp, OMP_CLAUSE_NOVARIANTS_EXPR (clause), spc, flags, +false); + pp_right_paren (pp); + break; +case OMP_CLAUSE_NOCONTEXT: + pp_string (pp, "nocontext"); + pp_left_paren (pp); + gcc_assert (OMP_CLAUSE_NOCONTEXT_EXPR (clause)); + dump_generic_node (pp, OMP_CLAUSE_NOCONTEXT_EXPR (clause), spc, flags, +false); + pp_right_paren (pp); + break; case OMP_CLAUSE__LOOPTEMP_: name = "_looptemp_"; goto print_remap; @@ -3947,6 +3963,11 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags, dump_omp_clauses (pp, OMP_SECTIONS_CLAUSES (node), spc, flags); goto dump_omp_body; +case OMP_DISPATCH: + pp_string (pp, "#pragma omp dispatch"); + dump_omp_clauses (pp, OMP_DISPATCH_CLAUSES (node), spc, flags); + goto dump_omp_body; + case OMP_SECTION: pp_string (pp, "#pra
[Patch] libgomp.texi: Update implementation status table for OpenMP TR13
Update for the very recently released TR13. Unsurprisingly, most items are still unimplemented.

→ https://www.openmp.org/specifications/ → Technical Report 13

Comments, suggestions, typo fixes? If not, I will commit it later today.

Tobias

libgomp.texi: Update implementation status table for OpenMP TR13

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Technical Report 13): Renamed from
	'OpenMP Technical Report 12'; updated for TR13 changes.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index c6759dd03bc..96cc0e4baa8 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -169,7 +169,7 @@ See also @ref{OpenMP Implementation Status}.
 * OpenMP 5.0:: Feature completion status to 5.0 specification
 * OpenMP 5.1:: Feature completion status to 5.1 specification
 * OpenMP 5.2:: Feature completion status to 5.2 specification
-* OpenMP Technical Report 12:: Feature completion status to second 6.0 preview
+* OpenMP Technical Report 13:: Feature completion status to third 6.0 preview
 @end menu

 The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
@@ -391,7 +391,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @item @code{destroy} clause with destroy-var argument on @code{depobj}
   @tab Y @tab
 @item Deprecation of no-argument @code{destroy} clause on @code{depobj}
-  @tab N @tab
+  @tab N/A @tab undeprecated in OpenMP 6
 @item @code{linear} clause syntax changes and @code{step} modifier @tab Y @tab
 @item Deprecation of minus operator for reductions @tab N @tab
 @item Deprecation of separating @code{map} modifiers without comma @tab N @tab
@@ -448,20 +448,24 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @end multitable

-@node OpenMP Technical Report 12
-@section OpenMP Technical Report 12
+@node OpenMP Technical Report 13
+@section OpenMP Technical Report 13

-Technical Report (TR) 12 is the second preview for OpenMP 6.0.
+Technical Report (TR) 13 is the third preview for OpenMP 6.0.
@unnumberedsubsec New features listed in Appendix B of the OpenMP specification @multitable @columnfractions .60 .10 .25 -@item Features deprecated in versions 5.2, 5.1 and 5.0 were removed +@item Features deprecated in versions 5.0, 5.1 and 5.2 were removed @tab N/A @tab Backward compatibility @item Full support for C23 was added @tab P @tab @item Full support for C++23 was added @tab P @tab +@item Full support for Fortran 2023 was added @tab P @tab @item @code{_ALL} suffix to the device-scope environment variables @tab P @tab Host device number wrongly accepted @item @code{num_threads} now accepts a list @tab N @tab +@item Abstract names added for @code{OMP_NUM_THREADS}, + @code{OMP_THREAD_LIMIT} and @code{OMP_TEAMS_THREAD_LIMIT} + @tab N @tab @item Supporting increments with abstract names in @code{OMP_PLACES} @tab N @tab @item Extension of @code{OMP_DEFAULT_DEVICE} and new @code{OMP_AVAILABLE_DEVICES} environment vars @tab N @tab @@ -470,28 +474,51 @@ Technical Report (TR) 12 is the second preview for OpenMP 6.0. 
@tab Y @tab @item The OpenMP directive syntax was extended to include C 23 attribute specifiers @tab Y @tab +@item Support for pure directives in Fortran's @code{do concurrent} @tab N @tab @item All inarguable clauses take now an optional Boolean argument @tab N @tab @item For Fortran, @emph{locator list} can be also function reference with data pointer result @tab N @tab @item Concept of @emph{assumed-size arrays} in C and C++ @tab N @tab @item @emph{directive-name-modifier} accepted in all clauses @tab N @tab +@item Argument-free version of @code{depobj} including added @code{init} clause + @tab N @tab +@item Undeprecate omitting the argument to the @code{depend} clause of + the argument version of the @code{depend} construct @tab Y @tab @item For Fortran, atomic with BLOCK construct and, for C/C++, with unlimited curly braces supported @tab N @tab +@item For Fortran, atomic with pointer comparison @tab N @tab +@item For Fortran, atomic with enum and enumeration types @tab N @tab @item For Fortran, atomic compare with storing the comparison result @tab N @tab @item New @code{looprange} clause @tab N @tab -@item Ref-count change for @code{use_device_ptr}/@code{use_device_addr} +@item For Fortran, handling polymorphic types in data-sharing-attribute + clauses @tab P @tab @code{private} not supported +@item For Fortran, rejecting polymorphic types in data-mapping clauses + @tab N @tab not diagnosed (and mostly unsupported) +@item New @code{taskgraph} construct including @emph{saved} modifier and + @code{replayable} clause @tab N @tab +@item @code{default} clause on the @code{target} directive @tab N @tab +@item Ref-count change for @code{use_device_ptr} and @code{use_device_addr} @tab N @tab @item Support for inductions @tab N @tab +@item Deprecation of the combiner expressio
[Patch] libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device (was: Re: [PATCH, v3] OpenMP: Constructors and destructors for "declare target" static aggregates)
Document -fno-builtin-omp_is_initial_device as discussed:

Jakub Jelinek wrote:
RFC: Should we document this new built-in somewhere? If so, where? As part of the routine description in libgomp.texi? Or in extend.texi (or even invoke.texi)?
I think libgomp.texi in the omp_is_initial_device description, mention that the compiler folds it by default and that if that is undesirable, there is this option to use.

Unless there are wording suggestions, I will commit it later today.

Tobias

libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device

libgomp/ChangeLog:

	* libgomp.texi (omp_is_initial_device): Mention
	-fno-builtin-omp_is_initial_device and folding by default.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index c6759dd03bc..96cc0e4baa8 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1754,6 +1754,10 @@ This function returns @code{true} if currently running on the host device,
 @code{false} otherwise. Here, @code{true} and @code{false} represent
 their language-specific counterparts.

+Note that in GCC this value is already folded to a constant in the compiler;
+compile with @option{-fno-builtin-omp_is_initial_device} if a run-time function
+is desired.
+
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
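For illustration only, here is a minimal sketch of code affected by the documented behaviour. The helpers `running_on_host` and `report_device` are hypothetical names, not part of libgomp. With `-fopenmp`, the call to `omp_is_initial_device` is folded by default, as described above; compiling with `-fno-builtin-omp_is_initial_device` keeps a real run-time call. The non-OpenMP fallback is an assumption added only so that the sketch also builds without `-fopenmp`.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not part of libgomp): returns nonzero when the
   calling code runs on the host device.  */
static int
running_on_host (void)
{
#ifdef _OPENMP
  /* Folded to a constant by GCC unless the file is compiled with
     -fno-builtin-omp_is_initial_device.  */
  return omp_is_initial_device ();
#else
  /* Assumption for this sketch: without OpenMP there is only the host.  */
  return 1;
#endif
}

/* Hypothetical helper: prints where the code runs and returns the flag.  */
static int
report_device (void)
{
  int on_host = running_on_host ();
  printf ("running on %s\n", on_host ? "host" : "offload device");
  return on_host;
}
```

Library code that may be linked into either host-only or offloading programs is the kind of case where the run-time variant seems preferable.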
[committed] libgomp.c++/static-aggr-constructor-destructor-{1,2}.C: Fix scan-tree-dump (was: [r15-2799 Regression] FAIL: libgomp.c++/static-aggr-constructor-destructor-2.C scan-tree-dump-times optimiz
haochen.jiang wrote:
FAIL: libgomp.c++/static-aggr-constructor-destructor-1.C scan-tree-dump-times optimized "__attribute__\\(\\([^\n\r]*omp declare target nohost" 1
FAIL: libgomp.c++/static-aggr-constructor-destructor-1.C scan-tree-dump-times optimized "void _GLOBAL__off_I_v1" 1

Those symbols are generated even with ENABLE_OFFLOADING == false, but in that case they are optimized away (as they should be). With offloading, the pass removing them comes too late, but we should handle 'nohost' explicitly. Once done, the dump will be the same (no symbol).

Until this is implemented, we use 'target (!) offload_target_any' to separate the cases and make this test pass, even though offload_target_any does not completely match ENABLE_OFFLOADING.*

Committed as r15-2814-ge3a6dec326a127

Tobias

(* If you configured with --enable-offload-defaulted and have no offload binaries available or when you smuggle '-foffload=disable' to the commandline, ENABLE_OFFLOADING is true while offload_target_any is false.)

commit e3a6dec326a127ad549246435b9d3835e9a32407
Author: Tobias Burnus
Date: Thu Aug 8 10:42:25 2024 +0200

libgomp.c++/static-aggr-constructor-destructor-{1,2}.C: Fix scan-tree-dump

In principle, the optimized dump should be the same on the host, but as 'nohost' is not handled, it is present. However when ENABLE_OFFLOADING is false, it is handled early enough to remove the function.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: Split
	scan-tree-dump into with and without target offload_target_any.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: Likewise.
--- .../libgomp.c++/static-aggr-constructor-destructor-1.C | 15 --- .../libgomp.c++/static-aggr-constructor-destructor-2.C | 16 +--- 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C index 403a071c0c0..b5aafc8cabc 100644 --- a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C +++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C @@ -9,9 +9,18 @@ // { dg-final { scan-tree-dump-not "omp_is_initial_device" "optimized" } } // { dg-final { scan-tree-dump-not "__omp_target_static_init_and_destruction" "optimized" } } -// FIXME: should be '-not' not '-times' 1: -// { dg-final { scan-tree-dump-times "void _GLOBAL__off_I_v1" 1 "optimized" } } -// { dg-final { scan-tree-dump-times "__attribute__\\(\\(\[^\n\r]*omp declare target nohost" 1 "optimized" } } + +// (A) No offloading configured: The symbols aren't present +// Caveat: They are present with -foffload=disable - or offloading +// configured but none of the optional offload packages/binaries installed. +// But the 'offload_target_any' check cannot distinguish those +// { dg-final { scan-tree-dump-not "void _GLOBAL__off_I_v1" "optimized" { target { ! offload_target_any } } } } +// { dg-final { scan-tree-dump-not "__attribute__\\(\\(\[^\n\r]*omp declare target nohost" "optimized" { target { ! offload_target_any } } } } + +// (B) With offload configured (and compiling for an offload target) +// the symbols are present (missed optimization). Hence: FIXME. 
+// { dg-final { scan-tree-dump-times "void _GLOBAL__off_I_v1" 1 "optimized" { target offload_target_any } } } +// { dg-final { scan-tree-dump-times "__attribute__\\(\\(\[^\n\r]*omp declare target nohost" 1 "optimized" { target offload_target_any } } } // { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump-not "omp_initial_device;" "optimized" { target offload_target_amdgcn } } } // { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "v1\\._x = 5;" "optimized" { target offload_target_amdgcn } } } diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C index 6dd4260a522..9652a721bbe 100644 --- a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C +++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C @@ -9,9 +9,19 @@ // { dg-final { scan-tree-dump-not "omp_is_initial_device" "optimized" } } // { dg-final { scan-tree-dump-not "__omp_target_static_init_and_destruction" "optimized" } } -// FIXME: should be '-not' not '-times' 1: -// { dg-final { scan-tree-dump-times "void _GLOBAL__off_I_" 1 "optimized" } } -// { dg-final { sca
[committed] libgomp.c-c++-common/target-link-2.c: Fix test on multi-device systems (was: Re: [Patch] libgomp: Fix declare target link with offset array-section mapping [PR116107])
Hi Thomas,

Thomas Schwinge wrote:
The new test case 'libgomp.c-c++-common/target-link-2.c' generally PASSes on one-GPU systems, but on a multi-GPU system (tested nvidia5):

After having debugged it, the cause became glaringly obvious, but it is otherwise easy to miss … The testcase checks that mapping an array, and then remapping it with a different stride, works; to see that the array was really remapped, the host value is changed beforehand. The issue was that the value has to be changed back to the original for the next device, as the value checks always expect the same value.

Committed as r15-2796-gaa689684d2bf58. Thanks for the report!

Tobias

PS: I first thought about maybe just adding:
+ #pragma omp target exit data map(release:arr[3:10]) device(dev)
I was (and still am) torn between adding it (cleaner) or keeping the test as is, as both have some merits for testing, and I have not cleaned up after the remapping. In any case, either testcase is fine (and should work).

commit aa689684d2bf58d1b7e7938a1392e7a260276d14
Author: Tobias Burnus
Date: Wed Aug 7 17:59:21 2024 +0200

libgomp.c-c++-common/target-link-2.c: Fix test on multi-device systems

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/target-link-2.c: Reset variable
	value to handle multi-device tests.
---
 libgomp/testsuite/libgomp.c-c++-common/target-link-2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
index 15da1656ebf..b64fbde70e3 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
@@ -54,6 +54,9 @@ int main()
       for (int i = 0; i < 10; i++)
 	if (res[i] != (4 + i)*10)
 	  __builtin_abort ();
+
+      for (int i = 0; i < 15; i++)  /* Reset. */
+	arr[i] /= 10;
     }
   return 0;
 }
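To make the failure mode above concrete, the following host-only sketch mimics the per-device loop of the test without any offloading. It is a simplified stand-in, not the real testcase: the function `run_devices`, the initial values and the check are assumptions chosen for illustration. Each "device" iteration scales the host array by 10 and then checks against fixed expected values; without the reset, every device after the first sees values that were scaled twice or more.

```c
#include <assert.h>

#define N 15

/* Simulates the test's per-device loop on the host only; no offloading is
   involved.  Returns the number of devices whose value check failed.  */
static int
run_devices (int num_devices, int reset_after_each)
{
  int arr[N];
  int failures = 0;

  for (int i = 0; i < N; i++)
    arr[i] = 1 + i;		/* Original host values.  */

  for (int dev = 0; dev < num_devices; dev++)
    {
      /* Change the host values so that a remap becomes observable.  */
      for (int i = 0; i < N; i++)
	arr[i] *= 10;

      /* The check always expects the same values, as in the test.  */
      int ok = 1;
      for (int i = 0; i < N; i++)
	if (arr[i] != (1 + i) * 10)
	  ok = 0;
      if (!ok)
	failures++;

      /* The fix: restore the original values for the next device.  */
      if (reset_after_each)
	for (int i = 0; i < N; i++)
	  arr[i] /= 10;
    }
  return failures;
}
```

With a single device the reset makes no difference, which matches the report that the test passed on one-GPU systems.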
Re: [PATCH, v3] OpenMP: Constructors and destructors for "declare target" static aggregates
Hi Jakub,

for C/C++, -fno-builtin-omp_is_initial_device already disabled the expansion. I added it also for Fortran, plus a C and a Fortran testcase for the disable flag.

* * *

Regarding the wording, it failed before for Fortran with:

f951: Warning: command-line option ‘-fno-builtin-omp_is_initial_device’ is valid for C/C++/ObjC/ObjC++ but not for Fortran
f951: Warning: command-line option ‘-fbuiltin-omp_is_initial_device’ is valid for C/C++/ObjC/ObjC++ but not for Fortran

(The latter is not quite true as all non "no-" ones are rejected for C/C++, e.g.: "cc1: error: unrecognized command-line option ‘-fbuiltin-omp_is_initial_device’").

Now all positive forms fail with: "f951: Error: unrecognized command-line option ‘-fbuiltin-omp_is_initial_device’", which should be fine and in line with C/C++.

[RFC] The only real question is how to handle unknown -fno-builtin-* flags. C/C++ accepts them silently; Fortran did reject them before (see above) as unknown flags. And this patch does:

f951: Warning: command-line option ‘-fno-builtin-nothing’ is not valid for Fortran

for all but that single supported flag.

* * *

Jakub Jelinek wrote:
As I wrote, I think there should be some option to override the omp_is_initial_device folding, e.g. for the case where one is compiling some library code which could be linked either way and so need to avoid folding omp_is_initial_device because we'll only know at runtime.

Now done: it was already there for C/C++, but required the changes for Fortran.

RFC: Should we document this new built-in somewhere? If so, where? As part of the routine description in libgomp.texi? Or in extend.texi (or even invoke.texi)?

Maybe would be worth testing that omp_is_initial_device is not treated like a builtin in C++ in custom namespace, or as a static or non-static member function, or for C or Fortran as nested function.

For C/C++, it uses the same mechanism (both_p = true) as all other builtins; thus, I just hope that it works there.
For Fortran, this plugs into gfc_get_extern_function_decl, i.e. that name appears as external declaration. While the user could mess around, it checks that it is a function and that the return type is the expected one (i.e. logical). Thus, there shouldn't be any issue with nested functions.

Tobias

OpenMP: Constructors and destructors for "declare target" static aggregates

This commit also compile-time expands (__builtin_)omp_is_initial_device for both Fortran and C/C++ (unless -fno-builtin-omp_is_initial_device is used).

But the main change is:

This commit adds support for running constructors and destructors for static (file-scope) aggregates for C++ objects which are marked with "declare target" directives on OpenMP offload targets.

Before this commit, space is allocated on the target for such aggregates, but nothing ever constructs them properly, so they end up zero-initialised.

(See the new test static-aggr-constructor-destructor-3.C for a reason why running constructors on the target is preferable to e.g. constructing on the host and then copying the resulting object to the target.)

2024-08-07 Julian Brown Tobias Burnus

gcc/ChangeLog:

	* builtins.def (DEF_GOMP_BUILTIN_COMPILER): Define
	DEF_GOMP_BUILTIN_COMPILER to handle the non-prefix version.
	* gimple-fold.cc (gimple_fold_builtin_omp_is_initial_device): New.
	(gimple_fold_builtin): Call it.
	* omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): Define.
	* tree.cc (get_file_function_name): Support names for on-target
	constructor/destructor functions.

gcc/cp/

	* decl2.cc (tree-inline.h): Include.
	(static_init_fini_fns): Bump to four entries. Update comment.
	(start_objects, start_partial_init_fini_fn): Add 'omp_target'
	parameter. Support "declare target" decls. Update forward declaration.
	(emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for
	the created function. Support "declare target".
	(OMP_SSDF_IDENTIFIER): New macro.
	(partition_vars_for_init_fini): Support partitioning "declare target"
	variables also.
(generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support "declare target" decls. (c_parse_final_cleanups): Support constructors/destructors on OpenMP offload targets. gcc/fortran/ChangeLog: * gfortran.h (gfc_option_t): Add disable_omp_is_initial_device. * lang.opt (fbuiltin-): Add. * options.cc (gfc_handle_option): Handle -fno-builtin-omp_is_initial_device. * f95-lang.cc (gfc_init_builtin_functions): Handle DEF_GOMP_BUILTIN_COMPILER. * trans-decl.cc (gfc_get_extern_function_decl): Add code to use DEF_GOMP_BUILTIN_COMPILER for 'omp_is_initial_device'. libgomp/ChangeLog: * testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New test. * testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New test. * testsuite/libgomp.c++/static-aggr-constructor
[PATCH, v3] OpenMP: Constructors and destructors for "declare target" static aggregates
CCed Fortran because of the first item:

This patch now uses (again like in v1) a builtin for 'omp_is_initial_device'; like in v2, it is compile-time evaluated, but this time (new!) it also handles the case that a user wrote that routine. Note: The omp_… namespace is owned by OpenMP, i.e. if it breaks for a user-defined function (when compiled with -fopenmp), it's the fault of the user.

Otherwise, it is unchanged except for the following first suggestion. And while 'nohost' should be optimized (away on the host), that's deferred to a to-be-written follow-up patch.

On Aug 1, 2024, Jakub Jelinek wrote:
On Tue, Jul 30, 2024 at 10:51:56PM +0200, Tobias Burnus wrote:
- char id[sizeof (SSDF_IDENTIFIER) + 1 /* '\0' */ + 32];
+ tree name;
...
I'd just use a single buffer here,
char id[MAX (sizeof (SSDF_IDENTIFIER), sizeof (OMP_SSDF_IDENTIFIER)) + 1 /* \0 */ + 32];

Done as proposed.

Given that the Xeon PHI offloading is gone and fork offloading doesn't seem to be worked on, my preference would be __builtin_omp_is_initial_device () and fold that to 0/1 after IPA, because that will actually help user code too.

Done.

And of course, it would be much better to figure out real nohost fix, because if we need to register a constructor which will just do nothing, it still wastes runtime.

To be done in a follow-up patch.

Comments, suggestions, concerns?

Tobias

PS: In principle, 'omp_get_num_devices()' would be a candidate for '-foffload=disable' (or not configured), but I am not sure how useful it is, especially as the decision whether offloading should be done is deferred to the link time.

PPS: For OpenACC, there is already an optimization for the similar but more complex acc_on_device. But that one doesn't handle Fortran due to the different ABI. See https://gcc.gnu.org/PR116269 for details.

OpenMP: Constructors and destructors for "declare target" static aggregates

This commit also compile-time expands (__builtin_)omp_is_initial_device for both Fortran and C/C++.
But the main change is: This commit adds support for running constructors and destructors for static (file-scope) aggregates for C++ objects which are marked with "declare target" directives on OpenMP offload targets. Before this commit, space is allocated on the target for such aggregates, but nothing ever constructs them properly, so they end up zero-initialised. (See the new test static-aggr-constructor-destructor-3.C for a reason why running constructors on the target is preferable to e.g. constructing on the host and then copying the resulting object to the target.) 2024-08-07 Julian Brown Tobias Burnus gcc/ChangeLog: * builtins.def (DEF_GOMP_BUILTIN_COMPILER): Define DEF_GOMP_BUILTIN_COMPILER to handle the non-prefix version. * gimple-fold.cc (gimple_fold_builtin_omp_is_initial_device): New. (gimple_fold_builtin): Call it. * omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): Define. * tree.cc (get_file_function_name): Support names for on-target constructor/destructor functions. gcc/cp/ * decl2.cc (tree-inline.h): Include. (static_init_fini_fns): Bump to four entries. Update comment. (start_objects, start_partial_init_fini_fn): Add 'omp_target' parameter. Support "declare target" decls. Update forward declaration. (emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for the created function. Support "declare target". (OMP_SSDF_IDENTIFIER): New macro. (partition_vars_for_init_fini): Support partitioning "declare target" variables also. (generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support "declare target" decls. (c_parse_final_cleanups): Support constructors/destructors on OpenMP offload targets. gcc/fortran/ChangeLog: * f95-lang.cc (gfc_init_builtin_functions): Handle DEF_GOMP_BUILTIN_COMPILER) * trans-decl.cc (gfc_get_extern_function_decl): Add code to use DEF_GOMP_BUILTIN_COMPILER for 'omp_is_initial_device'. libgomp/ChangeLog: * testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New test. 
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New test. * testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C: New test. * testsuite/libgomp.c-c++-common/target-is-initial-host.c: New test. * testsuite/libgomp.fortran/target-is-initial-host.f: New test. * testsuite/libgomp.fortran/target-is-initial-host.f90: New test. Co-authored-by: Tobias Burnus gcc/builtins.def | 4 + gcc/cp/decl2.cc| 229 + gcc/fortran/f95-lang.cc| 9 + gcc/fortran/trans-decl.cc | 8 + gcc/gimple-fold.cc | 20 ++ gcc/omp-builtins.def | 4 + gcc/tree
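As a side note on the single-buffer suggestion quoted in this mail, a standalone sketch of the sizing idiom might look as follows. The two string values are assumptions for illustration (the real definitions live in gcc/cp/decl2.cc and may differ), and `MAX`, `ID_BUFSIZE`, `make_id` and `id_length` are hypothetical names, not GCC code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed values for illustration only; see gcc/cp/decl2.cc for the
   real definitions.  */
#define SSDF_IDENTIFIER "__static_initialization_and_destruction"
#define OMP_SSDF_IDENTIFIER "__omp_target_static_init_and_destruction"

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* One buffer sized for the longer prefix, a terminating NUL and up to
   32 characters of suffix.  */
#define ID_BUFSIZE \
  (MAX (sizeof (SSDF_IDENTIFIER), sizeof (OMP_SSDF_IDENTIFIER)) + 1 + 32)

/* Hypothetical helper: writes "<prefix>_<count>" into ID, which must
   have room for ID_BUFSIZE bytes, and returns the length written.  */
static int
make_id (char *id, int omp_target, unsigned count)
{
  const char *prefix = omp_target ? OMP_SSDF_IDENTIFIER : SSDF_IDENTIFIER;
  return snprintf (id, ID_BUFSIZE, "%s_%u", prefix, count);
}

/* Hypothetical helper: length of the generated name, using one local
   buffer for both the host and the OpenMP-target variant.  */
static size_t
id_length (int omp_target, unsigned count)
{
  char id[ID_BUFSIZE];
  make_id (id, omp_target, count);
  return strlen (id);
}
```

Because `sizeof` on a string literal is a compile-time constant, the `MAX` expression still yields a fixed-size array rather than a variable-length one.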
[committed] libgomp.texi: Add OpenMP TR13 routines to @menu (commented out)
Not user visible but I use this to keep track of both implemented OpenMP runtime routines that still have to be documented and of still to be implemented (and then documented) routines. This commit (r15-2713-g1a5734135d265a) adds those routines added in OpenMP's third 6.0 preview (Technical Report 13). Tobias PS: The routines are again reordered in OpenMP; the question is whether we want to follow suit or keep the current ordering. I only reordered the undocumented ones inside @menu and only those @menu that I modified. commit 1a5734135d265a7b363ead9f821676a2a358969b Author: Tobias Burnus Date: Mon Aug 5 09:18:29 2024 +0200 libgomp.texi: Add OpenMP TR13 routines to @menu (commented out) To keep track of missing routine documentation (both implemented and not), the libgomp.texi file contains all non-OMPT routines as commented items in @menu. This commit adds the routines added in TR13 as commented fixme items. libgomp/ChangeLog: * libgomp.texi (OpenMP Runtime Library Routines): Add TR13 routines to @menu (commented out). --- libgomp/libgomp.texi | 27 +-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 07cd75124b0..c6759dd03bc 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -1591,12 +1591,18 @@ They have C linkage and do not throw exceptions. 
@menu * omp_get_num_procs:: Number of processors online @c * omp_get_max_progress_width:: /TR11 +@c * omp_get_device_from_uid:: /TR13 +@c * omp_get_uid_from_device:: /TR13 * omp_set_default_device:: Set the default device for target regions * omp_get_default_device:: Get the default device for target regions * omp_get_num_devices:: Number of target devices * omp_get_device_num:: Get device that current thread is running on * omp_is_initial_device:: Whether executing on the host device * omp_get_initial_device:: Device number of host device +@c * omp_get_device_num_teams:: /TR13 +@c * omp_set_device_num_teams:: /TR13 +@c * omp_get_device_teams_thread_limit:: /TR13 +@c * omp_set_device_teams_thread_limit:: /TR13 @end menu @@ -2813,8 +2819,27 @@ Routines to manage and allocate memory on the current device. They have C linkage and do not throw exceptions. @menu +@c * omp_get_devices_memspace:: /TR13 +@c * omp_get_device_memspace:: /TR13 +@c * omp_get_devices_and_host_memspace:: /TR13 +@c * omp_get_device_and_host_memspace:: /TR13 +@c * omp_get_devices_all_memspace:: /TR13 +@c * omp_get_memspace_num_resources:: /TR11 +@c * omp_get_memspace_pagesize:: /TR13 +@c * omp_get_submemspace:: /TR11 +@c * omp_init_mempartitioner:: /TR13 +@c * omp_destroy_mempartitioner:: /TR13 +@c * omp_init_mempartition:: /TR13 +@c * omp_destroy_mempartition:: /TR13 +@c * omp_mempartition_set_part:: /TR13 +@c * omp_mempartition_get_user_data:: /TR13 * omp_init_allocator:: Create an allocator * omp_destroy_allocator:: Destroy an allocator +@c * omp_get_devices_allocator:: /TR13 +@c * omp_get_device_allocator:: /TR13 +@c * omp_get_devices_and_host_allocator:: /TR13 +@c * omp_get_device_and_host_allocator:: /TR13 +@c * omp_get_devices_all_allocator:: /TR13 * omp_set_default_allocator:: Set the default allocator * omp_get_default_allocator:: Get the default allocator * omp_alloc:: Memory allocation with an allocator @@ -2823,8 +2848,6 @@ They have C linkage and do not throw exceptions. 
* omp_calloc:: Allocate nullified memory with an allocator * omp_aligned_calloc:: Allocate nullified aligned memory with an allocator * omp_realloc:: Reallocate memory allocated with OpenMP routines -@c * omp_get_memspace_num_resources:: /TR11 -@c * omp_get_submemspace:: /TR11 @end menu
Re: [PATCH] fortran: Fix a pasto in gfc_check_dependency
[static analyzer] Jakub Jelinek wrote: […] it is some proprietary static analyzer I want to point out that an underutilized static analyzer keeps scanning GCC: Coverity Scan. If someone has the time, I think it would be worthwhile to have a look at the reports. A number of people already have access to it, and more can be added (I think I can grant access). Thus, if any of the GCC developers has the interest and time … Tobias
[wwwdocs] OpenMP: gcc-15/changes.html - minor update / projects/gomp - link to TR13
First, OpenMP TR13 has just been released. Hence, link it from our project page. For the GCC 15 page, I suggest to add ompx_gnu_pinned_mem_alloc (but one can argue about that) and the nvptx I/O support. We could also talk about nvptx + constructor support here. Comments, thoughts, remarks before I commit it? Current pages are: https://gcc.gnu.org/gcc-15/changes.html + https://gcc.gnu.org/projects/gomp/#omp-status Tobias OpenMP: gcc-15/changes.html - minor update / projects/gomp - link to TR13 * htdocs/gcc-15/changes.html (OpenMP): Mention ompx_gnu_pinned_mem_alloc and Fortran I/O support on nvptx with OpenMP offloading. * htdocs/projects/gomp/index.html (OpenMP Releases and Status): Add TR13. diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index a1bb0ddf..2fd7aa90 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.html @@ -50,6 +50,12 @@ a work-in-progress. see the offload-target specifics section in the https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html"; >GNU Offloading and Multi Processing Runtime Library Manual. + GCC added ompx_gnu_pinned_mem_alloc as https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fALLOCATOR.html";>predefined + allocator. On https://gcc.gnu.org/onlinedocs/libgomp/nvptx.html";>Nvidia + GPUs, writing to the terminal from OpenMP target regions (but not from + OpenACC compute regions) is now also supported in Fortran; in C/C++ and + on AMD GPUs this was already supported before with both OpenMP and OpenACC. OpenMP 5.1: The unroll and tile diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html index d1765fc3..89f0b120 100644 --- a/htdocs/projects/gomp/index.html +++ b/htdocs/projects/gomp/index.html @@ -1326,7 +1326,11 @@ error. OpenMP Releases and Status -November 9, 2023 +August 1, 2023 +https://www.openmp.org/wp-content/uploads/openmp-TR13.pdf";>OpenMP +Technical Report 13 (third preview for the OpenMP API Version 6.0) has been +released. 
+ https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf";>OpenMP Technical Report 12 (second preview for the OpenMP API Version 6.0) has been released.
Re: [Patch] libgomp: Device load_image - minor num-funcs/vars check improvement
I sent the following patch in February (Stage 4) and didn't want to commit it back then. But for Stage 1, it should be fine ...

I would like to commit it tomorrow, unless there are comments suggesting otherwise.

Attached is the unchanged patch; I also added a "diff -w -U1" patch as that makes it easier to see the non-re-indent changes.

Tobias

On February 19, 2024, Tobias Burnus wrote:
When debugging a linker issue, leading to a mismatch in the number of host/device functions, I was surprised by seeing one additional entry. Well, it turned out to be due to the ICV variable. This patch makes it more consistent.

The "+1" is returned since r12-2769-g0bac793ed6bad2 (for the on-device omp_get_device_num), extended in r13-2545-g9f2fca56593a2b for a struct to support more ICV variables on the devices [to handle OMP_..._DEV environment variables].

As the value is returned unconditionally, it makes sense to use it both for the expected-value diagnostic and for the condition further below.

Comments, suggestions, remarks?

Tobias

PS: An alternative would be to make the plugin's value depend on whether the data was loaded. But that would make the number-of-entries assert weaker and might cause corner-case issues when a slightly older libgomp plugin is used with the updated libgomp.so. Thus, I have settled for the attached variant.

diff --git a/libgomp/target.c b/libgomp/target.c
index efed6ad68ff..fb9a6fb5c79 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2364,5 +2364,4 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
-  if (num_target_entries != num_funcs + num_vars
-      /* "+1" due to the additional ICV struct.  */
-      && num_target_entries != num_funcs + num_vars + 1)
+  /* The "+1" is due to the additional ICV struct.
*/ + if (num_target_entries != num_funcs + num_vars + 1) { @@ -2372,3 +2371,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, gomp_fatal ("Cannot map target functions or variables" - " (expected %u, have %u)", num_funcs + num_vars, + " (expected %u + %u + 1, have %u)", num_funcs, num_vars, num_target_entries); @@ -2456,11 +2455,5 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, - /* Last entry is for a ICVs variable. - Tolerate case where plugin does not return those entries. */ - if (num_funcs + num_vars < num_target_entries) -{ - struct addr_pair *var = &target_table[num_funcs + num_vars]; - - /* Start address will be non-zero for the ICVs variable if - the variable was found in this image. */ - if (var->start != 0) + /* Last entry is for the ICV struct variable; if absent, start = end = 0. */ + struct addr_pair *icv_var = &target_table[num_funcs + num_vars]; + if (icv_var->start != 0) { @@ -2471,3 +2464,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, struct gomp_offload_icvs *icvs = get_gomp_offload_icvs (dev_num); - size_t var_size = var->end - var->start; + size_t var_size = icv_var->end - icv_var->start; if (var_size != sizeof (struct gomp_offload_icvs)) @@ -2482,3 +2475,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, actually designating its device number into effect. 
*/ - gomp_copy_host2dev (devicep, NULL, (void *) var->start, icvs, + gomp_copy_host2dev (devicep, NULL, (void *) icv_var->start, icvs, var_size, false, NULL); @@ -2489,3 +2482,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, k->tgt = tgt; - k->tgt_offset = var->start; + k->tgt_offset = icv_var->start; k->refcount = REFCOUNT_INFINITY; @@ -2498,3 +2491,2 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, } -} libgomp: Device load_image - improve minor num-funcs/vars check The run time library loads the offload functions and variable and optionally the ICV variable and returns the number of loaded items, which has to match the host side. The plugin returns "+1" (since GCC 12) for the ICV variable entry, independently whether it was loaded or not, but the var's value (start == end == 0) can be used to detect when this failed. Thus, we can tighten the assert check - which this commit does together with making the output less surprising - and simplify the condition further below. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): If ICV variable is is not available, decrement other_count and thus the return value. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise. * target.c (gomp_load_image_to_device): Extend fatal-error message; simplify a condition. libgomp/target.c | 78 +
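The tightened check described in the commit log above can be sketched in isolation. This is only a simplified model under stated assumptions, not libgomp's code: 'struct addr_pair' is reduced to two integer fields, and the helper names entry_count_ok and icv_var_size are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for libgomp's 'struct addr_pair' (assumption:
   only the two address fields matter here).  */
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;
};

/* Hypothetical helper: the plugin reports one extra entry for the ICV
   struct unconditionally, so the only accepted count is
   num_funcs + num_vars + 1.  */
bool
entry_count_ok (unsigned num_funcs, unsigned num_vars,
                unsigned num_target_entries)
{
  return num_target_entries == num_funcs + num_vars + 1;
}

/* Hypothetical helper: the ICV entry is the last one in the table.
   start == 0 means the variable was not found in the image; in that
   case 0 is returned, otherwise the size of the variable.  */
size_t
icv_var_size (const struct addr_pair *target_table,
              unsigned num_funcs, unsigned num_vars)
{
  assert (target_table != NULL);
  const struct addr_pair *icv_var = &target_table[num_funcs + num_vars];
  if (icv_var->start == 0)
    return 0;
  return (size_t) (icv_var->end - icv_var->start);
}
```

With one function and one variable, a table of three entries passes the count check even when the last entry is all zeros; only the second helper then tells whether the ICV struct was actually loaded.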
[Patch, v3] omp-offload.cc: Fix value-expr handling of 'declare target link' vars [PR115637] (was: [Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637])
Hi Richard, hi all, Richard Biener wrote: Looking at pass_omp_target_link::execute I wonder iff find_link_var_op shouldn't simply do the substitution? Aka This seems to work ... --- a/gcc/omp-offload.cc +++ b/gcc/omp-offload.cc @@ -2893,6 +2893,7 @@ find_link_var_op (tree *tp, int *walk_subtrees, void *) && is_global_var (t) && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t))) { + *tp = unshare_expr (DECL_VALUE_EXPR (t)); *walk_subtrees = 0; return t; } which then makes the stmt obviously not gimple? ... except that 'return t' prevents updating other value-expr in the same stmt, but that can be fixed. Updated patch attached. Thanks for the suggestion! Tobias omp-offload.cc: Fix value-expr handling of 'declare target link' vars As the PR and included testcase shows, replacing 'arr2' by its value expression '*arr2$13$linkptr' failed for MEM [(c_char * {ref-all})&arr2] which left 'arr2' in the code as unknown symbol. Now expand the value expression already in pass_omp_target_link::execute's process_link_var_op walk_gimple_stmt walk - and don't rely on gimple_regimplify_operands. PR middle-end/115637 gcc/ChangeLog: * gimplify.cc (gimplify_body): Fix macro name in the comment. * omp-offload.cc (found_link_var): New global var. (find_link_var_op): Rename to ... (process_link_var_op): ... this. Replace value expr; set found_link_var. (pass_omp_target_link::execute): Update walk_gimple_stmt call. libgomp/ChangeLog: * testsuite/libgomp.fortran/declare-target-link.f90: Uncomment now working code. Co-authored-by: Richard Biener PR115637 -! if (res /= -11436) stop 5 -if (res /= -11546) stop 5 ! FIXME +! print *, res +if (res /= -11436) stop 5 end integer function run_device1() !$omp declare target integer :: i run_device1 = -99 -! FIXME: arr2 not link mapped -> PR115637 -! arr2 = [11,22,33,44] +arr2 = [11,22,33,44] if (any (arr(10:50) /= [(i, i=10,50)])) then run_device1 = arr(11) return end if -! FIXME: -> PR115637 -! 
run_device1 = sum(arr(10:13) + arr2) -run_device1 = sum(arr(10:13) ) ! FIXME +run_device1 = sum(arr(10:13) + arr2) do i = 10, 50 arr(i) = 3 - 10 * arr(i) end do
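The remark above that 'return t' prevents updating other value-exprs in the same statement can be illustrated with a toy walker. This is a sketch under assumptions, not GCC code: the operand struct, walk_operands and both callbacks are hypothetical stand-ins for trees, the tree walker and the find/process_link_var_op callbacks, and the only property modelled is that the walk stops once a callback returns non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy operand: a name plus an optional replacement, standing in for a
   decl and its value expression (hypothetical model).  */
struct operand
{
  const char *name;
  const char *value_expr;   /* NULL if the operand has none.  */
};

typedef struct operand *(*walk_fn) (struct operand *op, void *data);

/* Visit each operand in turn; stop as soon as the callback returns
   non-NULL and hand that result back to the caller.  */
struct operand *
walk_operands (struct operand *ops, size_t n, walk_fn fn, void *data)
{
  assert (fn != NULL);
  for (size_t i = 0; i < n; i++)
    {
      struct operand *res = fn (&ops[i], data);
      if (res)
        return res;
    }
  return NULL;
}

/* Substitute and return the operand: this ends the walk after the
   first hit, so later operands keep their unexpanded form.  */
struct operand *
replace_and_stop (struct operand *op, void *data)
{
  (void) data;
  if (!op->value_expr)
    return NULL;
  op->name = op->value_expr;
  op->value_expr = NULL;
  return op;
}

/* Substitute, record the hit in a flag, and keep walking so that every
   operand of the statement is handled.  */
struct operand *
replace_and_continue (struct operand *op, void *data)
{
  if (op->value_expr)
    {
      op->name = op->value_expr;
      op->value_expr = NULL;
      *(bool *) data = true;
    }
  return NULL;
}
```

The second callback mirrors the shape of the ChangeLog entry (replace the value expr, set a "found" flag) in that the result is reported through side data rather than through the return value.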
[PATCH, v2] OpenMP: Constructors and destructors for "declare target" static aggregates
Hello world, hi Jakub, I would like to PING the following patch. It's essentially Julian's patch, except: * It is rediffed (albeit it mostly applied cleanly). * I replaced the omp_is_initial_device call by an internal function (IFN_) such that it can be evaluated at compile time. With -O1, this also optimizes the host function away as it should :-) * Regarding nvptx: constructors are supported since GCC 15. Thus, the three testcases now work under nvptx as well. (Two fail on nvptx when compiled with neither optimization nor -foffload-options=nvptx-none=-malias as the constructor uses aliases, which aren't supported, yet.) Comments, remarks, suggestions? OK for mainline? Tobias On May 12, 2023, Julian Brown wrote:> This patch adds support for running constructors and destructors for static (file-scope) aggregates for C++ objects which are marked with "declare target" directives on OpenMP offload targets. At present, space is allocated on the target for such aggregates, but nothing ever constructs them properly, so they end up zero-initialised. The approach taken is to generate a set of constructors to run on the target: this currently works for AMD GCN, but fails on NVPTX due to lack of constructor/destructor support there so far on mainline. (See the new test static-aggr-constructor-destructor-3.C for a reason why running constructors on the target is preferable to e.g. constructing on the host and then copying the resulting object to the target.) This patch was previously posted for the og12 branch here: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614710.html https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615013.html https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615144.html though needed a fair amount of rework for mainline due to Nathan's (earlier!) patch: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596402.html Tested with offloading to AMD GCN and bootstrapped. OK for mainline? 
Thanks, Julian OpenMP: Constructors and destructors for "declare target" static aggregates This patch adds support for running constructors and destructors for static (file-scope) aggregates for C++ objects which are marked with "declare target" directives on OpenMP offload targets. At present, space is allocated on the target for such aggregates, but nothing ever constructs them properly, so they end up zero-initialised. (See the new test static-aggr-constructor-destructor-3.C for a reason why running constructors on the target is preferable to e.g. constructing on the host and then copying the resulting object to the target.) 2024-07-30 Julian Brown Tobias Burnus gcc/cp/ * decl2.cc (tree-inline.h): Include. (static_init_fini_fns): Bump to four entries. Update comment. (start_objects, start_partial_init_fini_fn): Add 'omp_target' parameter. Support "declare target" decls. Update forward declaration. (emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for the created function. Support "declare target". (OMP_SSDF_IDENTIFIER): New macro. (partition_vars_for_init_fini): Support partitioning "declare target" variables also. (generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support "declare target" decls. (c_parse_final_cleanups): Support constructors/destructors on OpenMP offload targets. gcc/ * gimplify.cc (gimplify_call_expr): Set calls_declare_variant_alt for IFN_GOMP_IS_INITIAL_DEVICE. * internal-fn.cc (expand_GOMP_IS_INITIAL_DEVICE): New. * internal-fn.def (IFN_GOMP_IS_INITIAL_DEVICE): Add. * omp-offload.cc (execute_omp_device_lower): Expand it. * tree.cc (get_file_function_name): Support names for on-target constructor/destructor functions. libgomp/ * testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New test. * testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New test. * testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C: New test. 
Co-authored-by: Tobias Burnus gcc/cp/decl2.cc| 240 + gcc/gimplify.cc| 8 +- gcc/internal-fn.cc | 8 + gcc/internal-fn.def| 1 + gcc/omp-offload.cc | 7 + gcc/tree.cc| 6 +- .../static-aggr-constructor-destructor-1.C | 28 +++ .../static-aggr-constructor-destructor-2.C | 31 +++ .../static-aggr-constructor-destructor-3.C | 36 9 files changed, 324 insertions(+), 41 deletions(-) diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc index 6d674684931..21ac65452e6 100644 --- a/gcc/cp/decl2.cc +++ b/gcc/cp/decl2.cc @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3. If not see #include "asan.h" #include "optabs-query.h&
Re: [Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637]
Richard Biener wrote: On Mon, Jul 29, 2024 at 9:26 PM Tobias Burnus wrote: Inside pass_omp_target_link::execute, there is a call to gimple_regimplify_operands but the value expression is not expanded. [...] Where is_gimple_mem_ref_addr is defined as: /* Return true if T is a valid address operand of a MEM_REF. */ bool is_gimple_mem_ref_addr (tree t) { return (is_gimple_reg (t) || TREE_CODE (t) == INTEGER_CST || (TREE_CODE (t) == ADDR_EXPR && (CONSTANT_CLASS_P (TREE_OPERAND (t, 0)) || decl_address_invariant_p (TREE_OPERAND (t, 0))))); } I think if at all, then decl_address_invariant_p should be amended. This does not work - at least not for my use case of OpenMP link variables - due to ordering issues. For the device compilers, the VALUE_EXPR is added in lto_main or in do_whole_program_analysis (same file: lto/lto.cc) by calling offload_handle_link_vars. The value expression is then later expanded via pass_omp_target_link::execute, but in between the following happens: lto_main calls symbol_table::compile, which then calls cgraph_node::expand and that executes res |= verify_types_in_gimple_reference (lhs, true); for lhs being: MEM [(c_char * {ref-all})&arr2] But when adding the has-value-expr check either directly to is_gimple_mem_ref_addr or to the decl_address_invariant_p function it calls, the following condition becomes true in the called function in tree-cfg.cc: if (!is_gimple_mem_ref_addr (TREE_OPERAND (expr, 0)) || (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR && verify_address (TREE_OPERAND (expr, 0), false))) { error ("invalid address operand in %qs", code_name); * * * Thus, I am now back to the previous change, except for: Why is the gimplify_addr_expr hunk needed? It should get to gimplifying the VAR_DECL/PARM_DECL by recursion? Indeed. I wonder why I needed it (or thought I needed it) before; possibly because it was needed, or seemed to be needed, when trying to trace this down. Previous patch - except for that bit removed - attached. Thoughts, better ideas? 
Tobias gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637] As the PR and included testcase shows, replacing 'arr2' by its value expression '*arr2$13$linkptr' failed for MEM [(c_char * {ref-all})&arr2] which left 'arr2' in the code as unknown symbol. PR middle-end/115637 gcc/ChangeLog: * gimplify.cc (gimplify_expr): For MEM_REF and an ADDR_EXPR, also check for value-expr arguments. (gimplify_body): Fix macro name in the comment. libgomp/ChangeLog: * testsuite/libgomp.fortran/declare-target-link.f90: Uncomment now working code. gcc/gimplify.cc | 9 +++-- libgomp/testsuite/libgomp.fortran/declare-target-link.f90 | 15 ++- 2 files changed, 13 insertions(+), 11 deletions(-) diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index ab323d764e8..4fa88c9b21c 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -18251,8 +18251,13 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, in suitable form. Re-gimplifying would mark the address operand addressable. Always gimplify when not in SSA form as we still may have to gimplify decls with value-exprs. */ + tmp = TREE_OPERAND (*expr_p, 0); if (!gimplify_ctxp || !gimple_in_ssa_p (cfun) - || !is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0))) + || (!is_gimple_mem_ref_addr (tmp) + || (TREE_CODE (tmp) == ADDR_EXPR + && (VAR_P (TREE_OPERAND (tmp, 0)) + || TREE_CODE (TREE_OPERAND (tmp, 0)) == PARM_DECL) + && DECL_HAS_VALUE_EXPR_P (TREE_OPERAND (tmp, 0) { ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, is_gimple_mem_ref_addr, fb_rvalue); @@ -19422,7 +19427,7 @@ gimplify_body (tree fndecl, bool do_parms) DECL_SAVED_TREE (fndecl) = NULL_TREE; /* If we had callee-copies statements, insert them at the beginning - of the function and clear DECL_VALUE_EXPR_P on the parameters. */ + of the function and clear DECL_HAS_VALUE_EXPR_P on the parameters. 
*/ if (!gimple_seq_empty_p (parm_stmts)) { tree parm; diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 index 2ce212d114f..44c67f925bd 100644 --- a/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 +++ b/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 @@ -1,5 +1,7 @@ ! { dg-additional-options "-Wall" } + ! PR fortran/115559 +! PR middle-end/115637 module m integer :: A @@ -73,24 +75,19 @@ contains !$omp target map(from:res) res = run_device1() !$omp end target -print *, res -! FIXME: arr2 not link mapped -> PR115637 -! if (res /= -1
Re: Support streaming of poly_int for offloading when its degree <= accel's NUM_POLY_INT_COEFFS
Prathamesh Kulkarni wrote: Thanks for your suggestions on the RFC email, the attached patch adds support for streaming of poly_int when its degree <= accel's NUM_POLY_INT_COEFFS. First, thanks a lot for your patch! Secondly, it seems as if this patch is intended to fully or partially fix the following PRs. If so, can you add the PRs to the commit log such that both "git log" will help in finding the problem report and the commit will show up in the issue? https://gcc.gnu.org/PR111937 PR ipa/111937 offloading from x86_64-linux-gnu to riscv*-linux-gnu will have issues https://gcc.gnu.org/PR96265 PR ipa/96265 offloading to nvptx-none from aarch64-linux-gnu (and riscv*-linux-gnu) does not work And - marked as duplicate of the latter: https://gcc.gnu.org/PR114174 PR lto/114174 [aarch64] Offloading to nvptx-none Thanks, Tobias
[committed] gfortran.dg/compiler-directive_2.f: Update dg-error (was: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559])
Follow-up fix: As the !GCC$ attributes are now added in reverse order, the 'stdcall' vs. 'fastcall' in the error message swapped order: "Error: stdcall and fastcall attributes are not compatible" This didn't show up here with -m64 ("Warning: 'stdcall' attribute ignored") and I didn't run it with -m32, but it was reported by Haochen's script + manually confirmed by him. (Thanks for the report and checking – and sorry for the FAIL.) Committed as r15-2401-g15158a8853a69f. Tobias
[Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637]
The problem is code like: MEM [(c_char * {ref-all})&arr2] where arr2 has the value expr '*arr2$13$linkptr' (i.e. indirect ref + decl name). Inside pass_omp_target_link::execute, there is a call to gimple_regimplify_operands but the value expression is not expanded. There are two problems: ADDR_EXPR is not handling this and while MEM_REF has some code for it, it doesn't handle this either. The attached code fixes this. Tested on x86_64-gnu-linux with nvidia offloading. Comments, remarks, OK? Better suggestions? * * * In gimplify_expr for MEM_REF, there is a call to is_gimple_mem_ref_addr which checks for ADDR_EXPR but not for value expressions. The attached patch handles the case explicitly, but, alternatively, we might want to move it to is_gimple_mem_ref_addr (not checked whether it makes sense or not). Where is_gimple_mem_ref_addr is defined as: /* Return true if T is a valid address operand of a MEM_REF. */ bool is_gimple_mem_ref_addr (tree t) { return (is_gimple_reg (t) || TREE_CODE (t) == INTEGER_CST || (TREE_CODE (t) == ADDR_EXPR && (CONSTANT_CLASS_P (TREE_OPERAND (t, 0)) || decl_address_invariant_p (TREE_OPERAND (t, 0))))); } Tobias gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637] As the PR and included testcase show, replacing 'arr2' by its value expression '*arr2$13$linkptr' failed for MEM [(c_char * {ref-all})&arr2] which left 'arr2' in the code as unknown symbol. PR middle-end/115637 gcc/ChangeLog: * gimplify.cc (gimplify_addr_expr): Handle value-expr arg. (gimplify_expr): For MEM_REF and an ADDR_EXPR, also check for value-expr arguments. (gimplify_body): Fix macro name in the comment. libgomp/ChangeLog: * testsuite/libgomp.fortran/declare-target-link.f90: Uncomment now working code. 
gcc/gimplify.cc | 16 ++-- .../testsuite/libgomp.fortran/declare-target-link.f90| 15 ++- 2 files changed, 20 insertions(+), 11 deletions(-) diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index ab323d764e8..d548dc2cdf6 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -6888,6 +6888,13 @@ gimplify_addr_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p) enum gimplify_status ret; location_t loc = EXPR_LOCATION (*expr_p); + if (VAR_P (op0) || TREE_CODE (op0) == PARM_DECL) +{ + ret = gimplify_var_or_parm_decl (&TREE_OPERAND (expr, 0)); + if (ret == GS_ERROR) + return ret; + op0 = TREE_OPERAND (expr, 0); +} switch (TREE_CODE (op0)) { case INDIRECT_REF: @@ -18251,8 +18258,13 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, in suitable form. Re-gimplifying would mark the address operand addressable. Always gimplify when not in SSA form as we still may have to gimplify decls with value-exprs. */ + tmp = TREE_OPERAND (*expr_p, 0); if (!gimplify_ctxp || !gimple_in_ssa_p (cfun) - || !is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0))) + || (!is_gimple_mem_ref_addr (tmp) + || (TREE_CODE (tmp) == ADDR_EXPR + && (VAR_P (TREE_OPERAND (tmp, 0)) + || TREE_CODE (TREE_OPERAND (tmp, 0)) == PARM_DECL) + && DECL_HAS_VALUE_EXPR_P (TREE_OPERAND (tmp, 0) { ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, is_gimple_mem_ref_addr, fb_rvalue); @@ -19422,7 +19434,7 @@ gimplify_body (tree fndecl, bool do_parms) DECL_SAVED_TREE (fndecl) = NULL_TREE; /* If we had callee-copies statements, insert them at the beginning - of the function and clear DECL_VALUE_EXPR_P on the parameters. */ + of the function and clear DECL_HAS_VALUE_EXPR_P on the parameters. 
*/ if (!gimple_seq_empty_p (parm_stmts)) { tree parm; diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 index 2ce212d114f..44c67f925bd 100644 --- a/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 +++ b/libgomp/testsuite/libgomp.fortran/declare-target-link.f90 @@ -1,5 +1,7 @@ ! { dg-additional-options "-Wall" } + ! PR fortran/115559 +! PR middle-end/115637 module m integer :: A @@ -73,24 +75,19 @@ contains !$omp target map(from:res) res = run_device1() !$omp end target -print *, res -! FIXME: arr2 not link mapped -> PR115637 -! if (res /= -11436) stop 5 -if (res /= -11546) stop 5 ! FIXME +! print *, res +if (res /= -11436) stop 5 end integer function run_device1() !$omp declare target integer :: i run_device1 = -99 -! FIXME: arr2 not link mapped -> PR115637 -! arr2 = [11,22,33,44] +arr2 = [11,22,33,44] if (any (arr(10:50) /= [(i, i=10,50)])) then run_device1 = arr(11) return end if -! FIXME: -> PR115637 -! run_device1 = sum(arr(10:13) + arr2) -run_device1 = sum(arr(10:13) ) ! FIXME +run_device1 = sum(arr
[Patch] libgomp.texi: Update 'Device Information Routines' section
I recently stumbled over omp_get_default_device returning -1 (= omp_initial_device) vs. returning omp_get_num_devices(). Thus, it makes sense to document this properly. I also updated some wording and made a tiny step to documenting the missing functions by adding a title to the commented @menu items. → https://gcc.gnu.org/onlinedocs/libgomp/#toc-OpenMP-Runtime-Library-Routines for the current wording. Comments or suggestions before I commit it? Tobias libgomp.texi: Update 'Device Information Routines' section Update 'OpenMP Runtime Library Routines' by adding a note that invoking inside a target region might invoke unspecified behavior. Additionally, update omp_{get,set}_default_device for omp_{initial,invalid}_device named constants. libgomp/ChangeLog: * libgomp.texi (OpenMP Runtime Library Routines): Add missing title to some commented still undocumented items. (Device Information Routines): Update. libgomp/libgomp.texi | 48 +--- 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 50da248b74d..8fe74d58562 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -1208,11 +1208,11 @@ They have C linkage and do not throw exceptions. @menu * omp_get_proc_bind:: Whether threads may be moved between CPUs -@c * omp_get_num_places:: -@c * omp_get_place_num_procs:: -@c * omp_get_place_proc_ids:: -@c * omp_get_place_num:: -@c * omp_get_partition_num_places:: +@c * omp_get_num_places:: Get the number of places available +@c * omp_get_place_num_procs:: Get the number of processes associated with a place +@c * omp_get_place_proc_ids:: Get number of processes associated with a place +@c * omp_get_place_num::Get place number of the associated task +@c * omp_get_partition_num_places:: Get number of places of innermost task @c * omp_get_partition_place_nums:: @c * omp_set_affinity_format:: @c * omp_get_affinity_format:: @@ -1627,8 +1627,12 @@ Returns the number of processors online on that device. 
@subsection @code{omp_set_default_device} -- Set the default device for target regions @table @asis @item @emph{Description}: -Set the default device for target regions without device clause. The argument -shall be a nonnegative device number. +Get the value of the @emph{default-device-var} ICV, which is used +for target regions without device clause. The argument +shall be a nonnegative device number, @code{omp_initial_device}, +or @code{omp_invalid_device}. + +The effect of running this routine in a @code{target} region is unspecified. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @@ -1654,7 +1658,15 @@ shall be a nonnegative device number. @subsection @code{omp_get_default_device} -- Get the default device for target regions @table @asis @item @emph{Description}: -Get the default device for target regions without device clause. +Get the value of the @emph{default-device-var} ICV, which is used +for target regions without device clause. The value is either a +nonnegative device number, @code{omp_initial_device} or +@code{omp_invalid_device}. Note that for the host, the ICV can have two values +and, hence, this routine might return either the value of the named constant +@code{omp_initial_device} or the value returned by the +@code{omp_get_initial_device} routine. + +The effect of running this routine in a @code{target} region is unspecified. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @@ -1667,7 +1679,8 @@ Get the default device for target regions without device clause. @end multitable @item @emph{See also}: -@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device} +@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}, +@ref{omp_get_initial_device} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30. @@ -1681,6 +1694,8 @@ Get the default device for target regions without device clause. @item @emph{Description}: Returns the number of target devices. 
+The effect of running this routine in a @code{target} region is unspecified. + @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);} @@ -1702,9 +1717,9 @@ Returns the number of target devices. @table @asis @item @emph{Description}: This function returns a device number that represents the device that the -current thread is executing on. For OpenMP 5.0, this must be equal to the -value returned by the @code{omp_get_initial_device} function when called -from the host. +current thread is executing on. When called on the host, it returns +the same value as returned by the @code{omp_get_initial_device} function +as required since OpenMP 5.0. @item @emph{C/C++} @multitable @columnfractions .20 .80 @@ -1754,9 +1769,11 @@ their language-specific counterparts. @table @asis @item @emph{Description}: This function returns a device number that rep
Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559]
Hi Andre, hi all, Andre Vehreschild wrote: yes, I could have looked harder 🙂 I wrote ;-) on purpose as this feature is somewhat hidden and writing 'dg-do compile' doesn't hurt. In case of gcc/testsuite, the 'run' is also needed and was often missed (or rather: invalid variants such as 'dg-run' (should be: 'dg-do run') or '{dg-do run }' (missing space after '{') prevented the running of the code). Sam did fix some of those (and some other dg-* issues) recently, e.g. in r15-2349-ga75c6295252d0d (→ https://gcc.gnu.org/r15-2349-ga75c6295252d0d ). This isn't by any chance documented on the developer website of gcc somewhere? It would be sad if that knowledge were not publicly available for the future. https://gcc.gnu.org/onlinedocs/gccint/Directives.html#Specify-how-to-build-the-test documents it. And libgomp has: lib/libgomp.exp:set dg-do-what-default run Whether all optimization arguments or only -O2 are used is set in libgomp via: libgomp.c++/c++.exp: set DEFAULT_CFLAGS "-O2" libgomp.c/c.exp: set DEFAULT_CFLAGS "-O2" and for libgomp.*fortran/fortran.exp, the difference between 'dg-do run' vs. default is *not* *documented,* but seems to be the result of the following: # For Fortran we're doing torture testing, as Fortran has far more tests # with arrays etc. that testing just -O0 or -O2 is insufficient, that is # typically not the case for C/C++. gfortran-dg-runtest $tests "" "" Tobias
Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559]
Hi Andre, Andre Vehreschild wrote: I am wondering why the testcase has no `!{ dg-do ... }` line. What will dejagnu do then? Sorry for the maybe stupid question, but I never encountered a testcase without a dg-do line. It was the minimum for me. Well, then you need to look harder ;-) In gcc/testsuite/, the default is '{ dg-do compile }', i.e. you can specify or leave out that line without any additional effect. Having it might be a tad clearer, albeit it makes the test a tad longer. But if you want to 'run' or 'link', you need to specify the dg-do line. There are several files which don't have the "dg-do compile" line, also under gcc/testsuite/gfortran.dg. In case of libgomp, it becomes interesting: the default is running the code, i.e. you need a 'compile' or 'link' when it shouldn't be run. However, at least for Fortran (libgomp.{oacc-}fortran), there is a difference between specifying nothing and specifying 'dg-do run': In case of the default, it is compiled and run. But if you specify 'dg-do run', it is compiled multiple times with different optimization options and then run. (Actually, also under gcc/testsuite/gfortran.dg, you get multiple compilations + runs with 'dg-do run'. If you use dg-additional-options, you can also add options. I think with dg-options, you set it to a single run [not confirmed].) The downside of compiling + running it multiple times is a longer test time, often without any real benefit. However, especially with Fortran, compiling with different optimization levels did expose issues in the past, both in the Fortran front end and in the middle end. Thus, there is some benefit in using it. In any case, the more complex the code is that the front end + middle end have to process, the more useful "dg-do run" is. The more work is done by the run-time library, be it libgfortran or libgomp, the less useful it becomes as the heavy lifting is done in the run-time library. 
As running the libgomp testsuite already takes quite some time (albeit it can now run in parallel), there are some who prefer few 'dg-do run' and others who prefer that all Fortran testcases there use 'dg-do run' … I hope it helps, Tobias
[Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559]
Updated patch - only change is to the testcase: * With the just posted patch for PR116107, array sections with offset work for 'link'; hence, I updated the testcase. * For 'arr2', I added a reference to the associated PR. I intend to commit it once PR116107 has been committed. Tobias Tobias Burnus wrote: Hi all, it turned out that 'declare target' with 'link' clause was broken in multiple ways. The main fix is the attached patch, namely pushing the variables to the offload-vars list already in the FE. When implementing it, I noticed: * C has a similar issue when using nested functions, which is a GNU extension → https://gcc.gnu.org/115574 * When doing partial mapping of arrays (which is one of the reasons for 'link'), offsets are mishandled in Fortran (not tested in C); see FIXME in the patch. There: arr2(10) should print 10 but with map(arr2(10:)) it prints 19. (I will file a PR about this.) * It might happen that linked variables do not get linked. I have not investigated why, but 'arr2' gives link errors – while 'arr' works. See FIXME in the patch. (I will file a PR about this.) * For COMMON blocks, map(/common/) is rejected, https://gcc.gnu.org/PR115577 * When then mapping map(a,b,c), which is identical for 'common /mycom/ a,b,c', it fails to link the device side as the 'mycom_' symbol cannot be found on the device side. (I will file a PR about this.) As COMMON has issues, an alternative would be to defer the trans-common.cc changes to a later patch. Comments, questions, concerns? Tobias PS: Tested with nvptx offloading on a page-migration-supporting system, with nvptx and GCN offloading configured; no new fails observed. OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559] Contrary to a normal 'declare target', the 'declare target link' attribute also needs to set node->offloadable and push the offload_vars in the front end. Linked variables require that the data is mapped. For module variables, this can happen anywhere. 
For variables in an external subprograms or the main programm, this can only happen in the either that program itself or in an internal subprogram. - Whether a variable is just normally mapped or linked then becomes relevant if a device routine exists that can access that variable, i.e. an internal procedure has then to be marked as declare target. PR fortran/115559 gcc/fortran/ChangeLog: * trans-common.cc (build_common_decl): Add 'omp declare target' and 'omp declare target link' variables to offload_vars. * trans-decl.cc (add_attributes_to_decl): Likewise; update args and call decl_attributes. (get_proc_pointer_decl, gfc_get_extern_function_decl, build_function_decl): Update calls. (gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1' to avoid errors with symtab_node::get_create. libgomp/ChangeLog: * testsuite/libgomp.fortran/declare-target-link.f90: New test. gcc/fortran/trans-common.cc| 21 gcc/fortran/trans-decl.cc | 81 +- .../libgomp.fortran/declare-target-link.f90| 116 + 3 files changed, 192 insertions(+), 26 deletions(-) diff --git a/gcc/fortran/trans-common.cc b/gcc/fortran/trans-common.cc index 5f44e7bd663..e714342c3c0 100644 --- a/gcc/fortran/trans-common.cc +++ b/gcc/fortran/trans-common.cc @@ -98,6 +98,9 @@ along with GCC; see the file COPYING3. If not see #include "coretypes.h" #include "tm.h" #include "tree.h" +#include "cgraph.h" +#include "context.h" +#include "omp-offload.h" #include "gfortran.h" #include "trans.h" #include "stringpool.h" @@ -497,6 +500,24 @@ build_common_decl (gfc_common_head *com, tree union_type, bool is_init) = tree_cons (get_identifier ("omp declare target"), omp_clauses, DECL_ATTRIBUTES (decl)); + if (com->omp_declare_target_link || com->omp_declare_target) + { + /* Add to offload_vars; get_create does so for omp_declare_target, + omp_declare_target_link requires manual work. 
*/ + gcc_assert (symtab_node::get (decl) == 0); + symtab_node *node = symtab_node::get_create (decl); + if (node != NULL && com->omp_declare_target_link) + { + node->offloadable = 1; + if (ENABLE_OFFLOADING) + { + g->have_offload = true; + if (is_a (node)) + vec_safe_push (offload_vars, decl); + } + } + } + /* Place the back end declaration for this common block in GLOBAL_BINDING_LEVEL. */ gfc_map_of_all_commons[identifier] = pushdecl_top_level (decl); diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc index 82fa2bb6134..0fdc41b1784 100644 --- a/gcc/fortran/trans-decl.cc +
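The offset symptom from the bullet list above (arr2(10) reading 19 under map(arr2(10:))) can be reproduced with a small host-only model. This is a sketch under assumptions, not libgomp's implementation: the "device" is just a second host buffer, and link_pointer, read_elem and mapped_read are hypothetical names. The adjustment models storing the section's device address minus the section's byte offset within the whole variable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { N = 50, FIRST = 10 };   /* Models arr2(1:N) and the section arr2(FIRST:).  */

/* Address to store in the device-side link pointer.  DEV_START is where
   the mapped section lives; the two host addresses give the byte offset
   of the section within the whole variable.  */
uintptr_t
link_pointer (uintptr_t dev_start, uintptr_t host_section_start,
              uintptr_t host_var_start, int honor_offset)
{
  if (!honor_offset)
    return dev_start;   /* Pointer to the first mapped element.  */
  return dev_start - (host_section_start - host_var_start);
}

/* Read the 1-based element arr2(i) through the link pointer.  */
int
read_elem (uintptr_t link, int i)
{
  int value;
  uintptr_t addr = link + (uintptr_t) (i - 1) * sizeof (int);
  memcpy (&value, (const void *) addr, sizeof value);
  return value;
}

/* Map arr2(FIRST:N) into a stand-in device buffer and return what
   arr2(i) reads there.  */
int
mapped_read (int i, int honor_offset)
{
  static int host[N];
  static int dev[N - FIRST + 1];

  /* Keep the access inside the stand-in device buffer.  */
  if (honor_offset)
    assert (i >= FIRST && i <= N);
  else
    assert (i >= 1 && i <= N - FIRST + 1);

  for (int k = 0; k < N; k++)
    host[k] = k + 1;                      /* arr2(k+1) = k+1  */
  memcpy (dev, &host[FIRST - 1], sizeof dev);

  uintptr_t link = link_pointer ((uintptr_t) dev,
                                 (uintptr_t) &host[FIRST - 1],
                                 (uintptr_t) host, honor_offset);
  return read_elem (link, i);
}
```

Without the adjustment, index 10 lands nine elements into the mapped section and yields the value of element 19; with it, index 10 lands on the first mapped element.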
[Patch] libgomp: Fix declare target link with offset array-section mapping [PR116107]
The main idea of 'link' is to permit putting only a subset of a huge array on the device. To make this work properly, it must be possible to map an array section that does not start with the first element.

This patch adjusts the pointers such that this actually works.

(Tested on x86-64-gnu-linux with Nvptx offloading.)

Comments, suggestions, remarks before I commit it?

Tobias

libgomp: Fix declare target link with offset array-section mapping [PR116107]

Assume that 'int var[100]' is 'omp declare target link(var)'. When now mapping an array section with offset such as 'map(to:var[20:10])', the device-side link pointer has to store the device address of the first mapped element minus the offset, such that var[20] accesses that first mapped element. But the offset calculation was missed such that the device-side 'var' pointed to the first element of the mapped data - and var[20] points beyond at some invalid memory.

        PR middle-end/116107

libgomp/ChangeLog:

        * target.c (gomp_map_vars_internal): Honor array mapping offsets with
        declare-target 'link' variables.
        * testsuite/libgomp.c-c++-common/target-link-2.c: New test.

 libgomp/target.c                                   |  7 ++-
 .../testsuite/libgomp.c-c++-common/target-link-2.c | 59 ++
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index aa01c1367b9..e3e648f5443 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1820,8 +1820,11 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
               if (k->aux && k->aux->link_key)
                 {
                   /* Set link pointer on target to the device address of the
-                     mapped object.  */
-                  void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
+                     mapped object.  Also deal with offsets due to
+                     array-section mapping.  */
+                  void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset
+                                             - (k->host_start
+                                                - k->aux->link_key->host_start));
                   /* We intentionally do not use coalescing here, as it's not
                      data allocated by the current call to this function.  */
                   gomp_copy_host2dev (devicep, aq, (void *) n->tgt_offset,
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
new file mode 100644
index 000..4ff4080da76
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
@@ -0,0 +1,59 @@
+/* PR middle-end/116107  */
+
+#include <omp.h>
+
+int arr[15] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+#pragma omp declare target link(arr)
+
+#pragma omp begin declare target
+void f(int *res)
+{
+  __builtin_memcpy (res, &arr[5], sizeof(int)*10);
+}
+
+void g(int *res)
+{
+  __builtin_memcpy (res, &arr[3], sizeof(int)*10);
+}
+#pragma omp end declare target
+
+int main()
+{
+  int res[10], res2;
+  for (int dev = 0; dev < omp_get_num_devices(); dev++)
+    {
+      __builtin_memset (res, 0, sizeof (res));
+      res2 = 99;
+
+      #pragma omp target enter data map(arr[5:10]) device(dev)
+
+      #pragma omp target map(from: res) device(dev)
+        f (res);
+
+      #pragma omp target map(from: res2) device(dev)
+        res2 = arr[5];
+
+      if (res2 != 6)
+        __builtin_abort ();
+      for (int i = 0; i < 10; i++)
+        if (res[i] != 6 + i)
+          __builtin_abort ();
+
+      #pragma omp target exit data map(release:arr[5:10]) device(dev)
+
+      for (int i = 0; i < 15; i++)
+        arr[i] *= 10;
+
+      #pragma omp target enter data map(arr[3:10]) device(dev)
+      __builtin_memset (res, 0, sizeof (res));
+
+      #pragma omp target map(from: res) device(dev)
+        g (res);
+
+      for (int i = 0; i < 10; i++)
+        if (res[i] != (4 + i)*10)
+          __builtin_abort ();
+    }
+  return 0;
+}
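The pointer adjustment described in the commit message can be pictured with a small host-only sketch. It is not libgomp code: `link_pointer`, `read_via_link_pointer`, the 64-element `device_mem` buffer and the sample values are all made up for illustration, and libgomp itself does the arithmetic in bytes on integer addresses rather than on `int *`. The buffer is deliberately larger than the section so that the adjusted pointer still points inside it.

```c
#include <stddef.h>

/* Hypothetical helper: given the device address of the first mapped
   element and the host index at which the mapped section starts,
   return the pointer the device-side 'var' must hold so that
   var[host_index] reaches the right element.  */
static int *
link_pointer (int *dev_section, size_t section_start)
{
  return dev_section - section_start;
}

/* Simulate 'map(to: var[20:10])': the ten mapped elements live at
   device_mem[30..39]; the host values are assumed to be var[i] = 100 + i.  */
static int
read_via_link_pointer (size_t host_index)
{
  int device_mem[64] = { 0 };          /* stands in for device memory */
  int *dev_section = &device_mem[30];  /* device copy of var[20..29] */
  for (size_t i = 0; i < 10; i++)
    dev_section[i] = 100 + 20 + (int) i;

  /* Without the subtraction, var[20] would read device_mem[50].  */
  int *var = link_pointer (dev_section, 20);  /* == &device_mem[10] */
  return var[host_index];
}
```

Calling `read_via_link_pointer (20)` reads the first mapped element, which is the behavior the patch restores for offset array sections.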
Re: [PATCH v3 01/12] OpenMP: metadirective tree data structures and front-end interfaces
Hi Sandra,

thanks for your patch. (Disclaimer: I have not finished reading through your patch.)

Some upfront generic remarks:

[* When first compiling it (incremental build), I did run into the issue that OMP_METADIRECTIVE_CHECK wasn't declared. Thus, there seems to be a dependency issue so that tree-check.h might be generated after code that includes tree.h is processed. (Unrelated to your patch itself, but for completeness …)]

* Not required right now, but eventually we need to check whether https://gcc.gnu.org/PR112779 is fully fixed by this patch set or whether follow-up work is required (and if so which). There is also PR107067 for a Fortran ICE.

* There are some not-implemented/FIXME comments in the patches for missing features. I think we should ensure that those won't get forgotten, e.g. by filing PRs for those. – For declare variant, some PRs might already exist.

Can you eventually take care of the last two items? (For the last item: e.g. 'target_device' for declare_variant, for which 'sorry' already existed.)

* * *

I might have asked the following question before – and you might have answered it already:

Sandra Loosemore wrote:
This patch adds the OMP_METADIRECTIVE tree node and shared tree-level support for manipulating metadirectives. It defines/exposes interfaces that will be used in subsequent patches that add front-end and middle-end support, but nothing generates these nodes yet.

I have to admit that I do not understand the part:

+      else if (set == OMP_TRAIT_SET_TARGET_DEVICE)
+        /* The target_device set is dynamic, so treat it as always
+           resolvable.  */
+        continue;
+

The current code has 3 states:
* 0 - if a trait is false; this directly returns as it cannot be fixed later
* 1 - if all traits are known to match (initial value)
* -1 - if one trait cannot be evaluated, either because it is too early (e.g. during parsing) or because it is a dynamic context selector.
Thus, I had expected:
(a) ret = -1 as default in this case (not known)
(b) for cases where it is known, a 'return 0' / not-setting -1.

In particular:
* n == const → device_num(n) – false if '< -1' and, for '!ENABLE_OFFLOADING || offload_targets == NULL' either false for n > 0 or otherwise false.
* Checks similar to OMP_TRAIT_DEVICE_{KIND,ARCH,ISA}, i.e. kind(any) → true, kind(fpga) → false, arch(something_unknown) → false if not true for any device. With '!ENABLE_OFFLOADING || offload_targets == NULL', the kind_arch_isa check can be done as for the host.

* * *

Have I missed something and is it sensible to return 1 instead of -1 here?

* * *

@@ -1804,6 +1834,12 @@ omp_context_selector_matches (tree ctx)
         case OMP_TRAIT_USER_CONDITION:
           if (set == OMP_TRAIT_SET_USER)
             for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
               if (OMP_TP_NAME (p) == NULL_TREE)
                 {
+                  /* OpenMP 5.1 allows non-constant conditions for
+                     metadirectives.  */
+                  if (metadirective_p
+                      && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+                    break;
+
                   if (integer_zerop (OMP_TP_VALUE (p)))
                     return 0;
                   if (integer_nonzerop (OMP_TP_VALUE (p)))
                     break;
                   ret = -1;
                 }

* Comment wording: Please change to imply >= 5.1 not == 5.0
* Comment: I don't see why the non-const only applies to metadirectives; the OpenMP >= 5.1 seems to imply that it is also valid for declare variant. Thus, I would change the wording.
* The current code seems to already handle non-const values as expected. ... except that it changes 'ret' to -1, while the idea seems to be not to modify 'ret' in this case for metadirectives. (Why? Same question as above).
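To make the three-state convention concrete, here is a minimal sketch of how per-trait results could be folded into one answer. It only mirrors the 0 / 1 / -1 description above; `combine_traits` and the `MATCH_*` names are invented for this illustration and are not the identifiers used in omp_context_selector_matches.

```c
#include <stddef.h>

/* Hypothetical names for the three states described above.  */
enum match_state { MATCH_NO = 0, MATCH_YES = 1, MATCH_UNKNOWN = -1 };

/* Fold per-trait results into one result: a known-false trait decides
   the outcome immediately, an unknown trait degrades the result to
   "unknown", and only all-true traits give a definite match.  */
static int
combine_traits (const int *traits, size_t n)
{
  int ret = MATCH_YES;        /* initial value: everything matches */
  for (size_t i = 0; i < n; i++)
    {
      if (traits[i] == MATCH_NO)
        return MATCH_NO;      /* cannot be fixed later */
      if (traits[i] == MATCH_UNKNOWN)
        ret = MATCH_UNKNOWN;  /* too early, or a dynamic selector */
    }
  return ret;
}
```

Under this scheme a selector that simply skips a dynamic trait (the 'continue' in the quoted hunk) leaves the result at 1, whereas expectation (a) above corresponds to feeding MATCH_UNKNOWN for that trait.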
* * * Quotes from the specifications regarding the expressions: The current spec has: "Restrictions to context selectors are as follows:" … "A variable or procedure that is referenced in an expression that appears in a context selector must be visible at the location of the directive on which the context selector appears unless the directive is a declare_variant directive and the variable is an argument of the associated base function." 5.1 wording is the following (approx. same except for argument bit): "All variables that are referenced in an expression that appears in the context selector of a match clause must be accessible at a call site to the base function according to the base language rules." 5.0 had (e.g. for C): "The condition(boolean-expr) selector defines a constant expression that must evaluate to true for the selector to be true." * * * + if (metadirective_p + && !tree_fits_shwi_p (OMP_TP_VALUE (p))) + break; + if (integer_zerop (OMP_TP_VA
[Patch] install.texi (gcn): Suggest newer commit for Newlib
Hi Andrew, hi all,

to be compatible with C++ (and Thomas' WIP work for GCN C++ support), I suggest the attached patch, which also suggests Thomas' Newlib commit (April 4, 2024)

  ed50a50b9 amdgcn: Implement proper locks: Fix 'newlib/libc/sys/amdgcn/include/sys/lock.h' for C++

and not only your commit (March 25, 2024)

  7dd4eb1db amdgcn: Implement proper locks

Comments or suggestions before I commit it?

Tobias

install.texi (gcn): Suggest newer commit for Newlib

Newlib 4.4.0 lacks two commits: 7dd4eb1db (2024-03-25) to fix device console output for GFX10/GFX11 and ed50a50b9 (2024-04-04) to make the added lock.h compilable with C++. This commit now also mentions the second commit.

gcc/ChangeLog:

        * doc/install.texi (amdgcn-x-amdhsa): Suggest newer git version
        for newlib.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index b5456992583..dda623f4410 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3952,9 +3952,9 @@ Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy
 by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
-Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
-7dd4eb1db (2024-03-25, post-4.4.0) fixes device console output for GFX10 and
-GFX11 devices).
+Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commits
+7dd4eb1db and ed50a50b9 (2024-04-04, post-4.4.0) fix device console output
+for GFX10 and GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use
Re: [PATCH v2 3/8] OpenMP: middle-end support for dispatch + adjust_args
Hi PA,

as discussed off list, I was stumbling over the call to GOMP_task. I now understand why: I was looking at a different version of the OpenMP spec.

Namely, OpenMP 5.2 contains the changes for spec Issue 2741 "dispatch construct data scoping issues". Namely: Performance issue due to 'task' compared to direct call, effect of unintended firstprivatization, …

The current version has

(a) nowait
"The addition of the *nowait* element to the semantic requirement set by the *dispatch* directive has no effect on the dispatch construct apart from the effect it may have on the arguments that are passed when calling a function variant."
(I assume the latter is about 'append_args' of interop objects)

(b) depend
"If the *dispatch* directive adds one or more _depend_ element to the semantic requirement set, and those element are not removed by the effect of a declare variant directive, the behavior is as if those properties were applied as *depend* clauses to a *taskwait* construct that is executed before the *dispatch* region is executed."

I think it would be good to match the 5.2 behavior.

* * *

I have not fully checked whether the 'device' routine is properly handled. The current wording states:

"If the device clause is present, the value of the default-device-var ICV is set to the value of the expression in the clause on entry to the dispatch region and is restored to its previous value at the end of the region."

For the code itself, it seems to be handled correctly, see attached testcase (consider including).

I was wondering (and haven't checked) whether the ICV is set for too much (i.e. not only the "data environment" (i.e. "The variables associated with the execution of a given region"), but is also immediately visible by other concurrently running threads outside of that region). Can you check?
(Albeit, my question might also be answered once I finish reading the patch …)

Thanks,

Tobias

#include <omp.h>

int f ()
{
  return omp_get_default_device ();
}

int main ()
{
  for (int d = omp_initial_device; d <= omp_get_num_devices (); d++)
    {
      int dev = omp_invalid_device;
      omp_set_default_device (d);
      #pragma omp dispatch
        dev = f ();
      if (d == omp_initial_device || d == omp_get_num_devices ())
        {
          if (dev != omp_initial_device && dev != omp_get_num_devices ())
            __builtin_abort ();
          if (omp_get_default_device() != omp_initial_device
              && omp_get_default_device() != omp_get_num_devices ())
            __builtin_abort ();
        }
      else if (dev != d || d != omp_get_default_device())
        __builtin_abort ();

      for (int d2 = omp_initial_device; d2 <= omp_get_num_devices (); d2++)
        {
          dev = omp_invalid_device;
          #pragma omp dispatch device(d2)
            dev = f ();
          if (d == omp_initial_device || d == omp_get_num_devices ())
            {
              if (omp_get_default_device() != omp_initial_device
                  && omp_get_default_device() != omp_get_num_devices ())
                __builtin_abort ();
            }
          else if (d != omp_get_default_device())
            __builtin_abort ();
          if (d2 == omp_initial_device || d2 == omp_get_num_devices ())
            {
              if (dev != omp_initial_device && dev != omp_get_num_devices ())
                __builtin_abort ();
            }
          else if (dev != d2)
            __builtin_abort ();
        }
    }
  return 0;
}
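The "set on entry, restore at the end of the region" wording quoted from the spec can be sketched without any OpenMP runtime. Everything here is an assumption for illustration: `default_device_var` is a plain global standing in for the ICV, and `dispatch_on_device` is a made-up helper, not what GCC generates. A single global is exactly the too-broad scope the question above worries about; a real ICV belongs to the data environment of the region.

```c
/* Stand-in for the default-device-var ICV (deliberately simplistic:
   one global instead of a per-data-environment value).  */
static int default_device_var = 0;

/* Plays the role of omp_get_default_device in this sketch.  */
static int
current_default_device (void)
{
  return default_device_var;
}

/* Hypothetical model of 'dispatch device(DEVICE)': set the ICV on
   entry, run the call, and restore the previous value afterwards.  */
static int
dispatch_on_device (int device, int (*variant) (void))
{
  int saved = default_device_var;
  default_device_var = device;   /* visible inside the region */
  int result = variant ();
  default_device_var = saved;    /* restored at the end of the region */
  return result;
}
```

With the default set to 1, `dispatch_on_device (3, current_default_device)` observes 3 inside the call while the value outside is 1 again afterwards, which is the property the test case checks through the real API.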
[Patch, v3] gcn/mkoffload.cc: Use #embed for including the generated ELF file
Hi, Jakub Jelinek wrote: + "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n" If this was an attempt to deal gracefully with no #embed support, then the above would be wrong and should have been #if defined(__STDC_EMBED_FOUND__) && defined(__has_embed) #if __has_embed ("whatever") == __STDC_EMBED_FOUND__ I was kind of both – assuming that #embed is available (as it should be compiled by the accompanied compiler) but handle the case that it is not. However, as '#embed' is well diagnosed if unsupported, that part is not really needed. Now, if all you want is an error if the file doesn't exist, then #embed "whatever" will do that too […] If you want an error not just when it doesn't exist, but also when it is empty, then you could do #embed "whatever" if_empty (%%%) The idea was to also error out if the file is empty – as that shouldn't happen here: if offloading code was found, the code gen should be done. However, using an invalid expression seems to be a good idea as that's really a special case that shouldn't happen. * * * I have additionally replaced the #include by __UINTPTR_TYPE__ and __SIZE_TYPE__ to avoid including 3 header files; this doesn't have a large effect, but still. Updated patch attached. OK for mainline, once Jakub's #embed is committed? * * * BTW: Testing shows for a hello world program (w/o #embed patch) For -foffload=...: 'disable' 0.04s, 'nvptx-none' 0.15s, 'amdgcn-amdhsa' 1.2s. With a simple #embed (this patch plus Jakub's first patch), the performance is unchanged. I then applied Jakub's follow up patches, but I then get an ICE (Jakub will have a look). But compiling it with 'g++' (→ COLLECT_GCC is g++) works; result: takes 0.2s (~6× faster) and compiling for both nvptx and gcn takes 0.3s, nearly 5× faster. Tobias gcn/mkoffload.cc: Use #embed for including the generated ELF file gcc/ChangeLog: * config/gcn/mkoffload.cc (read_file): Remove. (process_asm): Do not add '#include' to generated C file. 
        (process_obj): Generate C file that uses #embed and use __SIZE_TYPE__
        and __UINTPTR_TYPE__ instead of the #include-defined size_t and uintptr.
        (main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 79 +++--
 1 file changed, 12 insertions(+), 67 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..c3c998639ff 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }
 
-/* Read the whole input file. It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-    {
-      /* Get the file size.  */
-      long s = ftell (stream);
-      if (s >= 0)
-        alloc = s + 100;
-      fseek (stream, 0, SEEK_SET);
-    }
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-    {
-      size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-      if (!n)
-        break;
-      base += n;
-      if (base + 1 == alloc)
-        {
-          alloc *= 2;
-          buffer = XRESIZEVEC (char, buffer, alloc);
-        }
-    }
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
    Tokens are assumed to be delimited by ':'.  */
 
@@ -657,10 +619,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   struct oaccdims *dims = XOBFINISH (&dims_os, struct oaccdims *);
   struct regcount *regcounts = XOBFINISH (&regcounts_os, struct regcount *);
 
-  fprintf (cfile, "#include \n");
-  fprintf (cfile, "#include \n");
-  fprintf (cfile, "#include \n\n");
-
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
   fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n",
            ind_fn_count);
@@ -725,35 +683,28 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.
*/ static void -process_obj (FILE *in, FILE *cfile, uint32_t omp_requires) +process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires) { - size_t len = 0; - const char *input = read_file (in, &len); - /* Dump out an array containing the binary. - FIXME: do this with objcopy. */ - fprintf (cfile, "static unsigned char gcn_code[] = {"); - for (size_t i = 0; i < len; i += 17) -{ - fprintf (cfile, "\n\t"); - for (size_t j = i; j < i + 17 && j < len; j++) - fprintf (cfile, "%3u,", (unsigned char) input[j]); -} - fprintf (cfile, "\n};\n\n"); + If the file is empty, a parse error is shown as the argument to is_empty + is an undeclared identifier. */ + fprintf (cfile, + "static unsigned char gcn_code[] = {\n" + "#embed \"%s\" if_empty (error_file_is_empty)\n" + "};\n\n",
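As a rough illustration of the approach in this patch (writing an `#embed` directive into the generated C file instead of dumping every byte as a decimal literal), here is a small stand-alone sketch. `emit_embedded_array` is a hypothetical helper, not the actual process_obj; it takes the array name as a parameter and does no escaping of the file name, which real code would have to consider for names containing quotes or backslashes.

```c
#include <stdio.h>

/* Hypothetical helper: write a definition of ARRAY_NAME to CFILE whose
   contents come from FNAME via C23 '#embed'.  If FNAME turns out to be
   empty, 'if_empty' substitutes an undeclared identifier, so compiling
   the generated file fails instead of silently producing an empty
   array.  Returns the fprintf result (negative on error).  */
static int
emit_embedded_array (FILE *cfile, const char *array_name, const char *fname)
{
  return fprintf (cfile,
                  "static unsigned char %s[] = {\n"
                  "#embed \"%s\" if_empty (error_file_is_empty)\n"
                  "};\n\n",
                  array_name, fname);
}
```

The generated text only compiles with a compiler that implements `#embed`; the helper itself is plain C and just produces that text.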
Re: [PATCH v2 3/8] OpenMP: middle-end support for dispatch + adjust_args
Hi PA,

not yet a full review, but some observations:

First: Please include the change

  gcc/fortran/types.def (BT_FN_PTR_CONST_PTR_INT)

of "[PATCH v2 7/8] OpenMP: Fortran front-end support for dispatch + adjust_args".

Do so either in this patch (3/8) - or in the previous (2/8) one that adds it to gcc/builtin-types.def.

Otherwise this will break the build as omp-builtins.def (modified in this patch) is also used by gfortran. Causing intermittent build failures is bad - first, in general, and secondly it causes issues when bisecting.

* * *

If I try your testcase and move "bar" and "baz" *after* 'foo' and leave only the following before:

  int baz (double *d_bv, const double *d_av, int n);
  int bar (double *d_bv, const double *d_av, int n);

it fails at runtime with:

  ERROR at 1: 0.00 (act) != 2.718280 (exp)

as the two calls to __builtin_omp_get_mapped_ptr are now missing. With both the declaration and the definition before the declare target, it works.

* * *

I think this variant needs to be either supported – or an error has to be printed that it cannot be supported, but that would be rather unfortunate.

Thanks,

Tobias
Re: [PATCH v2 2/8] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces
Paul-Antoine Arras wrote: This patch introduces the OMP_DISPATCH tree node, as well as two new clauses `nocontext` and `novariants`. It defines/exposes interfaces that will be used in subsequent patches that add front-end and middle-end support, but nothing generates these nodes yet. LGTM. OFF TOPIC regarding "OMP_TRAIT_SET_NEED_DEVICE_PTR" and "pseudo-set selector used to convey argument list until variant has a decl": This reminds me vaguely of the issue that we should store the variant declarations with the base function and not with the variant, cf. https://gcc.gnu.org/PR113905 Thanks for the patch! Tobias It also adds support for new OpenMP context selectors: `dispatch` as trait selector and `need_device_ptr` as pseudo-trait set selector. The purpose of the latter is for the C++ front-end to store the list of arguments (that need to be converted to device pointers) until the declaration of the variant function becomes available. gcc/ChangeLog: * builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New. * omp-selectors.h (enum omp_tss_code): Add OMP_TRAIT_SET_NEED_DEVICE_PTR. (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. (dump_generic_node): Handle OMP_DISPATCH. * tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. (omp_clause_code_name): Add "novariants" and "nocontext". * tree.def (OMP_DISPATCH): New. * tree.h (OMP_DISPATCH_BODY): New macro. (OMP_DISPATCH_CLAUSES): New macro. (OMP_CLAUSE_NOVARIANTS_EXPR): New macro. (OMP_CLAUSE_NOCONTEXT_EXPR): New macro. 
--- gcc/builtin-types.def| 1 + gcc/omp-selectors.h | 3 +++ gcc/tree-core.h | 7 +++ gcc/tree-pretty-print.cc | 21 + gcc/tree.cc | 4 gcc/tree.def | 5 + gcc/tree.h | 7 +++ 7 files changed, 48 insertions(+) diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index c97d6bad1de..ef7aaf67d13 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -677,6 +677,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_INT_FEXCEPT_T_PTR_INT, BT_INT, BT_FEXCEPT_T_PTR, DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_FEXCEPT_T_PTR_INT, BT_INT, BT_CONST_FEXCEPT_T_PTR, BT_INT) DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_UINT8, BT_PTR, BT_CONST_PTR, BT_UINT8) +DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT) DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR) diff --git a/gcc/omp-selectors.h b/gcc/omp-selectors.h index c61808ec0ad..12bc9e9afa0 100644 --- a/gcc/omp-selectors.h +++ b/gcc/omp-selectors.h @@ -31,6 +31,8 @@ enum omp_tss_code { OMP_TRAIT_SET_TARGET_DEVICE, OMP_TRAIT_SET_IMPLEMENTATION, OMP_TRAIT_SET_USER, + OMP_TRAIT_SET_NEED_DEVICE_PTR, // pseudo-set selector used to convey argument +// list until variant has a decl OMP_TRAIT_SET_LAST, OMP_TRAIT_SET_INVALID = -1 }; @@ -55,6 +57,7 @@ enum omp_ts_code { OMP_TRAIT_CONSTRUCT_PARALLEL, OMP_TRAIT_CONSTRUCT_FOR, OMP_TRAIT_CONSTRUCT_SIMD, + OMP_TRAIT_CONSTRUCT_DISPATCH, OMP_TRAIT_LAST, OMP_TRAIT_INVALID = -1 }; diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 27c569c7702..508f5c580d4 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -542,6 +542,13 @@ enum omp_clause_code { /* OpenACC clause: nohost. */ OMP_CLAUSE_NOHOST, + + /* OpenMP clause: novariants (scalar-expression). */ + OMP_CLAUSE_NOVARIANTS, + + /* OpenMP clause: nocontext (scalar-expression). 
*/ + OMP_CLAUSE_NOCONTEXT, + }; #undef DEFTREESTRUCT diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc index 4bb946bb0e8..752a402e0d0 100644 --- a/gcc/tree-pretty-print.cc +++ b/gcc/tree-pretty-print.cc @@ -506,6 +506,22 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags) case OMP_CLAUSE_EXCLUSIVE: name = "exclusive"; goto print_remap; +case OMP_CLAUSE_NOVARIANTS: + pp_string (pp, "novariants"); + pp_left_paren (pp); + gcc_assert (OMP_CLAUSE_NOVARIANTS_EXPR (clause)); + dump_generic_node (pp, OMP_CLAUSE_NOVARIANTS_EXPR (clause), spc, flags, +false); + pp_right_paren (pp); + break; +case OMP_CLAUSE_NOCONTEXT: + pp_string (pp, "nocontext"); + pp_left_paren (pp); + gcc_assert (OMP_CLAUSE_NOCONTEXT_EXPR (clause)); + dump_generic_node (pp, OMP_CLAUSE_NOCONTEXT_EXPR (clause), spc, flags, +false); + pp_right_paren (pp); + break; case OMP_CLAUSE__LOOPTEMP_: name = "_looptemp_"; goto print_remap; @@ -3947,6 +3963,11 @@ dump_generic_node
Re: [PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces
Hi Sandra,

On 16.07.24 at 19:03, Sandra Loosemore wrote:
Well, I still do not understand why backward compatibility concerns specific to some other directive should affect the ABI for a new directive that does not have any current libgomp runtime support,

I am happy that I managed to explain the background of the "-1" mess to you.

Otherwise: The backward-compatibility hack is not required, but it has two advantages: consistency of the values used and it makes the code inside target.c way simpler by just using

  struct gomp_device_descr *devicep = resolve_device (device, true);

instead of handling several additional cases.

However, as written, avoiding the '(n == -1) ? -2 : n' code generation also has advantages; hence, I am also happy with that variant (i.e. -2 or -3 denoting the default device).

However, if you use -2 == default device, you need to fix the libgomp/target.c implementation as your code doesn't handle omp_default_device correctly, which 'resolve_device (device, true);' would handle automatically.

you just tell me what ABI you want me to implement and I will re-do the code that way.

Having looked at the code again – and in particular at libgomp/target.c, I realized the merits of using -2. Thus, in the end, I am happy with *either* variant.

But either version requires some changes: one requires changing the creation of the conditional gimple code and permits much simplified code in target.c; the other keeps the current gimple code but requires fixing/extending target.c.

Tobias
Re: [PATCH v2 1/8] Fix warnings for tree formats in gfc_error
I think it would be nice if some C/C++/global maintainer could rubber stamp the following patch. Otherwise, I think it is trivial, i.e. I think it can be committed in a few days, unless someone has concerns.

This change to gcc/c-family/c-format.cc LGTM from the *gfortran* POV and is trivially copied from gcc_tdiag_char_table or gcc_cdiag_char_table (which both have it).

* * *

Background: While this is for gcc/c-family/c-format.cc, the 'gcc_gfc_char_table' is only used for diagnostics when compiling gcc/fortran/. Namely, the gfc_error, gfc_warning etc. functions are annotated by the format checking attribute:

  #define ATTRIBUTE_GCC_GFC(m, n) __attribute__ ((__format__ (__gcc_gfc__, m, n))) ATTRIBUTE_NONNULL(m)

* * *

As gfc_error etc. call the common diagnostic at the end, '%qE', '%qD' etc. are already supported. (As tested manually; it is also used by this patch series of PA.)

But while %qE is already supported, without the 'gcc_gfc_char_table' change, the '__format__ (__gcc_gfc__' check does not recognize it and yields a -Werror, causing the bootstrap to fail.

Hence, we need this patch …

* * *

Paul-Antoine Arras wrote:
This enables proper warnings for formats like %qD.

gcc/c-family/ChangeLog:

        * c-format.cc (gcc_gfc_char_table): Add formats for tree objects.
---
 gcc/c-family/c-format.cc | 4
 1 file changed, 4 insertions(+)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 5bfd2fc4469..f4163c9cbc0 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -847,6 +847,10 @@ static const format_char_info gcc_gfc_char_table[] =
   /* This will require a "locus" at runtime.  */
   { "L", 0, STD_C89, { T89_V, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "", "R", NULL },
 
+  /* These will require a "tree" at runtime.  */
+  { "DFTV", 1, STD_C89, { T89_T, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "q+", "'", NULL },
+  { "E", 1, STD_C89, { T89_T, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "q+", "", NULL },
+
   /* These will require nothing.  */
   { "<>",0, STD_C89, NOARGUMENTS, "", "", NULL },
   { NULL, 0, STD_C89, NOLENGTHS, NULL, NULL, NULL
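For readers unfamiliar with format-checking attributes, the mechanism can be shown with the ordinary `printf` archetype, which is available to user code in GCC and Clang; the `__gcc_gfc__` archetype discussed above is only meant for building GCC itself. The function `report` below is a made-up example, not part of gfortran.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical diagnostic helper.  The attribute tells the compiler that
   argument 3 is a printf-style format string and that the variadic
   arguments start at position 4, so mismatches such as passing an int
   for '%s' are diagnosed at compile time with -Wformat.  */
static int report (char *buf, size_t size, const char *fmt, ...)
  __attribute__ ((format (printf, 3, 4)));

static int
report (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int n = vsnprintf (buf, size, fmt, ap);  /* format into BUF */
  va_end (ap);
  return n;
}
```

The checker only accepts conversion characters that its table lists for the given archetype, which is why a conversion that works at run time can still trigger a warning until the table knows about it.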
Re: [PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces
Hi Sandra,

Sandra Loosemore wrote:
+  /* omp_initial_device is -1, omp_invalid_device is -4; choose
+     a value that isn't otherwise defined to indicate the default
+     device.  */
+  device_num = build_int_cst (integer_type_node, -2);

Don't do this - we do it differently for 'target' and it should do the same.

Some value usage history: Without caring for backward compatibility, I think we would have somewhere

  #define OMP_DEFAULT_DEVICE -2

and would simply use it everywhere when doing API calls. But to handle old code, we have to handle both: -1 → default device and -1 → initial device (= host).

Before coming back to your code, let's try to explain the history and reason again. Maybe I manage to explain it better this time:

* * *

The problem is that -1 on the user side and -1 on the internal-use side mean different things. Namely:

In the old days OpenMP had on the user side: device numbers 0 ... omp_get_num_devices() where the upper bound was the initial device (= host), omp_get_initial_device().

For

  omp target num_device(n)

the device number has to be passed to the run time – and GCC just passes "n" here. But GCC also needs to handle:

  omp target

i.e. not specifying a device number (= using the default device). It has been implemented in the obvious way, i.e. passing '-1'.

Later, OpenMP added:

  omp_initial_device == -1
  omp_invalid_device (negative, implementation defined, != omp_initial_device)

GCC set the latter rather arbitrarily to -4.

RESULT: Everything works fine, except for -1 as

  omp target device_num(omp_initial_device)

and

  omp target

are now the same, but semantically one uses the host and the other the default device.

Therefore, GCC uses:
(A) API routines - use omp_initial_device == -1 as value.
(B) Directives - use -1 for no clause (= backward compatible), using the default device. Using -2 for omp_initial_device.
Hence, the following defines exist:

  #define GOMP_DEVICE_ICV -1
  #define GOMP_DEVICE_HOST_FALLBACK -2
  #define GOMP_DEVICE_INVALID -4

If you call an OpenMP runtime API routine, you need to use -1 for the initial device and for GOMP_* functions related to directives -2 using GOMP_DEVICE_HOST_FALLBACK, when constructing it manually.

Code wise, GCC handles num_device(n) by generating code like:

  if !num_device
    devnum = GOMP_DEVICE_ICV;
  else
    devnum = (n == -1) ? GOMP_DEVICE_HOST_FALLBACK : n;

That's not ideal but one solution to handle backward compatibility.

Inside libgomp/target.c, there is:

  resolve_device (int device_id, bool remapped)

and 'remapped' is
- 'false' for OpenMP API routines and
- 'true' for GOMP_* calls.

The following code in resolve_device does then undo the '-1':

  if (remapped && device_id == GOMP_DEVICE_ICV)
    {
      device_id = icv->default_device_var;
      remapped = false;
    }
  if (device_id < 0)
    if (device_id == (remapped ? GOMP_DEVICE_HOST_FALLBACK : omp_initial_device))
      return NULL;

* * *

Now coming back to your code: If you call resolve_device directly, using the GOMP_* variant makes sense, i.e. passing the device number as is with 'remap = true'. This also makes sense for consistency with the remaining code.

Downside: This requires adding (n == -1) ? -2 : n for user-specified 'n'.

If you handle the device_num resolution yourself in libgomp, you have two variants to choose from:
(a) using a different value to denote the default-device (e.g. '-2' or '-3') and pass it as is
(b) call resolve_device with remapping in libgomp, but handling -1 for the default device as '(n == -1) ? -2 : n' during code gen

I think either works - and either variant is confusing in one way or the other.
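The two halves of this remapping can be put side by side in a small, self-contained sketch. The three `GOMP_DEVICE_*` values are the ones quoted above; everything else (`OMP_INITIAL_DEVICE_VALUE`, `encode_directive_device`, `decode_device`, passing the default device as a plain argument) is made up for illustration and much simpler than the real resolve_device, which returns a device descriptor and also handles invalid numbers.

```c
#include <stdbool.h>

#define GOMP_DEVICE_ICV           (-1)
#define GOMP_DEVICE_HOST_FALLBACK (-2)
#define GOMP_DEVICE_INVALID       (-4)
/* Assumed stand-in for the user-visible omp_initial_device constant.  */
#define OMP_INITIAL_DEVICE_VALUE  (-1)

/* Compiler side: the value handed to a GOMP_* entry point for a
   directive with or without a device clause.  */
static int
encode_directive_device (bool has_clause, int n)
{
  if (!has_clause)
    return GOMP_DEVICE_ICV;   /* "use the default device" */
  return n == OMP_INITIAL_DEVICE_VALUE ? GOMP_DEVICE_HOST_FALLBACK : n;
}

/* Runtime side: map the received value back to a user-level device
   number, where OMP_INITIAL_DEVICE_VALUE means the host.  REMAPPED is
   true for GOMP_* calls and false for OpenMP API routines.  */
static int
decode_device (int device_id, bool remapped, int default_device_var)
{
  if (remapped && device_id == GOMP_DEVICE_ICV)
    {
      device_id = default_device_var;  /* already a user-level number */
      remapped = false;
    }
  if (remapped && device_id == GOMP_DEVICE_HOST_FALLBACK)
    return OMP_INITIAL_DEVICE_VALUE;
  return device_id;
}
```

A round trip such as `decode_device (encode_directive_device (true, -1), true, 2)` yields the host, while the no-clause case yields whatever the default device is, which is the distinction the plain -1 could not express.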
* * * Jumping to: [PATCH v2 03/12] libgomp: runtime support for target_device selector libgomp/target.c: +bool +GOMP_evaluate_target_device (int device_num, const char *kind, +const char *arch, const char *isa) +{ If you do the remapping, you could just use: struct gomp_device_descr *devicep = resolve_device (device, true); if (kind && strcmp (kind, "any") == 0) kind = NULL; if (devicep == NULL) result = GOMP_evaluate_current_device (kind, arch, isa); else result = device->evaluate_device_func (device_num, kind, arch, isa); which seems to be simpler than the code you have. If you don't do the remapping: + bool result = true; + + /* -2 is a magic number to indicate the device number was not specified; + in that case it's supposed to use the default device. */ + if (device_num == -2) +device_num = omp_get_default_device (); … then you need to handle -2 yourself. + if (kind && strcmp (kind, "any") == 0) +kind = NULL; + + gomp_debug (1, "%s: device_num = %u, kind=%s, arch=%s, isa=%s", + __FUNCTION__, device_num, kind, arch, isa); + + if (omp_get_device_num () == device_num) +result = GOMP_evaluate_current_device (kind, arch
x86_64-gnu-linux bootstrap fail (was: [PATCH v2 2/6] Extract ix86 dllimport implementation to mingw)
Hi Evgeny,

I am not sure whether I have chosen the right email in the thread, but an x86-64 GNU/Linux build currently fails as follows. At a glance, it seems to be sufficient to remove the prototype declarations in i386.cc. Namely:

gcc/config/i386/i386.cc:107:12: error: 'rtx_def* legitimize_dllimport_symbol(rtx, bool)' declared 'static' but never defined [-Werror=unused-function]
  107 | static rtx legitimize_dllimport_symbol (rtx, bool);
      |            ^~~
gcc/gcc/config/i386/i386.cc:108:12: error: 'rtx_def* legitimize_pe_coff_extern_decl(rtx, bool)' declared 'static' but never defined [-Werror=unused-function]
  108 | static rtx legitimize_pe_coff_extern_decl (rtx, bool);
      |            ^~
^Cmake[3]: *** [Makefile:2556: i386.o] Interrupt

There is:

config/i386/i386.cc:static rtx legitimize_dllimport_symbol (rtx, bool);
config/mingw/winnt-dll.cc:legitimize_dllimport_symbol (rtx symbol, bool want_reg)
config/mingw/winnt-dll.cc: return legitimize_dllimport_symbol (addr, inreg);
config/mingw/winnt-dll.cc:rtx t = legitimize_dllimport_symbol (XEXP (XEXP (addr, 0), 0), inreg);

And:

config/i386/i386.cc:static rtx legitimize_pe_coff_extern_decl (rtx, bool);
config/mingw/winnt-dll.cc:legitimize_pe_coff_extern_decl (rtx symbol, bool want_reg)
config/mingw/winnt-dll.cc:return legitimize_pe_coff_extern_decl (addr, inreg);
config/mingw/winnt-dll.cc: rtx t = legitimize_pe_coff_extern_decl (XEXP (XEXP (addr, 0), 0), inreg);

Tobias
[Patch, v2] gcn/mkoffload.cc: Use #embed for including the generated ELF file
[I messed up copying from the build system, picking up an old version. Changes to v1 (bottom of the diff): fopen is no longer required.]

Tobias Burnus wrote:

mkoffload's generated .c file looks much nicer with '#embed'.

This patch depends on Jakub's #embed patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html

It might be a tiny bit faster than currently (or not); however, once #embed has a large-file mode, it should also speed up the offloading compilation quite a bit.

OK for mainline, once '#embed' support is in?

Tobias

gcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

    * config/gcn/mkoffload.cc (read_file): Remove.
    (process_obj): Generate C file that uses #embed.
    (main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 72 -
 1 file changed, 12 insertions(+), 60 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..0c840318b2d 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }

-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-    {
-      /* Get the file size.  */
-      long s = ftell (stream);
-      if (s >= 0)
-        alloc = s + 100;
-      fseek (stream, 0, SEEK_SET);
-    }
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-    {
-      size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-      if (!n)
-        break;
-      base += n;
-      if (base + 1 == alloc)
-        {
-          alloc *= 2;
-          buffer = XRESIZEVEC (char, buffer, alloc);
-        }
-    }
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
    Tokens are assumed to be delimited by ':'.  */
@@ -725,31 +687,27 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */

 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, &len);
-
   /* Dump out an array containing the binary.
      FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-    {
-      fprintf (cfile, "\n\t");
-      for (size_t j = i; j < i + 17 && j < len; j++)
-        fprintf (cfile, "%3u,", (unsigned char) input[j]);
-    }
-  fprintf (cfile, "\n};\n\n");
+  fprintf (cfile,
+           "static unsigned char gcn_code[] = {\n"
+           "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n"
+           "#embed \"%s\"\n"
+           "#else\n"
+           "#error \"#embed '%s' failed\"\n"
+           "#endif\n"
+           "};\n\n", fname_in, fname_in, fname_in);

   fprintf (cfile,
            "static const struct gcn_image {\n"
            "  size_t size;\n"
            "  void *image;\n"
            "} gcn_image = {\n"
-           "  %zu,\n"
+           "  sizeof(gcn_code),\n"
            "  gcn_code\n"
-           "};\n\n",
-           len);
+           "};\n\n");

   fprintf (cfile,
            "static const struct gcn_data {\n"
@@ -1312,13 +1270,7 @@ main (int argc, char **argv)
   fork_execute (ld_argv[0], CONST_CAST (char **, ld_argv), true, ".ld_args");
   obstack_free (&ld_argv_obstack, NULL);

-  in = fopen (gcn_o_name, "r");
-  if (!in)
-    fatal_error (input_location, "cannot open intermediate gcn obj file");
-
-  process_obj (in, cfile, omp_requires);
-
-  fclose (in);
+  process_obj (gcn_o_name, cfile, omp_requires);

   xputenv (concat ("GCC_EXEC_PREFIX=", execpath, NULL));
   xputenv (concat ("COMPILER_PATH=", cpath, NULL));
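For readers who have not used '#embed' yet, the following is a rough sketch of the kind of C file the patched process_obj writes. It is an illustration under assumptions, not the generated output: the file name "payload.bin" and all sketch_* names are hypothetical. Unlike the patch, which emits '#error' when the file cannot be embedded, this sketch nests the checks and falls back to a four-byte placeholder so that it also compiles with a preprocessor that lacks '#embed' or when the file is absent.

```c
#include <stddef.h>

/* "payload.bin" is a hypothetical name standing in for the generated ELF
   file; the sketch_* identifiers are illustrative as well.  */
static const unsigned char sketch_code[] = {
#ifdef __has_embed
# if __has_embed ("payload.bin") == __STDC_EMBED_FOUND__
#  embed "payload.bin"
#  define SKETCH_USED_EMBED 1
# endif
#endif
#ifndef SKETCH_USED_EMBED
  /* Placeholder bytes, used only when #embed is unavailable, the file is
     missing, or the file is empty.  */
  0x7f, 'E', 'L', 'F'
#endif
};

/* As in the patch, the size is taken with sizeof instead of being printed
   into the file as a number.  */
static const struct sketch_image {
  size_t size;
  const void *image;
} sketch_image = {
  sizeof (sketch_code),
  sketch_code
};
```

Taking the size with sizeof is what lets the generator stop reading the object file itself: the compiler that processes '#embed' is the only party that needs to know the length.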
[Patch] gcn/mkoffload.cc: Use #embed for including the generated ELF file
mkoffload's generated .c file looks much nicer with '#embed'.

This patch depends on Jakub's #embed patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html

It might be a tiny bit faster than currently (or not); however, once #embed has a large-file mode, it should also speed up the offloading compilation quite a bit.

OK for mainline, once '#embed' support is in?

Tobias

gcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

    * config/gcn/mkoffload.cc (read_file): Remove.
    (process_obj): Generate C file that uses #embed.
    (main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 66 +
 1 file changed, 12 insertions(+), 54 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..0ccb874398a 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }

-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-    {
-      /* Get the file size.  */
-      long s = ftell (stream);
-      if (s >= 0)
-        alloc = s + 100;
-      fseek (stream, 0, SEEK_SET);
-    }
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-    {
-      size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-      if (!n)
-        break;
-      base += n;
-      if (base + 1 == alloc)
-        {
-          alloc *= 2;
-          buffer = XRESIZEVEC (char, buffer, alloc);
-        }
-    }
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
    Tokens are assumed to be delimited by ':'.  */
@@ -725,31 +687,27 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */

 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, &len);
-
   /* Dump out an array containing the binary.
      FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-    {
-      fprintf (cfile, "\n\t");
-      for (size_t j = i; j < i + 17 && j < len; j++)
-        fprintf (cfile, "%3u,", (unsigned char) input[j]);
-    }
-  fprintf (cfile, "\n};\n\n");
+  fprintf (cfile,
+           "static unsigned char gcn_code[] = {\n"
+           "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n"
+           "#embed \"%s\"\n"
+           "#else\n"
+           "#error \"#embed '%s' failed\"\n"
+           "#endif\n"
+           "};\n\n", fname_in, fname_in, fname_in);

   fprintf (cfile,
            "static const struct gcn_image {\n"
            "  size_t size;\n"
            "  void *image;\n"
            "} gcn_image = {\n"
-           "  %zu,\n"
+           "  sizeof(gcn_code),\n"
            "  gcn_code\n"
-           "};\n\n",
-           len);
+           "};\n\n");

   fprintf (cfile,
            "static const struct gcn_data {\n"
@@ -1316,7 +1274,7 @@ main (int argc, char **argv)
   if (!in)
     fatal_error (input_location, "cannot open intermediate gcn obj file");

-  process_obj (in, cfile, omp_requires);
+  process_obj (gcn_o_name, cfile, omp_requires);

   fclose (in);
[Patch] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559]
Hi all,

it turned out that 'declare target' with 'link' clause was broken in multiple ways. The main fix is the attached patch, namely pushing the variables to the offload-vars list already in the FE.

When implementing it, I noticed:

* C has a similar issue when using nested functions, which is a GNU extension → https://gcc.gnu.org/115574

* When doing partial mapping of arrays (which is one of the reasons for 'link'), offsets are mishandled in Fortran (not tested in C); see FIXME in the patch. There, arr2(10) should print 10, but with map(arr2(10:)) it prints 19. (I will file a PR about this.)

* It might happen that linked variables do not get linked. I have not investigated why, but 'arr2' gives link errors, while 'arr' works. See FIXME in the patch. (I will file a PR about this.)

* For COMMON blocks, map(/common/) is rejected, https://gcc.gnu.org/PR115577

* When then mapping map(a,b,c), which is identical for 'common /mycom/ a,b,c', it fails to link the device side as the 'mycom_' symbol cannot be found on the device side. (I will file a PR about this.)

As COMMON has issues, an alternative would be to defer the trans-common.cc changes to a later patch.

Comments, questions, concerns?

Tobias

PS: Tested with nvptx offloading on a system that supports page migration, with nvptx and GCN offloading configured; no new fails observed.

OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559]

Contrary to a normal 'declare target', the 'declare target link' attribute also needs to set node->offloadable and push the offload_vars in the front end.

Linked variables require that the data is mapped. For module variables, this can happen anywhere. For variables in an external subprogram or the main program, this can only happen either in that program itself or in an internal subprogram.
- Whether a variable is just normally mapped or linked then becomes relevant if a device routine exists that can access that variable, i.e. an internal procedure has then to be marked as declare target.

    PR fortran/115559

gcc/fortran/ChangeLog:

    * trans-common.cc (build_common_decl): Add 'omp declare target' and
    'omp declare target link' variables to offload_vars.
    * trans-decl.cc (add_attributes_to_decl): Likewise; update args and
    call decl_attributes.
    (get_proc_pointer_decl, gfc_get_extern_function_decl,
    build_function_decl): Update calls.
    (gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1'
    to avoid errors with symtab_node::get_create.

libgomp/ChangeLog:

    * testsuite/libgomp.fortran/declare-target-link.f90: New test.

 gcc/fortran/trans-common.cc                        |  21
 gcc/fortran/trans-decl.cc                          |  81 +-
 .../libgomp.fortran/declare-target-link.f90        | 119 +
 3 files changed, 195 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/trans-common.cc b/gcc/fortran/trans-common.cc
index 5f44e7bd663..e714342c3c0 100644
--- a/gcc/fortran/trans-common.cc
+++ b/gcc/fortran/trans-common.cc
@@ -98,6 +98,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "cgraph.h"
+#include "context.h"
+#include "omp-offload.h"
 #include "gfortran.h"
 #include "trans.h"
 #include "stringpool.h"
@@ -497,6 +500,24 @@ build_common_decl (gfc_common_head *com, tree union_type, bool is_init)
       = tree_cons (get_identifier ("omp declare target"),
                    omp_clauses, DECL_ATTRIBUTES (decl));

+  if (com->omp_declare_target_link || com->omp_declare_target)
+    {
+      /* Add to offload_vars; get_create does so for omp_declare_target,
+         omp_declare_target_link requires manual work.  */
+      gcc_assert (symtab_node::get (decl) == 0);
+      symtab_node *node = symtab_node::get_create (decl);
+      if (node != NULL && com->omp_declare_target_link)
+        {
+          node->offloadable = 1;
+          if (ENABLE_OFFLOADING)
+            {
+              g->have_offload = true;
+              if (is_a <varpool_node *> (node))
+                vec_safe_push (offload_vars, decl);
+            }
+        }
+    }
+
   /* Place the back end declaration for this common block in
      GLOBAL_BINDING_LEVEL.  */
   gfc_map_of_all_commons[identifier] = pushdecl_top_level (decl);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 8d4f06a4e1d..4067dd6ed77 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -46,7 +46,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans-stmt.h"
 #include "gomp-constants.h"
 #include "gimplify.h"
+#include "context.h"
 #include "omp-general.h"
+#include "omp-offload.h"
 #include "attr-fnspec.h"
 #include "tree-iterator.h"
 #include "dependency.h"
@@ -1470,19 +1472,18 @@ gfc_add_assign_aux_vars (gfc_symbol * sym)
 }


-static tree
-add_attributes_to_decl (symbol_attribute sym_attr, tree list)
+static void
+add_attributes_to_decl (tree *decl_p, const gfc_symbol *sym)
 {
   unsigned id;
-  tree attr;
+  tree list = NUL
Re: [PATCH v5 2/6] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc
Andrew Stubbs wrote:

Compared to the previous v4 (1/5) posting of this patch:
- The enumeration of the ompx allocators have been moved (again) to 200 (as 100 is already in use by another toolchain vendor and this seems like a possible source of confusion).
- The "ompx" has also been changed to "ompx_gnu" to highlight that these are specifically GNU extensions.
- The failure mode of the testcases had been modified, including adding an abort in CHECK_SIZE and skipping the test on unsupported platforms.
- The OMP_ALLOCATE environment variable now supports the new allocator.
- The Fortran frontend allows use of the new allocator in "allocator" clauses.

---

This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. This is not in the OpenMP standard so it uses the "ompx" namespace and an independent enum baseline of 200 (selected to not clash with other known implementations). The allocator is equivalent to using a custom allocator with the pinned trait and the null fallback trait. One motivation for having this feature is for use by the (planned) -foffload-memory=pinned feature.

The patch LGTM. Thanks!

Tobias

gcc/fortran/ChangeLog:

    * openmp.cc (is_predefined_allocator): Update valid ranges to
    incorporate ompx_gnu_pinned_mem_alloc.

libgomp/ChangeLog:

    * allocator.c (ompx_gnu_min_predefined_alloc): New.
    (ompx_gnu_max_predefined_alloc): New.
    (predefined_alloc_mapping): Rename to ...
    (predefined_omp_alloc_mapping): ... this.
    (predefined_ompx_gnu_alloc_mapping): New.
    (_Static_assert): Adjust for the new name, and add a new assert for
    the new table.
    (predefined_allocator_p): New.
    (predefined_alloc_mapping): New.
    (omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
    Use predefined_allocator_p and predefined_alloc_mapping.
    (omp_free): Likewise.
    (omp_aligned_calloc): Likewise.
    (omp_realloc): Likewise.
    * env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
    * libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
    * omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
    * omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
    * omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
    * testsuite/libgomp.c/alloc-pinned-5.c: New test.
    * testsuite/libgomp.c/alloc-pinned-6.c: New test.
    * testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

gcc/testsuite/ChangeLog:

    * gfortran.dg/gomp/allocate-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge
---
 gcc/fortran/openmp.cc                            |  11 +-
 .../gfortran.dg/gomp/allocate-pinned-1.f90       |  16 +++
 libgomp/allocator.c                              | 115 +-
 libgomp/env.c                                    |   1 +
 libgomp/libgomp.texi                             |   7 +-
 libgomp/omp.h.in                                 |   1 +
 libgomp/omp_lib.f90.in                           |   2 +
 libgomp/omp_lib.h.in                             |   2 +
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c     | 100 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c     | 102
 .../libgomp.fortran/alloc-pinned-1.f90           |  16 +++
 11 files changed, 336 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
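The ChangeLog mentions a new predefined_allocator_p together with separate minimum and maximum values for the ompx_gnu range. As a sketch of why two ranges are needed once the extension allocator sits at 200, consider the following C fragment. It is an assumption-laden illustration, not the libgomp implementation: all sketch_* names are made up, and only the value 200 and the standard handle values 0, 1 and 8 are taken from the message and from omp.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of handle values; not the real omp.h declarations.  */
enum sketch_allocator_handle
{
  sketch_omp_null_allocator = 0,
  sketch_omp_default_mem_alloc = 1,
  sketch_omp_thread_mem_alloc = 8,
  sketch_omp_max_predefined_alloc = sketch_omp_thread_mem_alloc,
  sketch_ompx_gnu_pinned_mem_alloc = 200,
  sketch_ompx_gnu_min_predefined_alloc = sketch_ompx_gnu_pinned_mem_alloc,
  sketch_ompx_gnu_max_predefined_alloc = sketch_ompx_gnu_pinned_mem_alloc
};

/* Returns true if HANDLE lies in one of the two predefined ranges.  Any
   other value would be treated as a user-defined allocator.  */
static bool
sketch_predefined_allocator_p (uintptr_t handle)
{
  return (handle <= (uintptr_t) sketch_omp_max_predefined_alloc
          || (handle >= (uintptr_t) sketch_ompx_gnu_min_predefined_alloc
              && handle <= (uintptr_t) sketch_ompx_gnu_max_predefined_alloc));
}
```

A single upper bound would no longer do here, since the values between 9 and 199 are not predefined allocators.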
Re: [PATCH v5 1/6] libgomp: change alloc-pinned tests failure mode
Andrew Stubbs wrote: The feature doesn't work on non-Linux hosts, at present, so skip the tests entirely. On Linux systems that have insufficient lockable memory configured we still need to fail or else the feature won't be getting tested when we think it is, but now there's a message to explain why. libgomp/ChangeLog: * testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to dg-skip-if. Correct spelling mistake. Abort on insufficient lockable memory. Use #error on non-linux hosts. * testsuite/libgomp.c/alloc-pinned-2.c: Likewise. LGTM. Thanks! Tobias
Re: [Patch, PR Fortran/90072] Polymorphic Dispatch to Polymophic Return Type Memory Leak
Andre Vehreschild wrote:

PS That's good news about the funding. Maybe we will get to see "built in" coarrays soon?

You hopefully will see Nikolas work on the shared memory coarray support, if that is what you mean by "built in" coarrays. I will be working on the distributed memory coarray support esp. fixing the module issues and some other team related things.

Cool! (Both of it.)

I assume "distributed memory coarray support" is still based on Open Coarrays?

* * *

I am asking because there is a coarray API being defined: Parallel Runtime Interface for Fortran (PRIF), https://go.lbl.gov/prif, with an implementation called Caffeine (CoArray Fortran Framework of Efficient Interfaces to Network Environments), https://crd.lbl.gov/caffeine, which uses GASNet or POSIX processes.

Well, among the implementers is (unsurprisingly?) Damian, and the idea seems to be that LLVM's Flang will use the API.

Tobias

PS: I think it might be useful in the long run to support both PRIF/Caffeine and OpenCoarrays. I have attached my hello-world patch for -fcoarray=prif that I wrote after ISC-HPC; it only handles this_image() / num_images() + init/stop. I got confirmation from the PRIF developers that the next revision will permit calling __prif_MOD_prif_init multiple times such that one can use it in the constructor for static coarrays, which won't work otherwise.
gcc/ChangeLog: * flag-types.h (enum gfc_fcoarray): gcc/fortran/ChangeLog: * invoke.texi: * lang.opt: * trans-decl.cc (gfc_build_builtin_function_decls): (create_main_function): * trans-intrinsic.cc (trans_this_image): (trans_num_images): * trans.h (GTY): gcc/flag-types.h | 3 ++- gcc/fortran/invoke.texi| 7 +- gcc/fortran/lang.opt | 5 +++- gcc/fortran/trans-decl.cc | 56 -- gcc/fortran/trans-intrinsic.cc | 42 +++ gcc/fortran/trans.h| 5 6 files changed, 108 insertions(+), 10 deletions(-) diff --git a/gcc/flag-types.h b/gcc/flag-types.h index 5a2b461fa75..babd747c01d 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -427,7 +427,8 @@ enum gfc_fcoarray { GFC_FCOARRAY_NONE = 0, GFC_FCOARRAY_SINGLE, - GFC_FCOARRAY_LIB + GFC_FCOARRAY_LIB, + GFC_FCOARRAY_PRIF }; diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi index 40e8e4a7cdd..331a40d31db 100644 --- a/gcc/fortran/invoke.texi +++ b/gcc/fortran/invoke.texi @@ -1753,7 +1753,12 @@ Single-image mode, i.e. @code{num_images()} is always one. @item @samp{lib} Library-based coarray parallelization; a suitable GNU Fortran coarray -library needs to be linked. +library needs to be linked such as @url{http://opencoarrays.org}. + +@item @samp{prif} +Using the Parallel Runtime Interface for Fortran (PRIF), +@url{https://go.lbl.gov/@/prif}; for instance, via Caffeine, +@url{https://go.lbl.gov/@/caffeine}. @end table diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt index 5efd4a0129a..9ba957d5571 100644 --- a/gcc/fortran/lang.opt +++ b/gcc/fortran/lang.opt @@ -786,7 +786,7 @@ Copy array sections into a contiguous block on procedure entry. fcoarray= Fortran RejectNegative Joined Enum(gfc_fcoarray) Var(flag_coarray) Init(GFC_FCOARRAY_NONE) --fcoarray= Specify which coarray parallelization should be used. +-fcoarray= Specify which coarray parallelization should be used. 
Enum Name(gfc_fcoarray) Type(enum gfc_fcoarray) UnknownError(Unrecognized option: %qs) @@ -800,6 +800,9 @@ Enum(gfc_fcoarray) String(single) Value(GFC_FCOARRAY_SINGLE) EnumValue Enum(gfc_fcoarray) String(lib) Value(GFC_FCOARRAY_LIB) +EnumValue +Enum(gfc_fcoarray) String(prif) Value(GFC_FCOARRAY_PRIF) + fcheck= Fortran RejectNegative JoinedOrMissing -fcheck=[...] Specify which runtime checks are to be performed. diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc index dca7779528b..d1c0e2ee997 100644 --- a/gcc/fortran/trans-decl.cc +++ b/gcc/fortran/trans-decl.cc @@ -170,6 +170,10 @@ tree gfor_fndecl_co_sum; tree gfor_fndecl_caf_is_present; tree gfor_fndecl_caf_random_init; +tree gfor_fndecl_prif_init; +tree gfor_fndecl_prif_stop; +tree gfor_fndecl_prif_this_image_no_coarray; +tree gfor_fndecl_prif_num_images; /* Math functions. Many other math functions are handled in trans-intrinsic.cc. */ @@ -4147,6 +4151,31 @@ gfc_build_builtin_function_decls (void) get_identifier (PREFIX("caf_random_init")), void_type_node, 2, logical_type_node, logical_type_node); } + else if (flag_coarray == GFC_FCOARRAY_PRIF) +{ + tree pint_type = build_pointer_type (integer_type_node); + tree pbool_type = build_pointer_type (boolean_type_node); + tree pintmax_type_node = get_typenode_from_name (INTMAX_TYPE); + pintmax_type_node = build_pointer_type (pintmax_type_node); + + gfor_fndecl_prif_init = gfc_build_library_function_decl_with_spec ( + get_identifier ("__prif_MOD_prif_init"), ". W ", + void_type_node, 1, pint
Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features
Hi Gerald,

Gerald Pfeifer wrote:

Looks like a janitorial task to fix the absolute links, possibly excluding those with /git, /onlinedocs, /wiki - or assuming that the main page is GCC.gnu.org, relying on the redirects.

It's on my list. A first quick check indicates there isn't much to do, though. :-)

You could consider htdocs/search.html: to avoid a redirect (but it is not a broken link); otherwise, I concur that it seems to be (mostly) fine :-)

* * *

+ loop-transformation constructs are now supported.

I'm thinking "loop transformation" in English? Or is this a specific term from the standard?

Loop transformation happens at the end. But e.g. "(#pragma omp) unroll full" is a directive and, e.g. ... is a construct (= directive + structured block (if any) + end directive (if any)).

I believe there was a misunderstanding and I wasn't clear enough: I was wondering whether instead of "loop-transformation" the patch should have "loop transformation". In your response you use the version without dash, so I guess we agree? :-)

(Pedantically it's a hyphen (-) and not a(n en/em) dash (–/—), i.e. '-' not '--' or '---' in TeX.)

No, we don't. There is a difference whether the two words are used alone or as modifier to a noun, as in "this is well defined" vs. "a well-defined project".

Thus, while "loop transformation happens" is without hyphen (as we both agree),* for "loop(-| )transformation constructs" the (non-)usage of hyphens is not well defined; grouping-wise, those are clearly '((loop transformation) constructs)' and not '(loop (transformation constructs))'.

I believe both variants are perfectly fine.

BTW: In the OpenMP pre-6.0 draft (TR12), the noun derived from the verb 'transform' is now formed not with the suffix '-ation' but with the suffix '-ing' (also referred to as gerund), such that a section title now uses "Loop-Transforming Constructs"; I think for '(word) plus (-ing word)', used as modifier, a hyphen is a tad more common than for '(word) plus (word with -ation suffix)'.

Tobias

* The Oxford Guide to Style points out some words that do get hyphenated: clear-cut, drip-proof, take-off, part-time, …; or to refer to the abstract meaning rather than the literal one: bull's-eye, crow's-feet, … Formerly, present participle plus noun got hyphenated when the compound was acted on: walking-stick, walking-frame. Likewise, it was formerly normal in British English to hyphenate a single adjectival noun and the noun it modified: note-cue, title-page, volume-number (less common now, but can linger in some combinations). And until recently: small scale-factory (vs. small-scale factory), white water-lily (vs. white-water lily).
Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features
Hi Gerald, Gerald Pfeifer wrote: +++ b/htdocs/gcc-15/changes.html + + https://gcc.gnu.org/projects/gomp/";>OpenMP Can you please make this a relative link, i.e. "../projects/gomp/"? Good point. I thought such links should be absolute because of (www.)GNU.org, i.e. https://www.gnu.org/software/gcc/releases.html ... but also that page has https://www.gnu.org/software/gcc/projects/gomp/ GNU.org does not have the documentation, but going to https://www.gnu.org/software/gcc/onlinedocs/ or a subpage redirects (302 temporary redirect) to the GCC website. Likewise for '../git' but for '../wiki' it has a HTTP 404 not found; fortunately, ../wiki/ works. I think there are plenty of links which could be relative ones but are absolute ones. Looks like a janitorial task to fix the absolute links, possibly excluding those with /git, /onlinedocs, /wiki – or assuming that the main page is GCC.gnu.org, relying on the redirects. In any case, those links are probably broken on GNU.org: htdocs/gcc-14/porting_to.html:href="/onlinedocs/gcc-14.1.0/gcc/Diagnostic-Pragmas.html">#pragma GCC diagnostic warning htdocs/gcc-5/changes.html: A href="/onlinedocs/libstdc++/manual/using_dual_abi.html">Dual * * * + + OpenMP 5.1: The unroll and tile + loop-transformation constructs are now supported. + I'm thinking "loop transformation" in English? Or is this a specific term from the standard? Loop transformation happens at the end. But e.g "(#pragma omp) unroll full" is a directive and, e.g. #pragma omp unroll partial(2) for (int i=0; i < n; i++) a[i] = 5; is a construct (= directive + structured block (if any) + end directive (if any)). Tobias
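To spell out the directive/construct distinction drawn above, here is the same loop wrapped in a small C function. The function name fill_with_five is made up for this illustration. The '#pragma' line alone is the directive; the directive together with the loop it applies to is the construct. Without OpenMP enabled, or with a compiler that does not know 'unroll', the pragma is simply ignored and the loop runs as written.

```c
/* Sets A[0..N-1] to 5, asking for the loop to be partially unrolled by a
   factor of 2.  */
static void
fill_with_five (int *a, int n)
{
  /* Directive: the '#pragma omp unroll partial(2)' line by itself.
     Construct: that directive plus the associated 'for' loop below.  */
  #pragma omp unroll partial(2)
  for (int i = 0; i < n; i++)
    a[i] = 5;
}
```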
Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode
Sandra Loosemore wrote: On 6/6/24 06:06, Tobias Burnus wrote: +@item I/O within OpenMP target regions and OpenACC compute regions is supported + using the C library @code{printf} functions. + Additionally, the Fortran @code{print}/@code{write} statements are + supported within OpenMP target regions, but not yet OpenACC compute + regions. @c The latter needs 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'. I think an "in" (or 'within') is missing before OpenACC. Yes, "...not yet within OpenACC compute regions", please. Thanks! Committed as https://gcc.gnu.org/r15-1072-g423522aacd9f30 Tobias
Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode
Hi Thomas,

regarding the commit r15-1070-g3a4775d4403f2e / https://gcc.gnu.org/r15-1070

First, thanks for adding I/O support to nvptx offloading. I have a wording nit, to be confirmed by a native speaker:

--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
...
+@item I/O within OpenMP target regions and OpenACC compute regions is supported
+      using the C library @code{printf} functions.
+      Additionally, the Fortran @code{print}/@code{write} statements are
+      supported within OpenMP target regions, but not yet OpenACC compute
+      regions. @c The latter needs 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.

I think an "in" (or 'within') is missing before OpenACC.

Otherwise, it seemed to be fine at a glance, and I am happy that this feature now finally works :-)

Hooray, no longer using reverse offload ("!$omp target device(ancestor:1)") for Fortran I/O when debugging.

Thanks,

Tobias
Re: [PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc
Hi Andrew, hi Jakub, hello world,

Andrew Stubbs wrote:

Compared to the previous v3 posting of this patch, the enumeration of the "ompx" allocators have been moved to start at "100"

100 is a bad value - as can be seen below. As Jakub suggested at https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640432.html "given that LLVM uses 100-102 range, perhaps pick a different one, 200 or 150" (I know that the first review email suggested 100.)

This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations currently in development.

Namely: ompx_pinned_mem_alloc

RFC: Should we use this name or - similar to LLVM - prefix this by a vendor prefix instead (gnu_omp_ or gcc_omp_ instead of ompx_)?

IMHO it is fine to use ompx_ for pinned as the semantic is clear and should be compatible with IBM and AMD.

For other additional memspaces / allocators, I am less sure, i.e. on OG13 there are:
- ompx_unified_shared_mem_space, ompx_host_mem_space
- ompx_unified_shared_mem_alloc, ompx_host_mem_alloc

(BTW: In light of TR13 naming, the USM one could be ..._devices_all_mem_{alloc,space}, just to start some bikeshedding or following LLVM + Intel '…target_{host,shared}…'.)

* * *

Looking at other compilers:

IBM's compiler, https://www.ibm.com/docs/en/SSXVZZ_16.1.1/pdf/compiler.pdf , has:
- ompx_pinned_mem_alloc, tagged as IBM extension and otherwise without documenting it further

Checking omp.h, they define it as:

  ompx_pinned_mem_alloc = 9, /* Preview of host pinned memory support */

and additionally have:

  LOMP_MAX_MEM_ALLOC = 1024,

AMD's compiler based on clang has:

  /* Preview of pinned memory support */
  ompx_pinned_mem_alloc = 120,

in addition to the LLVM defines shown below.

Regarding LLVM:
- they don't offer 'pinned'
- they use the prefix 'llvm_omp' not 'ompx'

Namely:

  typedef enum omp_allocator_handle_t
  ...
    llvm_omp_target_host_mem_alloc = 100,
    llvm_omp_target_shared_mem_alloc = 101,
    llvm_omp_target_device_mem_alloc = 102,
  ...
  typedef enum omp_memspace_handle_t
  ...
    llvm_omp_target_host_mem_space = 100,
    llvm_omp_target_shared_mem_space = 101,
    llvm_omp_target_device_mem_space = 102,

Remark: I did not find any documentation - and while I understand in principle host and shared, I wonder how LLVM handles 'device_mem_space' when there is more than one device.

BTW: OpenMP TR13 avoids this issue by adding two sets of API routines. Namely:

First, for memspaces,
- omp_get_{device,devices}_memspace
- omp_get_{device,devices}_and_host_memspace
- omp_get_devices_all_memspace

and, secondly, for allocators:
- omp_get_{device,devices}_allocator
- omp_get_{device,devices}_and_host_allocator
- omp_get_devices_all_allocator

where omp_get_device_* takes a single device number and omp_get_devices_* a list of device numbers while _and_host automatically adds the initial device to the list.

* * *

Looking at Intel, they even use extensions without prefix:

  omp_target_{host,shared,device}_mem_{space,alloc}

and contrary to LLVM they document it with the semantic, cf.
https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/openmp-memory-spaces-and-allocators.html

* * *

The allocator is equivalent to using a custom allocator with the pinned trait and the null fallback trait.

...

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index cdedc7d80e9..18e3f525ec6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)
...
 #define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-_Static_assert (ARRAY_SIZE (predefined_alloc_mapping)
+_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping)
                 == omp_max_predefined_alloc + 1,
-                "predefined_alloc_mapping must match omp_memspace_handle_t");
+                "predefined_omp_alloc_mapping must match omp_memspace_handle_t");
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))

I am surprised that this compiles: Why do you re-#define this macro?

* * *

--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
   omp_cgroup_mem_alloc = 6,
   omp_pteam_mem_alloc = 7,
   omp_thread_mem_alloc = 8,
+  ompx_pinned_mem_alloc = 100,

See remark regarding "100" at the top of this email.

--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
+integer (kind=omp_allocator_handle_kind), &
+  parameter :: ompx_pinned_mem_alloc = 100

Likewise.

* * *

Why didn't you also update omp_lib.h.in?

* * *

I think you really want to update the checking code inside GCC itself, i.e. for Fortran:

    3 | !$omp allocate(a) allocator(100)
      |                2            1
Error: Predefined allocator required in ALLOCATOR clause at (1) as the list item 'a' at (2) has the SAV
[wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features
GCC 15 now supports unified-shared memory and the tile/unroll constructs in OpenMP.

Updates https://gcc.gnu.org/gcc-15/changes.html and https://gcc.gnu.org/projects/gomp/

Comments?

Tobias

gcc-15/changes.html + projects/gomp: update for new OpenMP features

GCC 15 now supports unified-shared memory and the tile/unroll constructs in OpenMP.

 htdocs/gcc-15/changes.html      | 27 ++-
 htdocs/projects/gomp/index.html | 11 +++
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index b59fd3be..94528ebd 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -40,6 +40,24 @@ a work-in-progress.
 New Languages and Language specific improvements
+
+ https://gcc.gnu.org/projects/gomp/ OpenMP
+
+
+ Support for unified-shared memory has been added for some AMD and Nvidia
+ GPUs devices, enabled only when using the
+ unified_shared_memory clause to the requires
+ directive. For details, see the offload-target specifics section in the
+ https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html
+ GNU Offloading and Multi Processing Runtime Library Manual.
+
+
+ OpenMP 5.1: The unroll and tile
+ loop-transformation constructs are now supported.
+
+
+
+
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 94bda5ff..d1765fc3 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -313,18 +313,21 @@ than listed, depending on resolved corner cases and optimizations.
 requires directive
-
+
 GCC 9 GCC 12 GCC 13
- GCC 14
+ GCC 14
+ GCC 15
 (atomic_default_mem_order) (dynamic_allocators) complete but no non-host devices provides unified_address or unified_shared_memory
- complete but no non-host devices provides unified_shared_memory
+ complete but no non-host devices provides unified_shared_memory
+ complete; see also https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html
+ Offload-Target Specifics
@@ -706,7 +709,7 @@ than listed, depending on resolved corner cases and optimizations.
 Loop transformation constructs
-No
+GCC 15
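
For illustration, the features named in the changes.html entry could be exercised roughly as follows. This is a minimal sketch, not part of the patch: the function names (tiled_index_sum, unrolled_sum, usm_sum) and the array size are hypothetical, and the directives only take effect when compiling with -fopenmp (otherwise they are ignored and the code runs serially on the host with the same results).

```c
#include <stdlib.h>

/* Must precede any device construct; requests unified-shared memory.  */
#pragma omp requires unified_shared_memory

#define N 8

/* Hypothetical example: sums i*N + j over an N x N iteration space,
   asking the compiler to tile the loop nest into 4x4 blocks.  */
long
tiled_index_sum (void)
{
  long sum = 0;
  #pragma omp tile sizes(4, 4)
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      sum += (long) i * N + j;
  return sum;
}

/* Hypothetical example: partial unrolling by a factor of 4.  */
long
unrolled_sum (const int *a, int n)
{
  long sum = 0;
  #pragma omp unroll partial(4)
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Hypothetical example: with unified-shared memory, the malloc'ed host
   pointer is used inside the target region without a map clause for the
   array.  Expects n > 0; returns -1 if the allocation fails.  */
long
usm_sum (int n)
{
  int *data = malloc ((size_t) n * sizeof *data);
  if (data == NULL)
    return -1;
  for (int i = 0; i < n; i++)
    data[i] = i;

  long sum = 0;
  #pragma omp target map(tofrom: sum)
  for (int i = 0; i < n; i++)
    sum += data[i];

  free (data);
  return sum;
}
```

If no available device supports unified-shared memory, the target region is expected to run on the host instead; see the Offload-Target Specifics section of the libgomp manual for the actual requirements.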