OpenMP Patch Ping – including "[13 Regression]" patches

"[13 Regression]" OpenMP Fortran patches:

* [Patch] OpenMP/Fortran: Fix loop-iter var privatization with !$OMP LOOP [PR108512]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610531.html
* [Patch][v2] OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610584.html

Additionally, there are several more patches pending, see below. Of those: The first two small ones are very simple; especially the first one I regard as obvious! The third one is a documentation patch. The others are of varying complexity, but I think some would still be suitable for the current stage, including some which have been pinged since October :-(

Tobias

PS: The mentioned patches:

On 10.01.23 12:37, Tobias Burnus wrote:

Hi all, hello Jakub,

Below is the updated list to last ping,
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607178.html

NOTE to the list below: I have stopped checking older patches. I know some more are pending review, others need to be revised. I will re-check once the patches listed below have been reviewed. Cf. old list.

Thanks for the reviews done in between the last ping and now!

* * *

Small patches
=============

* [Patch] Fortran: Extend align-clause checks of OpenMP's allocate clause
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608401.html
  Tue Dec 13 16:38:22 GMT 2022

* [Patch] OpenMP: Parse align clause in allocate directive in C/C++
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608404.html
  Tue Dec 13 17:44:27 GMT 2022

* Re: [Patch] libgomp.texi: Reverse-offload updates (was: [Patch] libgomp: Handle OpenMP's reverse offloads)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608245.html
  Thu Nov 24 12:01:04 GMT 2022

(Side note: wwwdocs also needs to be updated for the latter patch and some other patches done in the meanwhile.)
Fortran allocat(e,ors) prep patch
=================================

* [Patch] Fortran/OpenMP: Add parsing support for allocators/allocate directive
  (was: [Patch] Fortran/OpenMP: Add parsing support for allocators directive)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608904.html
  Wed Dec 21 15:51:25 GMT 2022

(Remark: While written from scratch, it is kind of a follow-up to Abid's patch
  [PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0)
which you/Jakub reviewed on Tue Oct 11 12:13:14 GMT 2022, i.e.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603258.html
- For the actual implementation of 'allocators', we still have to solve the issues raised in the review for '[PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).' at
  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603279.html
  (and earlier in the thread); implementing 'omp allocate' (Fortran/C/C++) seems to be easier, but no one has started implementing it so far - only parsing support exists.
- The USM patches on semi-USM systems run into a similar issue as 'allocators' and, for it, some ME omp_allocate is added.)

Mapping related patches
=======================

(Complex, but GCC badly needs this revision as it fixes several bugs and adds missing functionality.)
* Complete patch set was just re-submitted by Julian, overview patch is
  [PATCH v6 00/11] OpenMP: C/C++ lvalue parsing, C/C++/Fortran "declare mapper" support
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/thread.html#609031
  Fri Dec 23 12:12:53 GMT 2022

* Note: For 10/11 of the set, there was a follow-up this Monday
  [PATCH v6 10/11] OpenMP: Support OpenMP 5.0 "declare mapper" directives for C
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609566.html

[As it relates to one patch in the series:
 '[Patch] Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr'
 That's mine, needs to be updated (WIP) and fixes array descriptor/alloc-string-length var issues, where descriptor/string length may need to be handled explicitly on data entering map, i.e. string lengths/allocator may require 'to:' instead of 'alloc:' - and on data exit mapping, the current code might add a bogus 'alloc:'.
 - Idea is to handle this explicitly in fortran/trans-openmp.cc instead of auto-adding it in the ME.
 Status: WIP - removed in ME but not all cases are handled yet in FE.)

Fortran deep mapping (allocatable components)
=============================================

(Old patch from March 2022, but the first part has now been properly, if belatedly, submitted today):
  [Patch][1/2] OpenMP: Add lang hooks + run-time filled map arrays for Fortran deep mapping of DT
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609637.html

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[Patch] OpenMP/Fortran: Fix has_device_addr clause splitting [PR108558]
Rather obvious fix. Hence, I intend to commit it later as obvious, unless there are any comments.

Tobias

PS: Thanks go to Thomas for finding + reporting the issue.

OpenMP/Fortran: Fix has_device_addr clause splitting [PR108558]

gcc/fortran/ChangeLog:

    PR fortran/108558
    * trans-openmp.cc (gfc_split_omp_clauses): Handle has_device_addr.

libgomp/ChangeLog:

    PR fortran/108558
    * testsuite/libgomp.fortran/has_device_addr.f90: New test.

 gcc/fortran/trans-openmp.cc                        |  2 +
 .../testsuite/libgomp.fortran/has_device_addr.f90  | 59 ++
 2 files changed, 61 insertions(+)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 87213de0918..5283d0ce5f3 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -6205,6 +6205,8 @@ gfc_split_omp_clauses (gfc_code *code,
            = code->ext.omp_clauses->lists[OMP_LIST_MAP];
          clausesa[GFC_OMP_SPLIT_TARGET].lists[OMP_LIST_IS_DEVICE_PTR]
            = code->ext.omp_clauses->lists[OMP_LIST_IS_DEVICE_PTR];
+         clausesa[GFC_OMP_SPLIT_TARGET].lists[OMP_LIST_HAS_DEVICE_ADDR]
+           = code->ext.omp_clauses->lists[OMP_LIST_HAS_DEVICE_ADDR];
          clausesa[GFC_OMP_SPLIT_TARGET].device
            = code->ext.omp_clauses->device;
          clausesa[GFC_OMP_SPLIT_TARGET].thread_limit
diff --git a/libgomp/testsuite/libgomp.fortran/has_device_addr.f90 b/libgomp/testsuite/libgomp.fortran/has_device_addr.f90
new file mode 100644
index 000..95cc7788f2d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/has_device_addr.f90
@@ -0,0 +1,59 @@
+! { dg-additional-options "-fdump-tree-original" }
+
+!
+! PR fortran/108558
+!
+
+! { dg-final { scan-tree-dump-times "#pragma omp target has_device_addr\\(x\\) has_device_addr\\(y\\)" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data map\\(tofrom:x\\) map\\(tofrom:y\\)" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data use_device_addr\\(x\\) use_device_addr\\(y\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target update from\\(y\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data map\\(tofrom:x\\) map\\(tofrom:y\\) use_device_addr\\(x\\) use_device_addr\\(y\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp teams" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp distribute" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp parallel" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp for nowait" 2 "original" } }
+
+module m
+contains
+subroutine vectorAdd(x, y, N)
+  implicit none
+  integer :: N
+  integer(4) :: x(N), y(N)
+  integer :: i
+
+  !$omp target teams distribute parallel do has_device_addr(x, y)
+  do i = 1, N
+    y(i) = x(i) + y(i)
+  end do
+end subroutine vectorAdd
+end module m
+
+program main
+  use m
+  implicit none
+  integer, parameter :: N = 9876
+  integer(4) :: x(N), y(N)
+  integer :: i
+
+  x(:) = 1
+  y(:) = 2
+
+  !$omp target data map(x, y)
+    !$omp target data use_device_addr(x, y)
+      call vectorAdd(x, y, N)
+    !$omp end target data
+    !$omp target update from(y)
+    if (any (y /= 3)) error stop
+  !$omp end target data
+
+  x = 1
+  y = 2
+  !$omp target data map(x, y) use_device_addr(x, y)
+    !$omp target teams distribute parallel do has_device_addr(x, y)
+    do i = 1, N
+      y(i) = x(i) + y(i)
+    end do
+  !$omp end target data
+  if (any (y /= 3)) error stop
+end program
[committed] gomp/declare-variant-1*.f90: Update for Windows
Tested on x86_64-gnu-linux with -m32 and -m64.
It was discussed on #gfortran IRC and tested with MinGW64 with/by nightstrike.

Committed to mainline.

Tobias

commit d1e0575fdc9216f96c4f88f9f41a25b854300c0b
Author: Tobias Burnus
Date: Fri Jan 27 09:13:16 2023 +0100

    gomp/declare-variant-1*.f90: Update for Windows

    Replace target selector 'lp64' by '! ilp32' to handle Windows which
    uses 32bit long (and vice versa for '! lp64').

    gcc/testsuite/ChangeLog:

    * gfortran.dg/gomp/declare-variant-10.f90: Update scan-tree's target
    selector to handle Windows.
    * gfortran.dg/gomp/declare-variant-11.f90: Likewise.
    * gfortran.dg/gomp/declare-variant-12.f90: Likewise.

diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90
index d6d2c8c262b..2f09146a10d 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90
@@ -72,2 +72,2 @@ contains
- call f04 () ! { dg-final { scan-tree-dump-times "f03 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && lp64 } } } }
- ! { dg-final { scan-tree-dump-times "f04 \\\(\\\);" 1 "gimple" { target { { ! lp64 } || { ! { i?86-*-* x86_64-*-* } } } } } }
+ call f04 () ! { dg-final { scan-tree-dump-times "f03 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! ilp32 } } } } }
+ ! { dg-final { scan-tree-dump-times "f04 \\\(\\\);" 1 "gimple" { target { { ilp32 } || { ! { i?86-*-* x86_64-*-* } } } } } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90
index 60aa0fcb3b0..3593c9a5bb3 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90
@@ -129,2 +129,2 @@ contains
-call f27 () ! { dg-final { scan-tree-dump-times "f25 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && lp64 } } } }
- ! { dg-final { scan-tree-dump-times "f24 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } }
+call f27 () ! { dg-final { scan-tree-dump-times "f25 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! ilp32 } } } } }
+! { dg-final { scan-tree-dump-times "f24 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ilp32 } } } } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90
index 610693e9807..2fd8abd0dc7 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90
@@ -136,2 +136,2 @@ contains
- call f13 () ! { dg-final { scan-tree-dump-times "f09 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && lp64 } } } }
- ! { dg-final { scan-tree-dump-times "f11 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } }
+ call f13 () ! { dg-final { scan-tree-dump-times "f09 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! ilp32 } } } } }
+! { dg-final { scan-tree-dump-times "f11 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ilp32 } } } } }
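The selector change in the commit above ('lp64' replaced by '! ilp32') may be easier to follow with a small model. The following C code is only a sketch: the function names data_model, matches_lp64 and matches_not_ilp32 are hypothetical and not part of GCC or DejaGnu. It assumes the usual meaning of the effective-target keywords (ilp32: 32-bit int, long and pointers; lp64: 32-bit int, 64-bit long and pointers; llp64: 32-bit int and long, 64-bit pointers, as on 64-bit Windows).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: classify a target by the sizes (in bytes) of
   int, long and pointers.  */
const char *
data_model (size_t int_size, size_t long_size, size_t ptr_size)
{
  if (int_size == 4 && long_size == 4 && ptr_size == 4)
    return "ilp32";
  if (int_size == 4 && long_size == 8 && ptr_size == 8)
    return "lp64";
  if (int_size == 4 && long_size == 4 && ptr_size == 8)
    return "llp64";  /* e.g. 64-bit Windows: long stays 32 bit.  */
  return "other";
}

/* Models the old selector 'lp64'.  */
int
matches_lp64 (size_t int_size, size_t long_size, size_t ptr_size)
{
  return strcmp (data_model (int_size, long_size, ptr_size), "lp64") == 0;
}

/* Models the new selector '{ ! ilp32 }'.  */
int
matches_not_ilp32 (size_t int_size, size_t long_size, size_t ptr_size)
{
  return strcmp (data_model (int_size, long_size, ptr_size), "ilp32") != 0;
}
```

With this model, a 64-bit Windows target (sizes 4, 4, 8) fails matches_lp64 but passes matches_not_ilp32, which is the case the updated selectors are meant to cover; calling data_model (sizeof (int), sizeof (long), sizeof (void *)) classifies the host.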
[Patch][v2] OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]
Hi Jakub, hi all,

updated patch included, i.e. avoiding 'count' for 'j' when a 'j.0' would do (i.e. only a local var without the different step calculation).

I also now reject if there is a non-unit step on the loop using an outer var.

Eventually still to be done: replace the 'sorry' by working code, i.e. implement the suggestions to handle some/all non-unit iteration steps as proposed in this thread.

On 20.01.23 18:39, Jakub Jelinek wrote:
> I think instead of non-unity etc. it is better to talk about constant step 1 or -1.

I concur.

> The actual problem with non-simple loops for non-rectangular loops is both in case it is an inner loop which uses some outer loop's iterator, or if it is outer loop whose iterator is used, both of those cases will not be handled properly.

I have now added a check for the other case as well.

Just to confirm, the following is fine, isn't it?

  !$omp simd collapse(4)
  do i = 1, 10, 2
    do outer_var = 1, 10 ! step = + 1
      do j = 1, 10, 2
        do inner_var = 1, outer_var ! step = 1

i.e. both the inner_var and outer_var have 'step = 1', even if other loops in the 'collapse' have step != 1. I think it should be fine.

OK mainline?

Tobias

OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]

This patch ensures that loop bounds depending on outer loop vars use the
proper TREE_VEC format. It additionally gives a sorry if such an outer
var has a non-one/non-minus-one increment as currently a count variable
is used in this case (see PR).

Finally, it avoids 'count' and just uses a local loop variable if the
step increment is +/-1.

    PR fortran/107424

gcc/fortran/ChangeLog:

    * trans-openmp.cc (struct dovar_init_d): Add 'sym' and
    'non_unit_incr' members.
    (gfc_nonrect_loop_expr): New.
    (gfc_trans_omp_do): Call it; use normal loop bounds
    for unit stride - and only create local loop var.

libgomp/ChangeLog:

    * testsuite/libgomp.fortran/non-rectangular-loop-1.f90: New test.
    * testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: New test.
    * testsuite/libgomp.fortran/non-rectangular-loop-2.f90: New test.
    * testsuite/libgomp.fortran/non-rectangular-loop-3.f90: New test.
    * testsuite/libgomp.fortran/non-rectangular-loop-4.f90: New test.
    * testsuite/libgomp.fortran/non-rectangular-loop-5.f90: New test.

gcc/testsuite/ChangeLog:

    * gfortran.dg/goacc/privatization-1-compute-loop.f90: Update dg-note.
    * gfortran.dg/goacc/privatization-1-routine_gang-loop.f90: Likewise.

 gcc/fortran/trans-openmp.cc                         | 238 ++--
 .../goacc/privatization-1-compute-loop.f90          |   6 +-
 .../goacc/privatization-1-routine_gang-loop.f90     |   3 +-
 .../libgomp.fortran/non-rectangular-loop-1.f90      | 637 +
 .../libgomp.fortran/non-rectangular-loop-1a.f90     | 374
 .../libgomp.fortran/non-rectangular-loop-2.f90      | 243
 .../libgomp.fortran/non-rectangular-loop-3.f90      | 186 ++
 .../libgomp.fortran/non-rectangular-loop-4.f90      | 188 ++
 .../libgomp.fortran/non-rectangular-loop-5.f90      |  28 +
 9 files changed, 1854 insertions(+), 49 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 87213de0918..ccee9e16648 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5116,10 +5116,135 @@ gfc_trans_omp_critical (gfc_code *code)
 }
 
 typedef struct dovar_init_d {
+  gfc_symbol *sym;
   tree var;
   tree init;
+  bool non_unit_iter;
 } dovar_init;
 
+static bool
+gfc_nonrect_loop_expr (stmtblock_t *pblock, gfc_se *sep, int loop_n,
+                       gfc_code *code, gfc_expr *expr, vec<dovar_init> *inits,
+                       int simple, gfc_expr *curr_loop_var)
+{
+  int i;
+  for (i = 0; i < loop_n; i++)
+    {
+      gcc_assert (code->ext.iterator->var->expr_type == EXPR_VARIABLE);
+      if (gfc_find_sym_in_expr (code->ext.iterator->var->symtree->n.sym, expr))
+        break;
+      code = code->block->next;
+    }
+  if (i >= loop_n)
+    return false;
+
+  /* Canonic format: TREE_VEC with [var, multiplier, offset]. */
+  gfc_symbol *var = code->ext.iterator->var->symtree->n.sym;
+
+  tree tree_var = NULL_TREE;
+  tree a1 = integer_one_node;
+  tree a2 = integer_zero_node;
+
+  if (!simple)
+    {
+      /* FIXME: Handle non-unit iter steps, cf. PR fortran/107424. */
+      sorry_at (gfc_get_location (&curr_loop_var->where),
+                "non-rectangular loop nest with step other than constant 1 "
+                "or -1 for %qs", curr_loop_var->symtree->n.sym->name);
+      return false;
+    }
+
+  dovar_init *di;
+  unsigned ix;
+  FOR_EACH_VEC_ELT (*inits, ix, di)
+    if (di->sym == var && !di->non_unit_iter)
+      {
+        tree_var = di->init;
+        gcc_assert (DECL_P (tree_var));
+        break;
+      }
+    else if (di->sym == var)
+      {
+        /* FIXME:
[Patch] OpenMP/Fortran: Fix loop-iter var privatization with !$OMP LOOP [PR108512]
I stumbled over a new FAIL (regression) in sollve_vv today, which was due to an odd corner case (see commit log for a description).

The mentioned in-scan error is tested for in gomp/loop-2.f90 ("'inscan' REDUCTION clause on construct other than DO, SIMD, DO SIMD, PARALLEL DO, PARALLEL DO SIMD").

I hope that this patch covers all cases and no other surprises exist...

OK for mainline?

* * *

The ICE is new in GCC 13 due to the duplicate diagnostic (cf. PR); the original issue existed before but seemingly did not affect the code; at least the sollve_vv testcase passed before.

Still, it could be backported to GCC 12. (Fortran '!$omp loop' support was added with r12-1206.) Thoughts?

Tobias

OpenMP/Fortran: Fix loop-iter var privatization with !$OMP LOOP [PR108512]

For 'parallel', loop-iteration variables are marked as 'private', unless
they either appear in an omp do/simd loop or a data-sharing clause already
exists for those on 'parallel'. 'omp loop' wasn't handled, leading to
(potentially) multiple data-sharing clauses in gfc_resolve_do_iterator
as omp_current_ctx pointed to the 'parallel' directive, ignoring the
in-between 'loop' directive.

The latter led to a bogus diagnostic - or rather an ICE as the source
location var contained only '\0'.

gcc/fortran/ChangeLog:

    PR fortran/108512
    * openmp.cc (gfc_resolve_omp_do_blocks): Don't check 'inscan'
    restrictions for loop as rejected elsewhere.
    (gfc_resolve_do_iterator): Set a source location for added
    'private'-clause arguments.
    * resolve.cc (gfc_resolve_code): Call gfc_resolve_omp_do_blocks
    also for EXEC_OMP_LOOP.

gcc/testsuite/ChangeLog:

    PR fortran/108512
    * gfortran.dg/gomp/loop-5.f90: New test.

 gcc/fortran/openmp.cc                     |  5 +-
 gcc/fortran/resolve.cc                    |  1 +
 gcc/testsuite/gfortran.dg/gomp/loop-5.f90 | 84 +++
 3 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index cc1eab90b8c..7673a52249f 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -9056,7 +9056,9 @@ gfc_resolve_omp_do_blocks (gfc_code *code, gfc_namespace *ns)
     }
   if (i < omp_current_do_collapse || omp_current_do_collapse <= 0)
     omp_current_do_collapse = 1;
-  if (code->ext.omp_clauses->lists[OMP_LIST_REDUCTION_INSCAN])
+  if (code->op == EXEC_OMP_LOOP)
+    ; /* Already rejected in resolve_omp_clauses. */
+  else if (code->ext.omp_clauses->lists[OMP_LIST_REDUCTION_INSCAN])
     {
       locus *loc
         = &code->ext.omp_clauses->lists[OMP_LIST_REDUCTION_INSCAN]->where;
@@ -9224,6 +9226,7 @@ gfc_resolve_do_iterator (gfc_code *code, gfc_symbol *sym, bool add_clause)
 
       p = gfc_get_omp_namelist ();
       p->sym = sym;
+      p->where = omp_current_ctx->code->loc;
       p->next = omp_clauses->lists[OMP_LIST_PRIVATE];
       omp_clauses->lists[OMP_LIST_PRIVATE] = p;
     }
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 94213cd3cd4..bd2a749776d 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -11950,6 +11950,7 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns)
         case EXEC_OMP_DISTRIBUTE_SIMD:
         case EXEC_OMP_DO:
         case EXEC_OMP_DO_SIMD:
+        case EXEC_OMP_LOOP:
         case EXEC_OMP_SIMD:
         case EXEC_OMP_TARGET_SIMD:
           gfc_resolve_omp_do_blocks (code, ns);
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-5.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-5.f90
new file mode 100644
index 000..1948e782653
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-5.f90
@@ -0,0 +1,84 @@
+! { dg-additional-options "-fdump-tree-original" }
+!
+! PR fortran/108512
+
+! The problem was that the context wasn't reset for the 'LOOP'
+! such that the clauses of the loops weren't seen when adding
+! PRIVATE clauses.
+!
+! In the following, only the loop variable of the non-OpenMP loop
+! in 'subroutine four' should get a front-end addded PRIVATE clause
+
+implicit none
+integer :: x, a(10), b(10), n
+n = 10
+a = -42
+b = [(2*x, x=1,10)]
+
+! { dg-final { scan-tree-dump-times "#pragma omp target map\\(tofrom:a\\) map\\(tofrom:b\\) map\\(tofrom:x\\)\[\r\n\]" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp parallel\[\r\n\]" 2 "original" } }
+! ^- shows up twice; checked only here.
+! { dg-final { scan-tree-dump-times "#pragma omp loop lastprivate\\(x\\)\[\r\n\]" 1 "original" } }
+
+!$omp target parallel map(tofrom: a, b, x)
+!$omp loop lastprivate(x)
+DO x = 1, n
+  a(x) = a(x) + b(x)
+END DO
+!$omp end loop
+!$omp end target parallel
+if (x /= 11) error stop
+if (any (a /= [(2*x - 42, x=1,10)])) error stop
+call two()
+call three()
+
Re: [Patch] install.texi: Bump newlib version for nvptx + gcn
On 22.01.23 02:45, Gerald Pfeifer wrote:
>> Maybe, but the question is what to use? The project's webpage has on the first page: "patch submissions to Newlib" and "automate the testing of newlib".
>
> I also dug into the newlib web page and other sources and - while my personal preference slightly leans towards Newlib - believe newlib is more established overall.
>
> For the web pages, it's clearer than for our *.texi ones you dug into:
>
>   ~/src/wwwdocs/htdocs> grep -r newlib . | wc -l
>   15
>   ~/src/wwwdocs/htdocs> grep -r Newlib . | wc -l
>   3

You need to be careful with those counts as there is not only 'the [nN]ewlib library' but also flags/configure arguments etc.:

gcc/doc/install.texi:@item --with-newlib
gcc/doc/install.texi-@item --with-nds32-lib=@var{library}
gcc/doc/install.texi:Currently, the valid @var{library} is @samp{newlib} or @samp{mculib}.
gcc/doc/install.texi:to nvptx-newlib's @file{newlib} directory to the directory containing
gcc/doc/install.texi:@option{--enable-newlib-io-long-long} options when configuring.
gcc/doc/invoke.texi:@samp{--enable-newlib-nano-formatted-io}.
gcc/doc/invoke.texi:@item -mnewlib
gcc/doc/invoke.texi:@opindex mnewlib

(and a few more).

In libstdc++-v3/doc/xml/, there are two 'newlib' and one 'Newlib' (plus a bunch of newlib as filename/argument/config option).

Still, I concur that 'newlib' is used a bit more often than 'Newlib'.

* * *

In any case, I concur that it would be nice to unify .texi/.xml and diagnostic output (twice in config/or1k/elf.opt) - and likewise the wwwdocs pages. (That elf.opt file has twice 'newlib' and once 'Newlib'.)

-> adds this to the to-do list.

Tobias
Re: [wwwdocs] gcc-13/changes.html + projects/gomp/: OpenMP update
Now committed with the suggestions taken into account.

That is: for non-rect loop-nest support, add 'some' / set back to partial.

I also changed the already-in-GCC-11 wording as it was a bit unclear to which word/topic the "which" in the original patch referred to - and the "some" made it even worse.

Tobias

commit a18af43b161b6ff4ea6e3aaf08dd72cbacb53a89
Author: Tobias Burnus
Date: Mon Jan 23 09:55:18 2023 +0100

    OpenMP: Update gcc-13/changes + projects/gomp

    * htdocs/gcc-13/changes.html: Improve wording; mention nvptx reverse
    offload; add 'some' to Fortran non-rect-loop support.
    * htdocs/projects/gomp/index.html: Split clause/directive entry for
    'allocate' and mark the clause variant as fully implemented. Set
    Fortran non-rect-loop support back to partial.
---
 htdocs/gcc-13/changes.html      | 19 +--
 htdocs/projects/gomp/index.html | 13 +
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index ba42170c..6cd5dd64 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -59,12 +59,19 @@ a work-in-progress.
 https://gcc.gnu.org/projects/gomp/;>OpenMP
- Reverse offload is now supported and the all clauses to the
- requires directive are now accepted. However, the
- requires_offload, unified_address
- and unified_shared_memory clauses imply the initial
- device (= the host) as the only available device. Fortran now
- supports non-rectangular loop nests, which were added for C/C++ in GCC 11.
+ Reverse offload is now supported with nvptx devices. Additionally, the
+ requires handling has been improved and all clauses are
+ now accepted. If a requirement cannot be fulfilled for an accessible
+ device, this device is excluded from the list of available devices. This
+ may imply that the only device left is the host (the initial device).
+ In particular, requires_offload is currently unsupported on
+ AMD GCN devices while unified_address and
+ unified_shared_memory are unsupported by all non-host
+ devices.
+
+
+ OpenMP 5.0: Fortran now supports some non-rectangular loop nests; for
+ C/C++, the support was added in GCC 11.
 The following OpenMP 5.1 features have been added: the
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 19ff3c7d..17cf1ad9 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -372,8 +372,8 @@ than listed, depending on resolved corner cases and optimizations.
 Non-rectangular loop nests
-GCC11GCC13
-C/C++Fortran
+GCC11GCC13
+C/C++ (full)Fortran (partial)
 Nested-parallel changes to max-active-levels-var ICV
@@ -547,9 +547,14 @@ than listed, depending on resolved corner cases and optimizations.
-align clause/modifier in allocate directive/clause and allocator directive
+align clause in allocate directive
+No
+
+
+
+align modifier in allocate clause
 GCC12
-C/C++ on clause only
+
 thread_limit clause to target construct
[committed] libgomp.texi: Impl. status - non-rect loop nest only partial
As discussed in the thread
  Re: [wwwdocs] gcc-13/changes.html + projects/gomp/: OpenMP update
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610324.html
in https://gcc.gnu.org/PR107424 and the thread starting at
  OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610240.html
the Fortran support is still incomplete. As suggested in the wwwdocs thread (see link), the implementation status for Fortran needs to be 'P'.

(Short version of the issue: currently, there are many issues with non-rectangular loop nests; the nearly ready patches will fix those for stride == -1 and 1. Ideas exist for other strides, but this may take a few more days to get resolved.)

Committed as r13-5287-g20552407ae11b61fccb46b3e96a8814e790254e7

Tobias

commit 20552407ae11b61fccb46b3e96a8814e790254e7
Author: Tobias Burnus
Date: Mon Jan 23 09:40:41 2023 +0100

    libgomp.texi: Impl. status - non-rect loop nest only partial

    libgomp/
    * libgomp.texi (OpenMP 5.0): Set non-rectangular loop nest back
    to 'P' as Fortran support is incomplete.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 1267c2304a5..67a05111289 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -195,7 +195,7 @@ The OpenMP 4.5 specification is fully supported.
       @tab complete but no non-host devices provides @code{unified_address},
       @code{unified_shared_memory} or @code{reverse_offload}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
-@item Non-rectangular loop nests @tab Y @tab
+@item Non-rectangular loop nests @tab P @tab Full support for C/C++, partial for Fortran
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
 @item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
       constructs @tab Y @tab
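For readers unfamiliar with the term, a non-rectangular loop nest is a collapsed nest in which an inner loop bound depends on an outer loop variable. The C sketch below shows the simple case that the status entry above calls fully supported for C/C++ (and that the pending Fortran patches target): both loops have step 1. The function name triangle_iterations is hypothetical. The sketch assumes an OpenMP 5.0 capable compiler (for GCC, version 11 or newer with -fopenmp); without -fopenmp the pragma is ignored and the loops simply run serially with the same result.

```c
/* Hypothetical example: count the iterations of a triangular
   (non-rectangular) loop nest.  The inner upper bound 'i' depends on the
   outer loop variable; both loops use the constant step 1.  */
long
triangle_iterations (int n)
{
  long count = 0;

  /* collapse(2) asks OpenMP to treat both loops as one iteration space;
     the reduction avoids a data race on 'count'.  */
#pragma omp parallel for collapse(2) reduction(+:count)
  for (int i = 1; i <= n; i++)
    for (int j = 1; j <= i; j++)
      count++;

  return count;
}
```

For n = 10 the nest has 1 + 2 + ... + 10 = 55 iterations, so triangle_iterations (10) returns 55 regardless of the number of threads.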
Re: [Patch] install.texi: Bump newlib version for nvptx + gcn
Hi Gerald,

On 21.01.23 12:58, Gerald Pfeifer wrote:
> Is it maybe a little tough to bump the minimal requirement to something only released yesterday? Or is this not an issue looking at the use cases? (Genuine question. Maybe nothing to worry at all.)

On the technical side, the newer newlib version is not yet required. But it looks as if it will soon make a lot of sense to have it:

For the AMDGCN stack builtins, they currently expand to the same registers and offset calculations as hard-coded in newlib (older version or if the builtin is not available). – If the stack allocation is changed to non-threadprivate, this will change the location. With the builtins, just recompiling newlib (+libgomp) will work (API preserved but not ABI). [Andrew to provide the stack patch; then me for the 2-line patch to enable OpenMP's reverse offload.]
(Chicken-and-egg problem in terms of compilation as newlib is compiled by GCC. Probably only detectable by running it on the offload device and checking whether it fails - not practical for a cross-compiler build.)

For AMDGCN's vectorization functions: Those can lead to a significant performance advantage. I know that newlib only uses some builtins if they are available. I think AMDGCN will emit code using the new libm functions, which in turn newlib only generates if GCC supports certain new builtins. (Chicken-and-egg problem, if my assumptions are correct.) [I think Kwok will provide this patch - he did implement the funcs in newlib.]

nvptx: Thomas' patch for libgfortran(*) effectively requires the newer newlib - albeit one could imagine that there could be a configure check. [(*) "nvptx, libgfortran: Switch out of "minimal" mode", approved but awaiting approval of another patch]

Thus: As nvptx/amdgcn is (mostly) about offloading code, newlib is usually compiled alongside GCC (e.g. in SUSE, Debian/Ubuntu, ...); additionally, there is static linking such that mixing old vs. new libraries is less likely.

Hence, requiring the newest version of newlib together with the newest compiler shouldn't be a problem in my opinion. And if it is documented now, it cannot be forgotten by the time the pending patches get committed... ;-)

> And, this predates your patch, in one instance we refer to Newlib (upper case), in the other to newlib (lower case). Would it make sense to converge to one?

Maybe, but the question is what to use? The project's webpage has on the first page: "patch submissions to Newlib" and "automate the testing of newlib".

As uppercase, we have:

gcc/d/implement-d.texi:@code{CRuntime_Newlib} is set when Newlib is the default C library.
gcc/doc/install.texi:Use Newlib (4.3.0 or newer).
gcc/doc/invoke.texi:This option requires Newlib Nano IO, so GCC must be configured with
gcc/doc/invoke.texi:Newlib.
gcc/doc/invoke.texi:Specify the PRU MCU variant to use. Check Newlib for the exact list of
gcc/doc/sourcebuild.texi:Target supports Newlib.
gcc/doc/sourcebuild.texi:the code size of Newlib formatted I/O functions.
gcc/po/gcc.pot:"Newlib Nano IO."
(Add a missing "Requires " to complete the sentence.)

and as lowercase:

gcc/doc/install.texi:Specifies that @samp{newlib} is
gcc/doc/install.texi:@samp{newlib}.
gcc/doc/install.texi:RTEMS configurations, which currently use newlib. The option is denotes a configure argument.)
gcc/doc/invoke.texi:newlib board library linking. The default is @code{or1ksim}.
gcc/doc/invoke.texi:select linker and preprocessor options for use with newlib.
gcc/doc/sourcebuild.texi:@item newlib
(Side remark: While some @samp{newlib} in install.texi refer to a value of a configure argument, in the quote above it refers to the library itself.)
gcc/po/gcc.pot:msgid "Configure the newlib board specific runtime. The default is or1ksim."
gcc/po/gcc.pot:"This used to select linker and preprocessor options for use with newlib."
libstdc++-v3/doc/xml/manual/configure.xml: vintage (2.3 and newer), 'gnu' is automatically selected. On newlib-based
libstdc++-v3/doc/xml/manual/configure.xml: systems ('--with_newlib=yes') and OpenBSD, 'newlib' is
libstdc++-v3/doc/xml/manual/evolution.xml:A new clocale model for newlib is available.

Thoughts?

Thanks for the comments!

Tobias
Re: [wwwdocs] gcc-13/changes.html + projects/gomp/: OpenMP update
On 21.01.23 13:48, Gerald Pfeifer wrote:
> Just one question: Does "all clauses are now accepted" refer to
> - all (as in 100% of possible clauses), or
> - all (as in a special kind of clause)?

The former. Besides the listed 'unified_shared_memory', 'unified_address' and 'reverse_offload' clauses, there are 'dynamic_allocators' and 'atomic_default_mem_order', which are handled in the compiler front end (by being ignored (always fulfilled) and by using its argument as default value, respectively).

Thanks for the review. (I will commit/update the patch later.)

Tobias
[Patch] install.texi: Bump newlib version for nvptx + gcn
A new newlib version was released yesterday: newlib-4.3.0 (yearly snapshot)
https://sourceware.org/pipermail/newlib/2023/020141.html
https://sourceware.org/ftp/newlib/index.html → 2023-01-20: newlib-4.3.0.20230120.tar.gz (8.8 MB)

For both nvptx and GCN, the new version is recommended - mostly because of upcoming changes and not because GCC mainline already needs them currently. But soon it will, hence: The attached patch bumps the minimal version instead of keeping the old version and only recommending the newer one.

Comments? Suggestions? - If there are none, I intend to commit the patch as obvious.

Tobias

PS: For AMDGCN, newlib uses (if available) some new builtins: one is provided by GCC 13 but currently has the same value as the hard-coded registers that get used if the builtin is not available - to permit a change to non-private stack variables (required for reverse offload; will require recompilation of newlib). And to support vectorized math functions. (The gcn builtins still have to be added to GCC 13; if the builtins aren't available, newlib won't use them - hence, this will also later require a rebuild with the newer newlib.)

For nvptx, newlib added some features to permit building a non-minimal version of libgfortran, which also permits I/O. The libgfortran changes have been approved but the GCC nvptx patches still have to be reviewed (and would also require a pending nvptx-tools pull request).

BTW: The gcn vect math and the nvptx changes went into newlib in the last few days. Thus, if you use the 'git' version, it won't have the changes unless you updated at least yesterday.

install.texi: Bump newlib version for nvptx + gcn

Before, newlib 3.2 was required for amdgcn and 3.1 for nvptx. Now recommended is 4.3.0, which was just released on 2023-01-20. While currently the old versions would work fine, upcoming GCC changes depend on a newer newlib. Thus, the minimal version is bumped instead of just recommending the new version.

For GCN, the bump is in preparation for permitting non-threadlocal stack variables and vectorized math functions - both scheduled for GCC 13 and added to newlib in 4.3.0.

For nvptx, this includes an emulated clock (commit 6bb96d13a), a calloc fix (5fca4e0f1) and changes to permit libgfortran to be compiled with I/O support instead of only in minimal mode. (Patch approved for GCC 13 but pending on an nvptx patch.)

gcc/ChangeLog:

	* doc/install.texi (amdgcn, nvptx): Require newlib 4.3.0.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index ccc8d15fd08..b1861a6a437 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3855,7 +3855,7 @@ Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, and cop
 @file{bin/llvm-ar} to both @file{bin/amdgcn-amdhsa-ar} and
 @file{bin/amdgcn-amdhsa-ranlib}.
 
-Use Newlib (3.2.0, or newer).
+Use Newlib (4.3.0 or newer).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.github.io,,ROCm Platform}, and use
@@ -4672,7 +4672,7 @@ Instead of GNU binutils, you will need to install
 Tell GCC where to find it:
 @option{--with-build-time-tools=[install-nvptx-tools]/nvptx-none/bin}.
 
-You will need newlib 3.1.0 or later. It can be
+You will need newlib 4.3.0 or later. It can be
 automatically built together with GCC@. For this, add a symbolic link
 to nvptx-newlib's @file{newlib} directory to the directory containing
 the GCC sources.
[Patch] OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]
This is all about non-rectangular loop nests in OpenMP.

The attached patch depends on the obvious fix for https://gcc.gnu.org/PR108459, which is included, together with a nice testcase, in Jakub's WIP patch attached to the PR; without it, gfortran.dg/gomp/canonical-loop-1.f90 fails with an ICE (segfault).

My patch fixes part of the Fortran issues found. Namely, it ensures that a "regular" non-rectangular loop nest actually works by passing the outer-loop-var, the multiplier and offset in a TREE_VEC to the middle end. It additionally avoids pointlessly creating a temporary variable for a VAR_DECL (main advantage: dump looks cleaner and avoids some dependency analysis) - and likewise for 'step' given that 'step' was evaluated before.

There is an additional issue - not quite addressed in this patch: There are cases when a loop variable is replaced by another variable ('count') and then, at the beginning of the loop body, the original variable gets the value from the count variable. Obviously, this no longer works with non-rectangular loop nests. The 'count' appears in two cases: (a) when the iteration step is not 1 or -1 and (b) if the iteration variable is a pointer (scalar with allocatable, pointer, optional argument or just a dummy argument; oddly, even if it has the value attribute).

There is pending work to be done for these cases, as mentioned in comments 6 and 8 of the PR. This patch adds some 'sorry' messages for them. I hope I have caught every case where 'count' is used; I should have all or at least most of them.

OK for mainline, once the other patch has been committed?

Tobias

PS: I still need to verify that everything is fine, once the other patch has been committed. A flaky mainboard on the laptop causes multiple random freezes per day, which makes testing + patch writing a bit harder. (At least the mainboard replacement is scheduled for tomorrow :-) )

OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]

This patch ensures that loop bounds depending on outer loop vars use the proper TREE_VEC format. It additionally gives a sorry if such an outer var has a non-one/non-minus-one increment as currently a count variable is used in this case (see PR).

gcc/fortran/ChangeLog:

	PR fortran/107424
	* trans-openmp.cc (gfc_nonrect_loop_expr): New.
	(gfc_trans_omp_do): Call it for start/end loop bound for non-rectangular loop nests.

gcc/testsuite/

	PR fortran/107424
	* gfortran.dg/gomp/non-rectangular-loop-3.f90: New test.

libgomp/ChangeLog:

	PR fortran/107424
	* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-2.f90: New test.
 gcc/fortran/trans-openmp.cc                        | 167 +-
 .../gfortran.dg/gomp/non-rectangular-loop-3.f90    |  85 +++
 .../libgomp.fortran/non-rectangular-loop-1.f90     | 637 +
 .../libgomp.fortran/non-rectangular-loop-1a.f90    | 374
 .../libgomp.fortran/non-rectangular-loop-2.f90     | 243
 5 files changed, 1495 insertions(+), 11 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 87213de0918..73376894316 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5120,6 +5120,136 @@ typedef struct dovar_init_d {
   tree init;
 } dovar_init;
 
+static bool
+gfc_nonrect_loop_expr (stmtblock_t *pblock, gfc_se *sep, int loop_n,
+                       gfc_code *code, gfc_expr *expr, vec<dovar_init> *inits)
+{
+  int i;
+  for (i = 0; i < loop_n; i++)
+    {
+      gcc_assert (code->ext.iterator->var->expr_type == EXPR_VARIABLE);
+      if (gfc_find_sym_in_expr (code->ext.iterator->var->symtree->n.sym, expr))
+        break;
+      code = code->block->next;
+    }
+  if (i >= loop_n)
+    return false;
+
+  /* Canonic format: TREE_VEC with [var, multiplier, offset]. */
+  gfc_symbol *var = code->ext.iterator->var->symtree->n.sym;
+
+  gfc_se se;
+  tree tree_var, a1, a2;
+  a1 = integer_one_node;
+  a2 = integer_zero_node;
+
+  gfc_init_se (&se, NULL);
+  gfc_conv_expr_lhs (&se, code->ext.iterator->var);
+  gfc_add_block_to_block (pblock, &se.pre);
+  tree_var = se.expr;
+
+  {
+    /* FIXME: Handle non-unity iterations, cf. PR fortran/107424.
+       The issue is that for those a 'count' variable is used. */
+    dovar_init *di;
+    unsigned ix;
+    tree t = tree_var;
+    while (TREE_CODE (t) == INDIRECT_REF)
+      t = TREE_OPERAND (t, 0);
+    FOR_EACH_VEC_ELT (*inits, ix, di)
+      {
+        tree t2 = di->var;
+        while (TREE_CODE (t2) == INDIRECT_REF)
+          t2 = TREE_OPERAND (t2, 0);
+        if (t == t2)
+          {
+            HOST_WIDE_INT intval;
+            if (gfc_extract_hwi
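To make the [var, multiplier, offset] form from the patch description concrete: a loop bound that depends on an outer loop variable can be written as bound = multiplier * var + offset. The following C sketch only models that idea with plain integers. The names struct nonrect_bound, eval_bound and count_iterations are hypothetical and chosen for illustration; they are not GCC internals and say nothing about how the middle end consumes the TREE_VEC.

```c
#include <assert.h>

/* Hypothetical model of one loop bound in the [var, multiplier, offset]
   form: bound = multiplier * outer_var + offset.  A plain (rectangular)
   bound simply has multiplier == 0.  */
struct nonrect_bound
{
  long multiplier;
  long offset;
};

/* Evaluate a bound for a given value of the outer loop variable.  */
long
eval_bound (struct nonrect_bound b, long outer_var)
{
  return b.multiplier * outer_var + b.offset;
}

/* Count the iterations of the two-level nest
     for (i = outer_lo; i <= outer_hi; i++)
       for (j = lb (i); j <= ub (i); j++)
   with unit steps, i.e. the case that does not need a 'count' variable.  */
long
count_iterations (long outer_lo, long outer_hi,
                  struct nonrect_bound lb, struct nonrect_bound ub)
{
  assert (outer_lo <= outer_hi);
  long n = 0;
  for (long i = outer_lo; i <= outer_hi; i++)
    for (long j = eval_bound (lb, i); j <= eval_bound (ub, i); j++)
      n++;
  return n;
}
```

For example, a triangular nest with the inner loop running from i to 4 uses a lower bound of {1, 0} and an upper bound of {0, 4}.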
[Patch] libfortran: Fix execute_command_line for Windows
Reported by nightstrike, who also tested this patch.

On Windows, we call system(), which works as described at https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/system-wsystem?view=msvc-170
Namely, it only fails with "-1" if the command interpreter could not be started. Otherwise, it returns the value returned by the command interpreter. (Same on Linux.)

On POSIX systems, 'sh' calls exit(127) or _exit(127) if it cannot execute the program of the passed string, as documented. Cf. https://www.unix.com/man-page/posix/3p/system/

Thus, the question is what happens on Windows. Our experiments, several webpages (like stackoverflow) and the source code of WINE for cmd.exe indicate that Windows returns 9009 in that case. See for instance https://github.com/wine-mirror/wine/blob/master/programs/cmd/wcmdmain.c#L1262-L1269

Thus, we now do likewise. The code is for MINGW; Cygwin does not set that var and is likely to use return values closer to POSIX.

OK for mainline?

Tobias

libfortran: Fix execute_command_line for Windows

On Windows, 'system' is called - that fails with -1 if the command interpreter could not be started; on POSIX systems, if the child process could not be started by the shell, exit(127)/_exit(127) is called/returned. On Windows, cmd.exe (and also the PowerShell) return errorlevel 9009.

libgfortran/ChangeLog:

	* intrinsics/execute_command_line.c (execute_command_line): On Windows, regard system()'s return value of 9009 as EXEC_INVALIDCOMMAND.

diff --git a/libgfortran/intrinsics/execute_command_line.c b/libgfortran/intrinsics/execute_command_line.c
index 305f067d973..0d1688400c2 100644
--- a/libgfortran/intrinsics/execute_command_line.c
+++ b/libgfortran/intrinsics/execute_command_line.c
@@ -142,10 +142,15 @@ execute_command_line (const char *command, bool wait, int *exitstat,
 #endif
       else if (res == 127 || res == 126
 #if defined(WEXITSTATUS) && defined(WIFEXITED)
                || (WIFEXITED(res) && WEXITSTATUS(res) == 127)
                || (WIFEXITED(res) && WEXITSTATUS(res) == 126)
+#endif
+#ifdef __MINGW32__
+               /* cmd.exe sets the errorlevel to 9009,
+                  if the command could not be executed. */
+               || res == 9009
 #endif
                )
         /* Shell return codes 126 and 127 mean that the command line could not
            be executed for various reasons. */
         set_cmdstat (cmdstat, EXEC_INVALIDCOMMAND);
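The decision the patch adds can be summarized in a small, standalone C sketch. This is only an illustration under stated assumptions: classify_exit_code and enum cmd_status are hypothetical names, the exit code is assumed to be already decoded (no WIFEXITED/WEXITSTATUS handling), and the MinGW case is a runtime flag here rather than the #ifdef __MINGW32__ used in the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the libgfortran status codes; the real
   values live in libgfortran and are not reproduced here.  */
enum cmd_status
{
  CMD_OK = 0,
  CMD_INVALID_COMMAND = 1
};

/* Classify an already-decoded exit code of the command interpreter.
   126/127 are the POSIX shell codes for "could not execute"; 9009 is
   what cmd.exe reports in that case, so it only counts when
   on_mingw is true.  */
enum cmd_status
classify_exit_code (int exit_code, bool on_mingw)
{
  assert (exit_code >= 0);
  if (exit_code == 126 || exit_code == 127)
    return CMD_INVALID_COMMAND;
  if (on_mingw && exit_code == 9009)
    return CMD_INVALID_COMMAND;
  return CMD_OK;
}
```

Keeping 9009 conditional matters because on a POSIX system a program is free to exit with any status, while 126 and 127 are reserved by the shell for exactly this failure.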
Re: [wwwdocs] gcc-13/changes.html + projects/gomp/: OpenMP update
Hi Gerald,

On 16.01.23 23:16, Gerald Pfeifer wrote:
> On Mon, 16 Jan 2023, Tobias Burnus wrote:
>>   requires_offload, unified_address
>> - and unified_shared_memory clauses cause that the
>> - only available device is the initial device (the host). Fortran now
>> + and unified_shared_memory clauses imply the initial
>> + device (= the host) as the only available device. Fortran now
>>
>> I really stumble over the "as" - that sounds wrong and I fail to parse this part. I think it should be "is".
>
> happy to make this change. Or do you have an idea to reframe the sentence (or paragraph) altogether?

Actually, thinking about it again, the "imply" is also misleading - by itself the restrictions do not imply that accelerators/GPUs are not supported; that's only implied in GCC as the libgomp plugins for nvptx and amdgcn don't handle it, yet.

How about the following? I put the other change into its own bullet point to be less confusing, completely rewording the remaining item and mentioning reverse offload support.

(Reverse offload is: While being in a target region ('omp target', i.e. running code targeted for an offload device), it is possible to execute code on the host. If there is no available non-host device, the target region will run on the host (host fallback); in that case, reverse offload is trivial (as host code calls host code).)

>> BTW: Before the release, further updates to changes.html are required.
>
> Keep them coming! :-)

Actually, I think only one change was missing (looking at libgomp/libgomp.texi), unless some more pending patches are accepted. I have now included that change in the attached patch.

Tobias

OpenMP: Update gcc-13/changes + projects/gomp

* htdocs/gcc-13/changes.html: Improve wording; mention nvptx reverse offload.
* htdocs/projects/gomp/index.html: Split clause/directive entry for 'allocate' and mark the clause variant as fully implemented.

 htdocs/gcc-13/changes.html      | 19 +--
 htdocs/projects/gomp/index.html |  9 +++--
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index ca9cd2da..6deb445f 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -53,12 +53,19 @@ a work-in-progress.
 https://gcc.gnu.org/projects/gomp/;>OpenMP
- Reverse offload is now supported and the all clauses to the
- requires directive are now accepted. However, the
- requires_offload, unified_address
- and unified_shared_memory clauses imply the initial
- device (= the host) as the only available device. Fortran now
- supports non-rectangular loop nests, which were added for C/C++ in GCC 11.
+ Reverse offload is now supported with nvptx devices. Additionally, the
+ requires handling has been improved and all clauses are
+ now accepted. If a requirement cannot be fulfilled for an accessible
+ device, this device is excluded from the list of available devices. This
+ may imply that the only device left is the host (the initial device).
+ In particular, requires_offload is currently unsupported on
+ AMD GCN devices while unified_address and
+ unified_shared_memory are unsupported by all non-host
+ devices.
+
+
+ OpenMP 5.0: Fortran now supports non-rectangular loop nests, which were
+ added for C/C++ in GCC 11.
 The following OpenMP 5.1 features have been added: the
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 19ff3c7d..dc9c88e7 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -547,9 +547,14 @@ than listed, depending on resolved corner cases and optimizations.
-align clause/modifier in allocate directive/clause and allocator directive
+align clause in allocate directive
+No
+
+
+
+align modifier in allocate clause
 GCC12
-C/C++ on clause only
+
 thread_limit clause to target construct
Re: [wwwdocs] gcc-13/changes.html + projects/gomp/: OpenMP update
Hi Gerald,

On 14.01.23 22:47, Gerald Pfeifer wrote:
> I made a couple of incremental edits. See below for what I just pushed (and please speak up if you see any issues).
>
> commit 2f870cba58c81449beb618a9030824360a25
> ...
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -54,10 +54,10 @@ a work-in-progress.
> ...
> + requires directive are now accepted. However, the
>   requires_offload, unified_address
> - and unified_shared_memory clauses cause that the
> - only available device is the initial device (the host). Fortran now
> + and unified_shared_memory clauses imply the initial
> + device (= the host) as the only available device. Fortran now

I really stumble over the "as" - that sounds wrong and I fail to parse this part. I think it should be "is".

On the technical side, in principle, available devices are the host (aka "initial device") and all installed(**) (non-host) devices - in our case nvptx and (amd)gcn GPUs.

However, when using 'requires', all installed devices which do not fulfill the requirement(s) are removed from the list of available devices. In case of 'dynamic_allocators', all devices support it; in case of 'reverse_offload', all installed amdgcn devices are filtered out; and for unified-shared memory,(*) neither nvptx nor amdgcn support it and both are removed from the list, such that at the end, only the host remains. (Hence, device code ('target regions') will run on the host → host fallback.)

BTW: Before the release, further updates to changes.html are required. For instance, as alluded to in the previous paragraph, 'reverse offload' is (now) supported for nvptx. (But not yet with amdgcn.)

Tobias

(*) There is support for unified-shared memory for both nvptx and gcn, but the existing patches either have to be reviewed or to be revised.

(**) I coined the term 'installed device'. OpenMP since TR11 contains some definitions for 'available devices' - which consists of the union of supported and accessible devices (possibly after sorting and further filtering). Namely:

  accessible devices - The host device and all non-host devices accessible for execution.

  supported devices - The host device and all non-host devices supported by the implementation for execution of target code for which the device-related requirements of the requires directive are fulfilled.

The available-devices-var is in turn by default "*" - where "* expands to all accessible and supported devices". (The device list can be further filtered and sorted via the environment variable OMP_AVAILABLE_DEVICES.)
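The filtering described above can be pictured with a short C sketch. It is an illustration only: the bit encoding, struct device, installed_devices and count_available_nonhost are hypothetical names, and the capability table merely restates what this message says about nvptx and amdgcn at the time of writing. It does not reflect how libgomp represents devices.

```c
#include <assert.h>
#include <stddef.h>

/* Requirement bits; the names mirror the 'requires' clauses but the
   encoding is made up for this sketch.  */
enum
{
  REQ_REVERSE_OFFLOAD = 1u << 0,
  REQ_UNIFIED_ADDRESS = 1u << 1,
  REQ_UNIFIED_SHARED_MEMORY = 1u << 2,
  REQ_DYNAMIC_ALLOCATORS = 1u << 3
};

/* Hypothetical description of an installed non-host device.  */
struct device
{
  const char *name;
  unsigned supports;  /* Bitwise OR of the REQ_* values it can fulfill.  */
};

/* Capabilities as described in the message above (an assumption of
   this sketch, not queried from any runtime).  */
const struct device installed_devices[] = {
  { "nvptx", REQ_REVERSE_OFFLOAD | REQ_DYNAMIC_ALLOCATORS },
  { "amdgcn", REQ_DYNAMIC_ALLOCATORS },
};
const size_t n_installed_devices
  = sizeof installed_devices / sizeof installed_devices[0];

/* Return how many non-host devices remain available after removing
   every device that cannot fulfill all required bits.  The host is
   always available and is not counted here; a result of 0 therefore
   means host fallback.  */
size_t
count_available_nonhost (const struct device *devs, size_t n,
                         unsigned required)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if ((devs[i].supports & required) == required)
      kept++;
  return kept;
}
```

With this table, requiring reverse offload leaves one non-host device, while requiring unified shared memory leaves none, so target regions would run on the host.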
Re: [Patch] Fortran/OpenMP: Reject non-scalar 'holds' expr in 'omp assume(s)' [PR107706] (was: [PR107424])
First, I messed up the PR number - it should be PR107706.

On 12.01.23 11:39, Jakub Jelinek wrote:
> On Thu, Jan 12, 2023 at 11:22:40AM +0100, Tobias Burnus wrote:
>> Rather obvious fix for that ICE. Comments? If there are none, I will commit it later as obvious.
>
> I think the spec should be clarified, unlike clauses like if, novariants, nocontext, indirect, final clause operands where we specify the argument to be expression of logical type and glossary term says that OpenMP logical expression [...] But for the holds clause, all we say is that holds clause isn't inarguable and [...] that the listed expression evaluates to true in the assumption scope. [...] so I think making it clear that holds argument is expression of logical type would be useful.

Actually, the spec does have (internally) hold-expr = "OpenMP logical expression" in a JSON file but that does not show up in the generated PDF. I have now filed an OpenMP spec issue for it (#3453).

> That said, the patch is ok, a rank > 1 expression can't be considered to evaluate to true...

Thanks! Committed as r13-5118-g2ce55247a8bf32985a96ed63a7a92d36746723dc (with the fixed PR number).

Thanks.
[Patch] Fortran/OpenMP: Reject non-scalar 'holds' expr in 'omp assume(s)' [PR107424]
Rather obvious fix for that ICE.

Comments? If there are none, I will commit it later as obvious.

Tobias

Fortran/OpenMP: Reject non-scalar 'holds' expr in 'omp assume(s)' [PR107424]

gcc/fortran/ChangeLog:

	PR fortran/107424
	* openmp.cc (gfc_resolve_omp_assumptions): Reject nonscalars.

gcc/testsuite/ChangeLog:

	PR fortran/107424
	* gfortran.dg/gomp/assume-2.f90: Update dg-error.
	* gfortran.dg/gomp/assumes-2.f90: Likewise.
	* gfortran.dg/gomp/assume-5.f90: New test.

 gcc/fortran/openmp.cc                        |  8 +---
 gcc/testsuite/gfortran.dg/gomp/assume-2.f90  |  2 +-
 gcc/testsuite/gfortran.dg/gomp/assume-5.f90  | 20
 gcc/testsuite/gfortran.dg/gomp/assumes-2.f90 |  2 +-
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index b71ee467c01..916daeb1aa5 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -6911,9 +6911,11 @@ void
 gfc_resolve_omp_assumptions (gfc_omp_assumptions *assume)
 {
   for (gfc_expr_list *el = assume->holds; el; el = el->next)
-    if (!gfc_resolve_expr (el->expr) || el->expr->ts.type != BT_LOGICAL)
-      gfc_error ("HOLDS expression at %L must be a logical expression",
-                 &el->expr->where);
+    if (!gfc_resolve_expr (el->expr)
+        || el->expr->ts.type != BT_LOGICAL
+        || el->expr->rank != 0)
+      gfc_error ("HOLDS expression at %L must be a scalar logical expression",
+                 &el->expr->where);
 }
 
diff --git a/gcc/testsuite/gfortran.dg/gomp/assume-2.f90 b/gcc/testsuite/gfortran.dg/gomp/assume-2.f90
index ca3e04dfe95..dc306a9088a 100644
--- a/gcc/testsuite/gfortran.dg/gomp/assume-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/assume-2.f90
@@ -22,6 +22,6 @@ subroutine foo (i, a)
   end if
 ! !$omp end assume - silence: 'Unexpected !$OMP END ASSUME statement'
 
-  !$omp assume holds (1.0) ! { dg-error "HOLDS expression at .1. must be a logical expression" }
+  !$omp assume holds (1.0) ! { dg-error "HOLDS expression at .1. must be a scalar logical expression" }
   !$omp end assume
 end
diff --git a/gcc/testsuite/gfortran.dg/gomp/assume-5.f90 b/gcc/testsuite/gfortran.dg/gomp/assume-5.f90
new file mode 100644
index 000..5c6c00750dd
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/assume-5.f90
@@ -0,0 +1,20 @@
+! PR fortran/107424
+!
+! Contributed by G. Steinmetz
+!
+
+integer function f(i)
+  implicit none
+  !$omp assumes holds(i < g()) ! { dg-error "HOLDS expression at .1. must be a scalar logical expression" }
+  integer, value :: i
+
+  !$omp assume holds(i < g()) ! { dg-error "HOLDS expression at .1. must be a scalar logical expression" }
+  block
+  end block
+  f = 3
+contains
+  function g()
+    integer :: g(2)
+    g = 4
+  end
+end
diff --git a/gcc/testsuite/gfortran.dg/gomp/assumes-2.f90 b/gcc/testsuite/gfortran.dg/gomp/assumes-2.f90
index 729c9737a1c..c8719a86a94 100644
--- a/gcc/testsuite/gfortran.dg/gomp/assumes-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/assumes-2.f90
@@ -4,7 +4,7 @@ module m
   !$omp assumes contains(target) holds(x > 0.0)
   !$omp assumes absent(target)
   !$omp assumes holds(0.0)
-! { dg-error "HOLDS expression at .1. must be a logical expression" "" { target *-*-* } .-1 }
+! { dg-error "HOLDS expression at .1. must be a scalar logical expression" "" { target *-*-* } .-1 }
 end module
 
 module m2
Re: [PATCH] fortran: Fix up function types for realloc and sincos{,f,l} builtins [PR108349]
Hi,

On 11.01.23 10:18, Jakub Jelinek via Gcc-patches wrote:
> As reported in the PR, the FUNCTION_TYPE for __builtin_realloc in the Fortran FE is wrong since r0-100026-gb64fca63690ad [...]
> I went through all other changes from that commit and found that __builtin_sincos{,f,l} got broken as well, [...]
> The following patch fixes that, plus some formatting issues around the spots I've changed.
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. Thanks for the patch!

Tobias

2023-01-11  Jakub Jelinek

	PR fortran/108349
	* f95-lang.cc (gfc_init_builtin_function): Fix up function types for BUILT_IN_REALLOC and BUILT_IN_SINCOS{F,,L}. Formatting fixes.

--- gcc/fortran/f95-lang.cc.jj	2022-11-15 22:57:18.247210671 +0100
+++ gcc/fortran/f95-lang.cc	2023-01-10 11:31:43.787266346 +0100
@@ -714,31 +714,34 @@ gfc_init_builtin_functions (void)
                                           float_type_node, NULL_TREE);
 
   func_cdouble_double = build_function_type_list (double_type_node,
-                                                  complex_double_type_node,
-                                                  NULL_TREE);
+                                                  complex_double_type_node,
+                                                  NULL_TREE);
 
   func_double_cdouble = build_function_type_list (complex_double_type_node,
-                                                  double_type_node, NULL_TREE);
+                                                  double_type_node, NULL_TREE);
 
-  func_clongdouble_longdouble =
-    build_function_type_list (long_double_type_node,
-                              complex_long_double_type_node, NULL_TREE);
-
-  func_longdouble_clongdouble =
-    build_function_type_list (complex_long_double_type_node,
-                              long_double_type_node, NULL_TREE);
+  func_clongdouble_longdouble
+    = build_function_type_list (long_double_type_node,
+                                complex_long_double_type_node, NULL_TREE);
+
+  func_longdouble_clongdouble
+    = build_function_type_list (complex_long_double_type_node,
+                                long_double_type_node, NULL_TREE);
 
   ptype = build_pointer_type (float_type_node);
-  func_float_floatp_floatp =
-    build_function_type_list (void_type_node, ptype, ptype, NULL_TREE);
+  func_float_floatp_floatp
+    = build_function_type_list (void_type_node, float_type_node, ptype, ptype,
+                                NULL_TREE);
 
   ptype = build_pointer_type (double_type_node);
-  func_double_doublep_doublep =
-    build_function_type_list (void_type_node, ptype, ptype, NULL_TREE);
+  func_double_doublep_doublep
+    = build_function_type_list (void_type_node, double_type_node, ptype,
+                                ptype, NULL_TREE);
 
   ptype = build_pointer_type (long_double_type_node);
-  func_longdouble_longdoublep_longdoublep =
-    build_function_type_list (void_type_node, ptype, ptype, NULL_TREE);
+  func_longdouble_longdoublep_longdoublep
+    = build_function_type_list (void_type_node, long_double_type_node, ptype,
+                                ptype, NULL_TREE);
 
 /* Non-math builtins are defined manually, so they're not included here. */
 #define OTHER_BUILTIN(ID,NAME,TYPE,CONST)
@@ -992,9 +995,8 @@ gfc_init_builtin_functions (void)
                       "calloc", ATTR_NOTHROW_LEAF_MALLOC_LIST);
   DECL_IS_MALLOC (builtin_decl_explicit (BUILT_IN_CALLOC)) = 1;
 
-  ftype = build_function_type_list (pvoid_type_node,
-                                    size_type_node, pvoid_type_node,
-                                    NULL_TREE);
+  ftype = build_function_type_list (pvoid_type_node, pvoid_type_node,
+                                    size_type_node, NULL_TREE);
   gfc_define_builtin ("__builtin_realloc", ftype, BUILT_IN_REALLOC,
                       "realloc", ATTR_NOTHROW_LEAF_LIST);
 
	Jakub
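For readers less familiar with the C-level prototypes the corrected FUNCTION_TYPEs have to match, here is a small C sketch. The two typedefs spell out the argument order: realloc takes the pointer first and the size second, and sincos takes the value followed by two result pointers. Everything else is hypothetical and for illustration only: sincos_sketch is a truncated Taylor series standing in for the real libm function (so that no libm is needed), and sin_via, cos_via and realloc_keeps_contents are helper names invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Function-pointer types matching the corrected builtin signatures.  */
typedef void *(*realloc_fn) (void *ptr, size_t size);
typedef void (*sincos_fn) (double x, double *sinp, double *cosp);

/* Hypothetical stand-in for sincos with the same signature: a few
   Taylor terms, adequate only for small |x|.  */
void
sincos_sketch (double x, double *sinp, double *cosp)
{
  double x2 = x * x;
  *sinp = x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0));
  *cosp = 1.0 - x2 / 2.0 * (1.0 - x2 / 12.0);
}

/* Call through the typed pointer and return only the sine result.  */
double
sin_via (sincos_fn f, double x)
{
  double s, c;
  f (x, &s, &c);
  return s;
}

/* Call through the typed pointer and return only the cosine result.  */
double
cos_via (sincos_fn f, double x)
{
  double s, c;
  f (x, &s, &c);
  return c;
}

/* Grow a one-int buffer to four ints through a realloc-typed pointer
   (pointer first, size second) and report whether the old contents
   survived.  Returns 0 if an allocation fails.  */
int
realloc_keeps_contents (realloc_fn re)
{
  int *p = (int *) re (NULL, sizeof (int));
  if (p == NULL)
    return 0;
  p[0] = 42;
  int *q = (int *) re (p, 4 * sizeof (int));
  if (q == NULL)
    {
      free (p);
      return 0;
    }
  int ok = (q[0] == 42);
  free (q);
  return ok;
}
```

Passing the standard realloc to realloc_keeps_contents compiles only because its prototype has exactly the (pointer, size) order shown in the typedef.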
OpenMP Patch Ping
Hi all, hello Jakub,

Below is the list, updated since the last ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607178.html

NOTE to the list below: I have stopped checking older patches. I know some more are pending review, others need to be revised. I will re-check once the patches listed below have been reviewed. Cf. old list.

Thanks for the reviews done in between the last ping and now!

* * *

Small patches
=============

* [Patch] Fortran: Extend align-clause checks of OpenMP's allocate clause
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608401.html
  Tue Dec 13 16:38:22 GMT 2022

* [Patch] OpenMP: Parse align clause in allocate directive in C/C++
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608404.html
  Tue Dec 13 17:44:27 GMT 2022

* Re: [Patch] libgomp.texi: Reverse-offload updates (was: [Patch] libgomp: Handle OpenMP's reverse offloads)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608245.html
  Thu Nov 24 12:01:04 GMT 2022
  (Side note: wwwdocs also needs to be updated for the latter patch and some other patches done in the meanwhile.)

Fortran allocat(e,ors) prep patch
=================================

* [Patch] Fortran/OpenMP: Add parsing support for allocators/allocate directive (was: [Patch] Fortran/OpenMP: Add parsing support for allocators directive)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608904.html
  Wed Dec 21 15:51:25 GMT 2022

  (Remark: While written from scratch, it is kind of a follow-up to Abid's patch
  [PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0)
  you/Jakub reviewed on Tue Oct 11 12:13:14 GMT 2022, i.e. https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603258.html
  - For the actual implementation of 'allocators', we still have to solve the issues raised in the review for '[PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).' at https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603279.html (and earlier in the thread); implementing 'omp allocate' (Fortran/C/C++) seems to be easier but no one has started implementing it so far - only parsing support exists.
  - The USM patches on semi-USM systems run into a similar issue as 'allocators' and for it, some ME omp_allocate is added.)

Mapping related patches
=======================

(Complex, but GCC badly needs a revision as it fixes several bugs and adds missing functionality.)

* Complete patch set was just re-submitted by Julian, overview patch is
  [PATCH v6 00/11] OpenMP: C/C++ lvalue parsing, C/C++/Fortran "declare mapper" support
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/thread.html#609031
  Fri Dec 23 12:12:53 GMT 2022

* Note: For 10/11 of the set, there was a follow-up this Monday
  [PATCH v6 10/11] OpenMP: Support OpenMP 5.0 "declare mapper" directives for C
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609566.html

  [As it relates to one patch in the series: '[Patch] Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr'
  That's mine, needs to be updated (WIP) and fixes array descriptor/alloc-string-length var issues, where descriptor/string length may need to be handled explicitly on data entering map, i.e. string lengths/allocator may require 'to:' instead of 'alloc:' - and on data exit mapping, the current code might add a bogus 'alloc:'.
  - Idea is to handle this explicitly in fortran/trans-openmp.cc instead of auto-adding it in the ME. Status: WIP - removed in ME but not all cases are handled yet in FE.]

Fortran deep mapping (allocatable components)
(Old patch of March 2022, but the first part has now been properly, if belatedly, submitted - today):

  [Patch][1/2] OpenMP: Add lang hooks + run-time filled map arrays for Fortran deep mapping of DT
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609637.html

Tobias
[Patch][1/2] OpenMP: Add lang hooks + run-time filled map arrays for Fortran deep mapping of DT
This patch is the ME part to support OpenMP 5.0's deep-mapping feature, i.e. mapping allocatable components of Fortran's derived types automatically. [Not the lang hooks, but the allocate-array part will probably also be useful when later adding 'iterator'-modifier support to the 'map'/'to'/'from' clauses.]

This is a belated real submission of the patch sent in March 2022,
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591144.html
(with FE fixes at https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593562.html (note to self: Bernhard did send some comment fixes off list) + https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593704.html)
+ ME fix for OpenACC at https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603906.html [which is in the attached patch]

As written, attached is the ME part. Below is a description of how it is supposed to get used; the patch links above show how it looks in the real-code FE.

== BACKGROUND ==

Fortran permits

  type t
    integer, allocatable :: x, y(:)
  end type t
  type t2
    type(t2), allocatable :: previous_stack ! Not valid in OMP 5.0
    integer, allocatable :: a
    type(t) :: b, c(:)
  end type t2
  type(t2) :: var1, var2(:)
  !$omp target enter data(var1, var2)

where all allocatable components need to be mapped alongside. The number of mappings is only known at runtime, e.g. for 'var2' - the array size is only known at runtime and then each allocatable component of each element of 'var2' needs to be mapped - and those can contain allocatable components as well, which have to be mapped - but of course only if the parent component is actually allocated.

* * *

The current code puts 'kinds' with const values into an array, 'sizes' in a fixed-size stack array (either with const or dynamic values) and 'addrs' is a struct.

To support deep mapping, those all have to be dynamic; hence, the arrays 'sizes' and 'kinds' are turned into pointers - and the 'struct' gets a trailing variable-size array, which is then filled with the dynamic content.
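The reason the arrays must become dynamic can be shown with a count-then-fill sketch in C. This is a simplified model under loud assumptions, not the middle-end implementation: struct comp, deep_count, deep_fill, deep_map_bytes and the example data are all hypothetical, an unallocated component is modelled as a NULL address, and only 'addrs' and 'sizes' are modelled (no 'kinds').

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a component: addr == NULL means "not
   allocated"; children are only looked at when the parent itself
   is allocated.  */
struct comp
{
  void *addr;
  size_t size;
  const struct comp *children;
  size_t n_children;
};

/* Pass 1: count the mappings needed; only known at run time.  */
size_t
deep_count (const struct comp *c)
{
  if (c->addr == NULL)
    return 0;
  size_t n = 1;
  for (size_t i = 0; i < c->n_children; i++)
    n += deep_count (&c->children[i]);
  return n;
}

/* Pass 2: fill the arrays, starting at index pos; returns the next
   free index.  */
size_t
deep_fill (const struct comp *c, void **addrs, size_t *sizes, size_t pos)
{
  if (c->addr == NULL)
    return pos;
  addrs[pos] = c->addr;
  sizes[pos] = c->size;
  pos++;
  for (size_t i = 0; i < c->n_children; i++)
    pos = deep_fill (&c->children[i], addrs, sizes, pos);
  return pos;
}

/* Count, allocate, fill, then sum up the mapped sizes.  Returns 0 if
   nothing is allocated or if malloc fails.  */
size_t
deep_map_bytes (const struct comp *c)
{
  size_t n = deep_count (c);
  if (n == 0)
    return 0;
  void **addrs = malloc (n * sizeof *addrs);
  size_t *sizes = malloc (n * sizeof *sizes);
  size_t total = 0;
  if (addrs != NULL && sizes != NULL)
    {
      size_t filled = deep_fill (c, addrs, sizes, 0);
      assert (filled == n);
      for (size_t i = 0; i < filled; i++)
        total += sizes[i];
    }
  free (addrs);
  free (sizes);
  return total;
}

/* Example data: one allocated parent with an allocated and an
   unallocated child.  */
static int x_storage;
static double top_storage;
static const struct comp example_children[] = {
  { &x_storage, sizeof x_storage, NULL, 0 },
  { NULL, 0, NULL, 0 }  /* Unallocated: contributes no mapping.  */
};
const struct comp example_var
  = { &top_storage, sizeof top_storage, example_children, 2 };
```

The two passes must agree on which components they visit, which is why both apply the same "skip if the parent is unallocated" rule.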
For this purpose, three lang hooks are added - all are called rather late, i.e. during omp-low.cc, such that all previous operations (implicit mapping, explicit mapping, OpenMP mapper) are already done.

* The first one checks whether there is any allocatable component for a map-clause element (explicitly or implicitly added). If not, the current code is used. Otherwise, it uses dynamically allocated arrays.
  (Side note: As the size is now only known at runtime, TREE_VEC now has another element - the array size - hence the change to expand_omp_target; before, it was known statically from the type.)
* The second hook actually counts how many allocations are done, which is required for the allocation.
* The third hook actually fills the arrays.

Comments? Remarks?

Tobias

PS: There are two things to watch out for in the future:
- 'mapper': I think it should work when the mapper is present as it comes rather late in the flow, but I have not checked with Julian's patches (pending review).
- Order: the dynamic items are added last to 'addrs' to permit keeping the 'struct' type. I think that's fine for allocatable components as they are added rather late and accessing them via 'is_device_ptr' is not possible. But there might be some issues with 'iterator' in future; something to watch out for. If so, we may need to partially or fully give up on still putting all other mappings into the struct.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

OpenMP: Add lang hooks + run-time filled map arrays for Fortran deep mapping of DT

This patch adds middle end support for mapping Fortran derived-types with allocatable components. If those are present, the kinds/sizes arrays will be allocated at run time and the addrs struct gets a variable-sized array at the end.
The newly added hooks are:
* lhd_omp_deep_mapping_p: If true, use the new code.
* lhd_omp_deep_mapping_cnt: Count the elements, needed for allocation.
* lhd_omp_deep_mapping: Fill the allocated arrays.

gcc/ChangeLog:

	* langhooks-def.h (lhd_omp_deep_mapping_p,
	lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New.
	(LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT,
	LANG_HOOKS_OMP_DEEP_MAPPING): Define.
	(LANG_HOOKS_DECLS): Use it.
	* langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt,
	lhd_omp_deep_mapping): New stubs.
	* langhooks.h (struct lang_hooks_for_decls): Add new hooks.
	* omp-expand.cc (expand_omp_target): Handle dynamic-size
	addr/sizes/kinds arrays.
	* omp-low.cc (build_sender_ref, fixup_child_record_type,
	scan_sharing_clauses, lower_omp_target): Update to handle new hooks
	and dynamic-size addr/sizes/kinds arrays.

 gcc/langhooks-def.h | 10 +++
 gcc/langhooks.cc| 24 ++
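To make the interplay of the three hooks more concrete, here is a minimal Python sketch of the "check, count, fill" flow. It is only a model: derived types are represented as dictionaries, arrays as lists, and the `Alloc` wrapper as well as the function names `deep_mapping_p`, `deep_mapping_cnt` and `deep_mapping_fill` are hypothetical stand-ins, not the GCC lang-hook signatures. The point it illustrates is that the count depends on run-time state, and that nothing below an unallocated component is visited.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Alloc:
    """Hypothetical model of an allocatable component (None = unallocated)."""
    value: Optional[Any] = None


def deep_mapping_p(obj: Any) -> bool:
    """Hook 1 (sketch): is there any allocatable component at all?"""
    if isinstance(obj, Alloc):
        return True
    if isinstance(obj, dict):
        return any(deep_mapping_p(v) for v in obj.values())
    if isinstance(obj, list):
        return any(deep_mapping_p(v) for v in obj)
    return False


def walk_allocated(obj: Any, path: str = "var"):
    """Yield (path, payload) for each allocatable component that is allocated."""
    if isinstance(obj, Alloc):
        if obj.value is None:
            return  # parent not allocated: nothing below it gets mapped
        yield path, obj.value
        yield from walk_allocated(obj.value, path)
    elif isinstance(obj, dict):
        for name, comp in obj.items():
            yield from walk_allocated(comp, f"{path}%{name}")
    elif isinstance(obj, list):
        for i, elem in enumerate(obj, start=1):
            yield from walk_allocated(elem, f"{path}({i})")


def deep_mapping_cnt(obj: Any) -> int:
    """Hook 2 (sketch): number of extra map entries, known only at run time."""
    return sum(1 for _ in walk_allocated(obj))


def deep_mapping_fill(obj: Any, addrs: list, sizes: list, kinds: list) -> None:
    """Hook 3 (sketch): append the dynamic entries after the static ones."""
    for path, payload in walk_allocated(obj):
        addrs.append(path)  # stands in for the component's address
        sizes.append(len(payload) if isinstance(payload, list) else 1)
        kinds.append("alloc")  # placeholder for the real map kind


def make_t(x, y):
    return {"x": Alloc(x), "y": Alloc(y)}


# 'var2' with two elements; only the first has allocated components.
var2 = [
    {"a": Alloc(5), "b": make_t(1, [1, 2, 3])},
    {"a": Alloc(None), "b": make_t(None, None)},
]

addrs, sizes, kinds = [], [], []
if deep_mapping_p(var2):
    count = deep_mapping_cnt(var2)
    deep_mapping_fill(var2, addrs, sizes, kinds)
```

Splitting counting from filling mirrors the description above: the count is needed first so that the arrays can be allocated with the right size before they are filled.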
Re: [PATCH] [OpenMP] GC unused SIMD clones
On 25.11.22 03:13, Sandra Loosemore wrote:
This patch is a followup to my not-yet-reviewed patch
[PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target"

That patch got reviewed and went into mainline on Nov 15, 2022 as
https://gcc.gnu.org/r13-4309-g309e2d95e3b930c6f15c8a5346b913158404c76d

In comments on a previous iteration of that patch, I was asked to do something to delete unused SIMD clones to avoid code bloat; this is it.

I've implemented something like a simple mark-and-sweep algorithm. Clones that are used are marked at the point where the call is generated in the vectorizer. The loop that iterates over functions to apply the passes after IPA is modified to defer processing of unmarked clones, and anything left over is deleted.

Jakub referred to Honza for the review, who wrote yesterday off list (to me and Sandra):

I am really sorry for taking so long time. It was busy month for me and I was not very keen about the idea, since we had such logic implemented many years ago but removed it to be able to determine functions to be output early and optimize code layout. I see that this is not possible with current organization where vectorization is run late, so I guess it does make sense to do what you are doing.
Patch is OK,
Honza

Thanks for the review! (And to Sandra: thanks for the patch.)

I leave it to Sandra to commit her patch and only want to update the gcc-patches@ email. However, I think we can expect a commit tomorrow. (Today is a holiday at her place - as new year's day fell on a Sunday.)

Thanks and happy new year!

Tobias
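The mark-and-sweep idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Function` class, the `calls` list (names of clones that get called once a body is vectorized) and `run_late_passes` are hypothetical and do not correspond to GCC's cgraph data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    is_simd_clone: bool = False
    used: bool = False  # the "mark" bit
    calls: list = field(default_factory=list)  # clones called after vectorization


def run_late_passes(functions):
    """Process ordinary functions, mark clones on use, delete the leftovers."""
    by_name = {f.name: f for f in functions}
    worklist = [f for f in functions if not f.is_simd_clone]
    deferred = [f for f in functions if f.is_simd_clone]  # unmarked clones wait
    emitted = []
    while worklist:
        fn = worklist.pop()
        emitted.append(fn.name)
        for callee_name in fn.calls:  # the "vectorizer" generates a call here
            callee = by_name[callee_name]
            if callee.is_simd_clone and not callee.used:
                callee.used = True  # mark ...
                deferred.remove(callee)
                worklist.append(callee)  # ... and process it after all
    deleted = [f.name for f in deferred]  # sweep: never marked
    return emitted, deleted


functions = [
    Function("main", calls=["f.simd4"]),
    Function("f"),
    Function("f.simd4", is_simd_clone=True),
    Function("f.simd8", is_simd_clone=True),
    Function("g.simd4", is_simd_clone=True),
]
emitted, deleted = run_late_passes(functions)
```

Deferring the clones until the worklist is empty is what allows a clone to be marked late, namely when some other function body is vectorized and starts calling it.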
[Patch] Fortran/OpenMP: Add parsing support for allocators/allocate directive (was: [Patch] Fortran/OpenMP: Add parsing support for allocators directive)
Related pending (simple) patches - aka *Patch Ping*:
* [Patch] Fortran: Extend align-clause checks of OpenMP's allocate clause
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608401.html
* [Patch] OpenMP: Parse align clause in allocate directive in C/C++
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608404.html

On 14.12.22 11:47, Tobias Burnus wrote:
This patch adds parsing/argument-checking support for '!$omp allocators allocate([align(int),allocator(a) :] list)'

This follow-up patch additionally adds parsing support for both declarative and allocate-stmt-associated '!$omp allocate' directives - and replaces my previous patch.

OK for mainline?

* * *

In line with OpenMP 5.1, the code requires that an executable statement comes before an '!$omp allocate' that is associated with a Fortran ALLOCATE stmt; a violation is diagnosed.

Note: There is a spec change/regression related to permitting structure elements; while OpenMP 5.0/5.1 did permit them in the allocate-stmt-associated "!$omp allocate", OpenMP 5.2 stopped doing so - and '!$omp allocators' never permitted it. For allocate, that seems to be the accidental result of changing from "permitted unless stated otherwise" to "rejected unless stated otherwise". For 'allocators', it is the result of the original 'allocate' clause, which should have been extended for 'allocators' - or should not. In any case, that's now tracked in OpenMP's spec issue #3437.

Thoughts? - The code rejects var%comp and var(1)%comp etc. for now - besides the unclear spec status, I admittedly did this also to make checking easier (like for duplicated entries, entry same as in ALLOCATE except for trailing array spec etc.).

* * *

This patch replaces both my previous patch in this thread and also Abid's patch "[PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0)."
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603258.html

In his patch set, later patches actually add allocator support, for allocatables/pointers only - but there are issues with regard to the used allocator (see patches + patch review). As my attached patch raises a sorry, it neither addresses that issue nor is it affected by that issue.

Tobias

Fortran/OpenMP: Add parsing support for allocators/allocate directive

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_namelist): Update allocator, fix
	align dump.
	(show_omp_node, show_code_node): Handle EXEC_OMP_ALLOCATE.
	* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE and ..._EXEC.
	(enum gfc_exec_op): Add EXEC_OMP_ALLOCATE.
	(struct gfc_omp_namelist): Add 'allocator' to 'u2' union.
	(struct gfc_namespace): Add omp_allocate.
	(gfc_resolve_omp_allocate): New.
	* match.cc (gfc_free_omp_namelist): Free 'u2.allocator'.
	* match.h (gfc_match_omp_allocate, gfc_match_omp_allocators): New.
	* openmp.cc (gfc_omp_directives): Uncomment allocate/allocators.
	(gfc_match_omp_variable_list): Add bool arg for rejecting listing
	common-block vars separately.
	(gfc_match_omp_clauses): Update for u2.allocators.
	(OMP_ALLOCATORS_CLAUSES, gfc_match_omp_allocate,
	gfc_match_omp_allocators, is_predefined_allocator,
	gfc_resolve_omp_allocate): New.
	(resolve_omp_clauses): Update 'allocate' clause checks.
	(omp_code_to_statement, gfc_resolve_omp_directive): Handle
	OMP ALLOCATE/ALLOCATORS.
	* parse.cc (in_exec_part): New global var.
	(check_omp_allocate_stmt, parse_openmp_allocate_block): New.
	(decode_omp_directive, case_exec_markers, case_omp_decl,
	gfc_ascii_statement, parse_omp_structured_block): Handle
	OMP allocate/allocators.
	(verify_st_order, parse_executable): Set in_exec_part.
	* resolve.cc (gfc_resolve_blocks, resolve_codes): Handle
	allocate/allocators.
	* st.cc (gfc_free_statement): Likewise.
	* trans.cc (trans_code): Likewise.
	* trans-openmp.cc (gfc_trans_omp_directive): Likewise.
	(gfc_trans_omp_clauses, gfc_split_omp_clauses): Update for
	u2.allocator, fix for u.align.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/allocate-3.f90: Update dg-error.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-2.f90: Update dg-error.
	* gfortran.dg/gomp/allocate-4.f90: New test.
	* gfortran.dg/gomp/allocate-5.f90: New test.
	* gfortran.dg/gomp/allocate-6.f90: New test.
	* gfortran.dg/gomp/allocate-7.f90: New test.
	* gfortran.dg/gomp/allocators-1.f90: New test.
	* gfortran.dg/gomp/allocators-2.f90: New test.

 gcc/fortran/dump-parse-tree.cc | 8 +-
 gcc/fortran/gfortran.h | 9 +-
 gcc/fortran/match.cc | 7 +-
 gcc/fortran/match.h | 2 +
 gcc/fortran/openmp.cc| 328 +
Re: [Patch] gfortran.dg/read_dir.f90: Make PASS on Windows
On 19.12.22 11:51, Tobias Burnus wrote:
On 19.12.22 10:26, Tobias Burnus wrote:
And here is a more lightweight variant, suggested by Nightstrike: Using '.' instead of creating a new directory - and checking for __WIN32__ instead of __MINGW32__.
[...]
I have now updated the heavy version. The #if check moved to C as those macros aren't set in Fortran. (That's now https://gcc.gnu.org/PR108175 - I thought that there was a PR before, but I couldn't find any.)

This variant has now been committed as https://gcc.gnu.org/r13-4818-g18fc70aa9c753d17c00211cea9fa5bd843fe94fd

Tobias
Re: [Patch] gfortran.dg/read_dir.f90: Make PASS on Windows
On 19.12.22 10:26, Tobias Burnus wrote:
And here is a more lightweight variant, suggested by Nightstrike: Using '.' instead of creating a new directory - and checking for __WIN32__ instead of __MINGW32__.
The only downside of this variant is that it does not check whether "close(10,status='delete')" will delete a directory without failing with an error. - If the latter makes sense, I think a follow-up check should be added to ensure the directory has indeed been removed by 'close'.

I have now updated the heavy version. The #if check moved to C as those macros aren't set in Fortran. (That's now https://gcc.gnu.org/PR108175 - I thought that there was a PR before, but I couldn't find any.)

Additionally, on Windows the '.' directory is now opened - avoiding issues with POSIX functions (and the requirement to use '#include ' etc.). - As OPEN already fails, there is no point in checking for the rest.

On the non-Windows side, there is now a check that 'CLOSE' with status='delete' indeed has deleted the directory.

Thoughts about which variant is better? Other suggestions or comments?
^- comments?

PS: On my x86-64 Linux, OPEN works but READ fails with EISDIR/errno == 21.

And thanks to Nightstrike for testing, suggestions and reporting the issue in the first place.

On 19.12.22 10:09, Tobias Burnus wrote:
As discussed in #gfortran IRC, on Windows opening a directory fails with EACCES. (It works under Cygwin - nightstrike was so kind to test this.)
Additionally, '[ -d dir ] || mkdir dir' is also not very portable.
Hence, I use an auxiliary C file calling the POSIX functions and expect a fail for non-Cygwin Windows.
Comments? Suggestions? - If there aren't any, I plan to commit it as obvious tomorrow.

I don't have a strong preference for the one-file/'.'/smaller solution vs. the two-file/mkdir/close-'delete' solution, but I am slightly inclined toward the one that tests more.
Tobias

gfortran.dg/read_dir.f90: Make PASS on Windows

On non-Cygwin Windows, use '.' and expect the documented fail when opening a directory (EACCESS). As gfortran does not set __WIN32__ this check is done on the C side. (On __CYGWIN__, __WIN32__ is not set - but to make it clear, !__CYGWIN__ is used in #if.)

On non-Windows, replace the 'call system' shell call by the POSIX functions stat/mkdir/rmdir for better compatibility, especially on embedded systems; additionally add some more checks.

In particular, confirm that 'close' with status='delete' indeed deleted the directory.

gcc/testsuite/ChangeLog:

	* gfortran.dg/read_dir-aux.c: New; provides my_mkdir, my_rmdir,
	my_verify_not_exists and expect_open_to_fail.
	* gfortran.dg/read_dir.f90: Call those; expect that opening a
	directory fails on Windows.

 gcc/testsuite/gfortran.dg/read_dir-aux.c | 68
 gcc/testsuite/gfortran.dg/read_dir.f90 | 54 ++---
 2 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/read_dir-aux.c b/gcc/testsuite/gfortran.dg/read_dir-aux.c
new file mode 100644
index 000..307b44472af
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/read_dir-aux.c
@@ -0,0 +1,68 @@
+#if defined(__WIN32__) && !defined(__CYGWIN__)
+  /* Mostly skip on Windows, cf. main file why.  */
+
+int expect_open_to_fail () { return 1; }
+
+void my_verify_not_exists (const char *dir) { }
+void my_mkdir (const char *dir) { }
+void my_rmdir (const char *dir) { }
+
+#else
+
+#include <sys/stat.h>  /* For mkdir + permission bits.  */
+#include <unistd.h>  /* For rmdir.  */
+#include <errno.h>  /* For errno.  */
+#include <stdio.h>  /* For perror.  */
+#include <stdlib.h>  /* For abort.  */
+
+
+int expect_open_to_fail () { return 0; }
+
+void
+my_verify_not_exists (const char *dir)
+{
+  struct stat path_stat;
+  int err = stat (dir, &path_stat);
+  if (err && errno == ENOENT)
+    return;  /* OK */
+  if (err)
+    perror ("my_verify_not_exists");
+  else
+    printf ("my_verify_not_exists: pathname %s still exists\n", dir);
+  abort ();
+ }
+
+void
+my_mkdir (const char *dir)
+{
+  int err;
+  struct stat path_stat;
+
+  /* Check whether 'dir' exists and is a directory.  */
+  err = stat (dir, &path_stat);
+  if (err && errno != ENOENT)
+    {
+      perror ("my_mkdir: failed to call stat for directory");
+      abort ();
+    }
+  if (err == 0 && !S_ISDIR (path_stat.st_mode))
+    {
+      printf ("my_mkdir: pathname %s is not a directory\n", dir);
+      abort ();
+    }
+
+  err = mkdir (dir, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
+  if (err != 0)
+    {
+      perror ("my_mkdir: failed to create directory");
+      abort ();
+    }
+}
+
+void
+my_rmdir (const char *dir)
+{
+
[Patch] gfortran.dg/read_dir.f90: Make PASS on Windows
As discussed in #gfortran IRC, on Windows opening a directory fails with EACCES. (It works under Cygwin - nightstrike was so kind to test this.)

Additionally, '[ -d dir ] || mkdir dir' is also not very portable.

Hence, I use an auxiliary C file calling the POSIX functions and expect a fail for non-Cygwin Windows.

Comments? Suggestions? - If there aren't any, I plan to commit it as obvious tomorrow.

Tobias

gfortran.dg/read_dir.f90: Make PASS on Windows

Call POSIX's stat/mkdir/rmdir instead of using the shell via 'call system'. Additionally, expect EACCESS on non-Cygwin Windows as documented for trying to open a directory.

gcc/testsuite/ChangeLog:

	* gfortran.dg/read_dir-aux.c: New; provides my_mkdir and my_rmdir.
	* gfortran.dg/read_dir.f90: Call my_mkdir/my_rmdir; expect error
	on Windows when opening a directory.

 gcc/testsuite/gfortran.dg/read_dir-aux.c | 39 +
 gcc/testsuite/gfortran.dg/read_dir.f90 | 43
 2 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/read_dir-aux.c b/gcc/testsuite/gfortran.dg/read_dir-aux.c
new file mode 100644
index 000..e8404478517
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/read_dir-aux.c
@@ -0,0 +1,39 @@
+#include <sys/stat.h>  /* For mkdir + permission bits.  */
+#include <unistd.h>  /* For rmdir.  */
+#include <errno.h>  /* For errno.  */
+#include <stdio.h>  /* For perror.  */
+#include <stdlib.h>  /* For abort.  */
+
+
+void
+my_mkdir (const char *dir)
+{
+  int err;
+  struct stat path_stat;
+
+  /* Check whether 'dir' exists and is a directory.  */
+  err = stat (dir, &path_stat);
+  if (err && errno != ENOENT)
+    {
+      perror ("my_mkdir: failed to call stat for directory");
+      abort ();
+    }
+  if (err == 0 && !S_ISDIR (path_stat.st_mode))
+    {
+      printf ("my_mkdir: pathname %s is not a directory\n", dir);
+      abort ();
+    }
+
+  err = mkdir (dir, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
+  if (err != 0)
+    {
+      perror ("my_mkdir: failed to create directory");
+      abort ();
+    }
+}
+
+void
+my_rmdir (const char *dir)
+{
+  rmdir (dir);
+}
diff --git a/gcc/testsuite/gfortran.dg/read_dir.f90 b/gcc/testsuite/gfortran.dg/read_dir.f90
index c7ddc51fb90..3a8ff6adbc7 100644
--- a/gcc/testsuite/gfortran.dg/read_dir.f90
+++ b/gcc/testsuite/gfortran.dg/read_dir.f90
@@ -1,18 +1,51 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp" }
+! { dg-additional-sources read_dir-aux.c }
+!
 ! PR67367
+
 program bug
+   use iso_c_binding
    implicit none
+
+   interface
+     subroutine my_mkdir(s) bind(C)
+       ! Call POSIX's mkdir - and ignore fails due to
+       ! existing directories but fail otherwise
+       import
+       character(len=1,kind=c_char) :: s(*)
+     end subroutine
+     subroutine my_rmdir(s) bind(C)
+       ! Call POSIX's rmdir - and ignore fails
+       import
+       character(len=1,kind=c_char) :: s(*)
+     end subroutine
+   end interface
+
+   character(len=*), parameter :: sdir = "junko.dir"
+   character(len=*,kind=c_char), parameter :: c_sdir = sdir // c_null_char
+
    character(len=1) :: c
-   character(len=256) :: message
    integer ios
-   call system('[ -d junko.dir ] || mkdir junko.dir')
-   open(unit=10, file='junko.dir',iostat=ios,action='read',access='stream')
+
+   call my_mkdir(c_sdir)
+   open(unit=10, file=sdir,iostat=ios,action='read',access='stream')
+
+#if defined(__MINGW32__)
+   ! Windows is documented to fail with EACCESS when trying to open a directory
+   ! Note: Testing showed that __CYGWIN__ does permit opening directories
+   call my_rmdir(c_sdir)
+   if (ios == 0) &
+      stop 3  ! Expected EACCESS
+   stop 0  ! OK
+#endif
+
    if (ios.ne.0) then
-      call system('rmdir junko.dir')
+      call my_rmdir(c_sdir)
       STOP 1
    end if
    read(10, iostat=ios) c
-   if (ios.ne.21.and.ios.ne.0) then
+   if (ios.ne.21.and.ios.ne.0) then ! EISDIR has often the value 21
       close(10, status='delete')
       STOP 2
    end if
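For readers who want to try the directory handling outside the testsuite, the following Python sketch mimics the behaviour that the interface comments in read_dir.f90 describe: `my_mkdir` tolerates an already existing directory but fails otherwise, and `my_rmdir` ignores failures. It is an illustration only, not a translation of read_dir-aux.c; in particular, returning early for an existing directory is an assumption taken from the comment text, and the exception types are my own choice.

```python
import errno
import os
import stat
import tempfile


def my_mkdir(path):
    """Create 'path'; an existing directory is fine, anything else is an error."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        st = None  # ENOENT: nothing there yet, create it below
    if st is not None:
        if not stat.S_ISDIR(st.st_mode):
            raise NotADirectoryError(errno.ENOTDIR, "not a directory", path)
        return  # already a directory: ignore
    os.mkdir(path, 0o755)  # rwxr-xr-x, as in the C helper's mode bits


def my_rmdir(path):
    """Remove 'path' and ignore failures."""
    try:
        os.rmdir(path)
    except OSError:
        pass


with tempfile.TemporaryDirectory() as base:
    sdir = os.path.join(base, "junko.dir")
    my_mkdir(sdir)
    my_mkdir(sdir)  # second call is tolerated
    created = os.path.isdir(sdir)
    my_rmdir(sdir)
    my_rmdir(sdir)  # second call is ignored
    removed = not os.path.exists(sdir)
```

Any other `stat` failure (for example a permission problem) is simply propagated as an `OSError`, which plays the role of the `perror` plus `abort` path in the C helper.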
Re: [Patch] gfortran.dg/read_dir.f90: Make PASS on Windows
And here is a more lightweight variant, suggested by Nightstrike: Using '.' instead of creating a new directory - and checking for __WIN32__ instead of __MINGW32__.

The only downside of this variant is that it does not check whether "close(10,status='delete')" will delete a directory without failing with an error. - If the latter makes sense, I think a follow-up check should be added to ensure the directory has indeed been removed by 'close'.

Thoughts about which variant is better? Other suggestions or comments?

Tobias

PS: On my x86-64 Linux, OPEN works but READ fails with EISDIR/errno == 21.

On 19.12.22 10:09, Tobias Burnus wrote:
As discussed in #gfortran IRC, on Windows opening a directory fails with EACCES. (It works under Cygwin - nightstrike was so kind to test this.)
Additionally, '[ -d dir ] || mkdir dir' is also not very portable.
Hence, I use an auxiliary C file calling the POSIX functions and expect a fail for non-Cygwin Windows.
Comments? Suggestions? - If there aren't any, I plan to commit it as obvious tomorrow.

gfortran.dg/read_dir.f90: Make PASS on Windows

Avoid call to the shell using POSIX syntax and use '.' instead. Additionally, expect fail on non-Cygwin Windows as opening a directory is documented to fail with EACCESS.

gcc/testsuite/ChangeLog:

	* gfortran.dg/read_dir.f90: Open '.' instead of a freshly created
	directory; expect error on Windows when opening a directory.

 gcc/testsuite/gfortran.dg/read_dir.f90 | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/read_dir.f90 b/gcc/testsuite/gfortran.dg/read_dir.f90
index c7ddc51fb90..c91d0f78413 100644
--- a/gcc/testsuite/gfortran.dg/read_dir.f90
+++ b/gcc/testsuite/gfortran.dg/read_dir.f90
@@ -1,20 +1,27 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp" }
+!
 ! PR67367
+
 program bug
    implicit none
    character(len=1) :: c
-   character(len=256) :: message
    integer ios
-   call system('[ -d junko.dir ] || mkdir junko.dir')
-   open(unit=10, file='junko.dir',iostat=ios,action='read',access='stream')
+   open(unit=10, file='.',iostat=ios,action='read',access='stream')
+
+#if defined(__WIN32__) && !defined(__CYGWIN__)
+   ! Windows is documented to fail with EACCESS when trying to open a directory
+   if (ios == 0) &
+      stop 3  ! Expected EACCESS
+   stop 0  ! OK
+#endif
+
    if (ios.ne.0) then
-      call system('rmdir junko.dir')
       STOP 1
    end if
    read(10, iostat=ios) c
-   if (ios.ne.21.and.ios.ne.0) then
-      close(10, status='delete')
+   close(10)
+   if (ios.ne.21.and.ios.ne.0) then ! EISDIR has often the value 21
       STOP 2
    end if
-   close(10, status='delete')
 end program bug
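The platform difference that this variant relies on (OPEN of a directory succeeds but READ fails on Linux, while OPEN itself is reported to fail on non-Cygwin Windows) can be probed with a few lines of Python. This is a sketch for experimentation only; `probe_directory` is a hypothetical helper, and the result on systems other than the ones mentioned in this thread is not something I have checked.

```python
import errno
import os


def probe_directory(path="."):
    """Return (stage, errno) of the first failure, or ("ok", 0) if none."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return "open", exc.errno  # e.g. the EACCES case described for Windows
    try:
        os.read(fd, 1)
    except OSError as exc:
        return "read", exc.errno  # e.g. EISDIR, often 21, on Linux
    finally:
        os.close(fd)
    return "ok", 0


stage, err = probe_directory(".")
print(stage, errno.errorcode.get(err, err))
```

Reporting the stage together with the errno value corresponds to the two exit paths of the test: 'stop 3' for an unexpected OPEN result and 'STOP 2' for an unexpected READ result.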
[Patch] nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]
Seems to be a CUDA JIT issue - which is fixed by adding a dummy procedure.

Lightly tested with 4 systems at hand, where 2 failed before. One had 10.2 and the other had some ancient CUDA where 'nvidia-smi' did not print a CUDA version and requires -mptx=3.1.
(I did check that offloading indeed happened and no host fallback was done.)

OK for mainline?

Tobias

nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

Seemingly, the ptx JIT of CUDA <= 10.2 replaces function pointers in global variables by NULL if a translation does not contain any executable code. It works with CUDA 11.1. The code of this commit is about reverse offload; having NULL values disables the side of reverse offload during image load.

Solution is the same as found by Thomas for a related issue: Adding a dummy procedure. Cf. the PR of this issue and Thomas' patch "nvptx: Support global constructors/destructors via 'collect2'"
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html

As that approach also works here:

Co-authored-by: Thomas Schwinge

gcc/
	PR libgomp/108098
	* config/nvptx/mkoffload.cc (process): Emit dummy procedure
	alongside reverse-offload function table to prevent NULL values
	of the function addresses.
---
 gcc/config/nvptx/mkoffload.cc | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 5d89ba8..8306aa0 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -357,6 +357,20 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	fputc (sm_ver2[i], out);
       fprintf (out, "\"\n\t\".file 1 \\\"\\\"\"\n");
 
+      /* WORKAROUND - see PR 108098
+	 It seems as if older CUDA JIT compiler optimizes the function pointers
+	 in offload_func_table to NULL, which can be prevented by adding a
+	 dummy procedure. With CUDA 11.1, it seems to work fine without
+	 workaround while CUDA 10.2 as some ancient version have need the
+	 workaround. Assuming CUDA 11.0 fixes it, emitting it could be
+	 restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
+	 PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
+	 PTX ISA 7.1.  */
+      fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
+      fprintf (out, "\t\".func __dummy$func ( )\"\n");
+      fprintf (out, "\t\"{\"\n");
+      fprintf (out, "\t\"}\"\n");
+
       size_t fidx = 0;
       for (id = func_ids; id; id = id->next)
 	{
[Patch] gcc-changelog: Add warning for auto-added files
_level_prs = [] @@ -706,6 +707,7 @@ class GitCommit: msg += f' (did you mean "{candidates[0]}"?)' details = '\n'.join(difflib.Differ().compare([file], [candidates[0]])).rstrip() self.errors.append(Error(msg, file, details)) +auto_add_warnings = {} for file in sorted(changed_files - mentioned_files): if not self.in_ignored_location(file): if file in self.new_files: @@ -738,6 +740,10 @@ class GitCommit: file = file[len(entry.folder):].lstrip('/') entry.lines.append('\t* %s: New file.' % file) entry.files.append(file) +if entry.folder not in auto_add_warnings: +auto_add_warnings[entry.folder] = [file] +else: +auto_add_warnings[entry.folder].append(file) else: msg = 'new file in the top-level folder not mentioned in a ChangeLog' self.errors.append(Error(msg, file)) @@ -755,6 +761,13 @@ class GitCommit: if pattern not in used_patterns: error = "pattern doesn't match any changed files" self.errors.append(Error(error, pattern)) +for entry, val in auto_add_warnings.items(): +if len(val) == 1: +self.warnings.append('Auto-added new file \'%s/%s\'' + % (entry, val[0])) +else: +self.warnings.append('Auto-added %d new files in \'%s\'' + % (len(val), entry)) def check_for_correct_changelog(self): for entry in self.changelog_entries: @@ -830,6 +843,12 @@ class GitCommit: for error in self.errors: print(error) +def print_warnings(self): +if self.warnings: +print('Warnings:') +for warning in self.warnings: +print(warning) + def check_commit_email(self): # Parse 'Martin Liska ' email = self.info.author.split(' ')[-1].strip('<>') diff --git a/contrib/gcc-changelog/git_email.py b/contrib/gcc-changelog/git_email.py index f3773f178ea..5468efcd0d5 100755 --- a/contrib/gcc-changelog/git_email.py +++ b/contrib/gcc-changelog/git_email.py @@ -119,11 +119,13 @@ if __name__ == '__main__': success = 0 for full in sorted(allfiles): -email = GitEmail(full, False) +email = GitEmail(full) print(email.filename) if email.success: success += 1 print(' OK') +for warning in email.warnings: +print(' 
WARN: %s' % warning) else: for error in email.errors: print(' ERR: %s' % error) @@ -135,6 +137,7 @@ if __name__ == '__main__': if email.success: print('OK') email.print_output() +email.print_warnings() else: if not email.info.lines: print('Error: patch contains no parsed lines', file=sys.stderr) diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py index 89960d307c9..79f8e0b8604 100755 --- a/contrib/gcc-changelog/test_email.py +++ b/contrib/gcc-changelog/test_email.py @@ -461,3 +461,17 @@ class TestGccChangelog(unittest.TestCase): def test_CR_in_patch(self): email = self.from_patch_glob('0001-Add-M-character.patch') assert (email.errors[0].message == 'cannot find a ChangeLog location in message') + +def test_auto_add_file_1(self): +email = self.from_patch_glob('0001-Auto-Add-File.patch') +assert not email.errors +assert (len(email.warnings) == 1) +assert (email.warnings[0] +== "Auto-added new file 'libgomp/testsuite/libgomp.fortran/allocate-4.f90'") + +def test_auto_add_file_2(self): +email = self.from_patch_glob('0002-Auto-Add-File.patch') +assert not email.errors +assert (len(email.warnings) == 2) +assert (email.warnings[0] == "Auto-added new file 'gcc/doc/gm2.texi'") +assert (email.warnings[1] == "Auto-added 2 new files in 'gcc/m2'") diff --git a/contrib/gcc-changelog/test_patches.txt b/contrib/gcc-changelog/test_patches.txt index c378c32423a..6004608a8f9 100644 --- a/contrib/gcc-changelog/test_patches.txt +++ b/contrib/gcc-changelog/test_patches.txt @@ -3636,3 +3636,99 @@ index 000..d75da75 -- 2.38.1 +=== 0001-Auto-Add-File.patch +From e205ec03f0794aeac3e8a89e947c12624d5a274e Mon Sep 17 00:00:00 2001 +From: Tobias Burnus +Date: Thu, 15 Dec 2022 12:25:07 +0100 +Subject: [PATCH] libgfortran's ISO_Fortran_binding.c: Use GCC11 version for + backward-only code [PR108056] + +libgfortran/ChangeLog: + + PR libfortran/108056 + * runtime/I
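The warning logic added by this patch boils down to grouping the auto-added files by ChangeLog folder and then emitting one line per folder. The standalone Python sketch below restates that idea outside the GitCommit class; the function name `summarize_auto_added` and the file names under 'gcc/m2' are made up for illustration, while the message formats are the ones used in the diff and its tests.

```python
def summarize_auto_added(auto_added):
    """auto_added: iterable of (changelog_folder, file) pairs, in processing order."""
    per_folder = {}
    for folder, file in auto_added:
        per_folder.setdefault(folder, []).append(file)

    warnings = []
    for folder, files in per_folder.items():
        if len(files) == 1:
            # A single file is named in full.
            warnings.append("Auto-added new file '%s/%s'" % (folder, files[0]))
        else:
            # Several files in one folder are only counted.
            warnings.append("Auto-added %d new files in '%s'"
                            % (len(files), folder))
    return warnings


example = [
    ("gcc", "doc/gm2.texi"),
    ("gcc/m2", "hypothetical-1.cc"),  # made-up file name
    ("gcc/m2", "hypothetical-2.cc"),  # made-up file name
]
for line in summarize_auto_added(example):
    print(line)
```

Counting instead of listing keeps the output short for commits that add many files to one directory, which is presumably why the patch distinguishes the two cases.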
[Patch] gcc-changelog/git_email.py: Support older unidiff.PatchSet
Another backward compatibility issue - failed here on Ubuntu 20.04 which is old but not ancient.

OK for mainline?

Tobias

gcc-changelog/git_email.py: Support older unidiff.PatchSet

Commit "unidiff: use newline='\n' argument", r13-4603-gb045179973161115c7ea029b2788f5156fc55cda, added support for CR on a line, but that broke support for older unidiff.PatchSet.

This patch uses a fallback for git_email.py (drop argument) if not available (TypeError exception) but keeps using it in test_email.py unconditionally.

contrib/ChangeLog:

	* gcc-changelog/git_email.py (GitEmail:__init__): Support older
	unidiff.PatchSet that do not have a newline= argument
	of from_filename.

diff --git a/contrib/gcc-changelog/git_email.py b/contrib/gcc-changelog/git_email.py
index ef50ebfb7fd..093c887ba4c 100755
--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -39,7 +39,11 @@ unidiff_supports_renaming = hasattr(PatchedFile(), 'is_rename')
 class GitEmail(GitCommit):
     def __init__(self, filename):
         self.filename = filename
-        diff = PatchSet.from_filename(filename, newline='\n')
+        try:
+            diff = PatchSet.from_filename(filename, newline='\n')
+        except TypeError:
+            # Older versions don't have the newline argument
+            diff = PatchSet.from_filename(filename)
         date = None
         author = None
         subject = ''
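The fallback pattern used in the patch can be tried without unidiff installed. In the sketch below, `new_from_filename` and `old_from_filename` are hypothetical stand-ins for a loader with and without the `newline` keyword; only the try/except structure is taken from the patch.

```python
def load_patch(from_filename, filename):
    """Call the loader with newline='\\n' if supported, else without it."""
    try:
        return from_filename(filename, newline='\n')
    except TypeError:
        # Older versions don't have the newline argument
        return from_filename(filename)


def new_from_filename(filename, newline=None):
    """Stand-in for a newer loader that accepts 'newline'."""
    return ("new", filename, newline)


def old_from_filename(filename):
    """Stand-in for an older loader without the 'newline' parameter."""
    return ("old", filename)


with_keyword = load_patch(new_from_filename, "0001-example.patch")
without_keyword = load_patch(old_from_filename, "0001-example.patch")
```

One caveat of this approach is that a TypeError raised for an unrelated reason inside the loader would also trigger the fallback; inspecting the signature up front would avoid that, at the cost of more code.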
Re: [Patch] libgomp: Handle OpenMP's reverse offloads
Hi, On 15.12.22 20:42, Tobias Burnus wrote: If the libgomp plugin doesn't request special 'host_to_dev_cpy'/'dev_to_host_cpy' for 'gomp_target_rev', then standard 'gomp_copy_host2dev'/'gomp_copy_dev2host' are used, which use 'gomp_device_copy', which expects the device to be locked. (As can be told by the unconditional 'gomp_mutex_unlock (>lock);' before 'gomp_fatal'.) However, in a number of the 'gomp_copy_host2dev'/'gomp_copy_dev2host' calls from 'gomp_target_rev', the device definitely is not locked; see Actually, reading it + the source code again, I think it makes sense to return a boolean – similar to devicep->host2dev_func and devicep->dev2host_func — and possibly wrap it into some convenience function, similar to gomp_device_copy – at least a bare exit() without further diagnostic does not seem to userfriendly. BTW: In line with the other code, you could use CUDA_CALL instead of CUDA_CALL_ERET; the fomer already calls the latter with 'false' as first argument + is used elsewhere. Regarding the lock: It seems the problem is the copying of devaddrs/sizes/kinds; this does not need any lock as the stack variables are on the device and only used for this reverse offload. Thus, there is no need for a lock as there are no races. However, as the existing gomp_copy_dev2host removes the lock, we could simply keep this lock – and probably should move it down to just before the user-function call – removing all (non-error) locks and unlocks on the way. — I mean something like the attached patch. Finally, I think we need to find a solution for the issue Andrew tried to address. — The current code invokes CUDA_CALL_ASSERT – which calls GOMP_PLUGIN_fatal. 
Tobias

diff --git a/libgomp/target.c b/libgomp/target.c
index e38cc3b6f1c..4b7233307cd 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -3319,5 +3319,6 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
   gomp_mutex_lock (&devicep->lock);
   n = gomp_map_lookup_rev (&devicep->mem_map_rev, &k);
-  gomp_mutex_unlock (&devicep->lock);
+  if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    gomp_mutex_unlock (&devicep->lock);

   if (n == NULL)
@@ -3409,5 +3410,4 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
       cdata = gomp_alloca (sizeof (*cdata) * mapnum);
       memset (cdata, '\0', sizeof (*cdata) * mapnum);
-      gomp_mutex_lock (&devicep->lock);
       for (uint64_t i = 0; i < mapnum; i++)
         {
@@ -3643,4 +3643,5 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
       uint64_t struct_cpy = 0;
       bool clean_struct = false;
+      gomp_mutex_lock (&devicep->lock);
       for (uint64_t i = 0; i < mapnum; i++)
         {
@@ -3695,5 +3696,5 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
             gomp_aligned_free ((void *) (uintptr_t) devaddrs[i]);
         }
-
+      gomp_mutex_unlock (&devicep->lock);
       free (devaddrs);
       free (sizes);
Re: [Patch] libgomp: Handle OpenMP's reverse offloads
Hi,

I have not fully tried to understand it, yet.

(a) Regarding the issue of stalling, see also Andrew's patch and the discussion about it in "[PATCH] libgomp: fix hang on fatal error", https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603616.html and in particular Jakub's two replies.

(b) I think you want to remove this:

On 15.12.22 18:34, Thomas Schwinge wrote:

--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1,3 +1,5 @@
+#pragma GCC optimize "O0"
+
 /* Plugin for NVPTX execution.

(c)

If the libgomp plugin doesn't request special 'host_to_dev_cpy'/'dev_to_host_cpy' for 'gomp_target_rev', then standard 'gomp_copy_host2dev'/'gomp_copy_dev2host' are used, which use 'gomp_device_copy', which expects the device to be locked. (As can be told by the unconditional 'gomp_mutex_unlock (&devicep->lock);' before 'gomp_fatal'.) However, in a number of the 'gomp_copy_host2dev'/'gomp_copy_dev2host' calls from 'gomp_target_rev', the device definitely is not locked; see the calls adjacent to the TODO

The question is what unlocks the device - it is surely locked in gomp_target_rev by:

  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM))
    ...
      gomp_mutex_lock (&devicep->lock);
      for (uint64_t i = 0; i < mapnum; i++)
        ...
        }
      gomp_mutex_unlock (&devicep->lock);
    }

Except for code like:

  gomp_mutex_unlock (&devicep->lock);
  gomp_fatal ("gomp_target_rev unhandled kind 0x%.4x", kinds[i]);

The only functions that know about the pointer and get called are those behind the dev_to_host_cpy and host_to_dev_cpy - thus, they seemingly mess about with the unlocking?!?

* * *

Regarding your patch, I do not understand why you call unlock twice and have 'TODO unlock' thrice; that does not seem to make any sense.

I think it is worthwhile to understand why plugin-nvptx.c unlocks the lock in the non-error case - as you observe that it is not locked in the error case.

Additionally, it seems to make more sense to look into a revised version of Andrew's patch; your patch looks like a rather bad band-aid.
Tobias
[Patch] Fortran/OpenMP: Add parsing support for allocators directive
This patch adds parsing/argument-checking support for
  '!$omp allocators allocate([align(int),allocator(a) :] list)'

This is kind of a logical follow-up and prep patch for the
  '!$omp allocate(list) [align(v) allocator(a)]'
support that was submitted as part of a larger patch set by Abid; cf. review at
"[PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0)."
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603258.html

My follow-up patch will add parsing support for declarative/executable '!$omp allocate'.

OK for mainline?

Tobias

Fortran/OpenMP: Add parsing support for allocators directive

gcc/fortran/ChangeLog:

	* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATORS and
	ST_OMP_END_ALLOCATORS.
	(enum gfc_exec_op): Add EXEC_OMP_ALLOCATORS.
	* dump-parse-tree.cc (show_omp_node, show_code_node): Handle
	OpenMP's ALLOCATORS directive.
	* match.h (gfc_match_omp_allocators): New prototype.
	* openmp.cc (OMP_ALLOCATORS_CLAUSES): Define.
	(gfc_match_omp_allocators): New.
	(resolve_omp_clauses, omp_code_to_statement,
	gfc_resolve_omp_directive): Handle EXEC_OMP_ALLOCATORS.
	* parse.cc (parse_openmp_allocate_block): New.
	(case_exec_markers): Add ST_OMP_ALLOCATORS.
	(decode_omp_directive, gfc_ascii_statement, parse_executable):
	Parse OpenMP allocators directive.
	* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_ALLOCATORS.
	* st.cc (gfc_free_statement): Likewise.
	* trans.cc (trans_code): Likewise.
	* trans-openmp.cc (gfc_trans_omp_directive): Show 'sorry'
	for EXEC_OMP_ALLOCATORS.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocators-1.f90: New test.
	* gfortran.dg/gomp/allocators-2.f90: New test.
gcc/fortran/dump-parse-tree.cc | 2 + gcc/fortran/gfortran.h | 3 +- gcc/fortran/match.h | 1 + gcc/fortran/openmp.cc | 31 ++- gcc/fortran/parse.cc| 50 - gcc/fortran/resolve.cc | 2 + gcc/fortran/st.cc | 1 + gcc/fortran/trans-openmp.cc | 3 ++ gcc/fortran/trans.cc| 1 + gcc/testsuite/gfortran.dg/gomp/allocators-1.f90 | 28 ++ gcc/testsuite/gfortran.dg/gomp/allocators-2.f90 | 22 +++ 11 files changed, 140 insertions(+), 4 deletions(-) diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc index 5ae72dc1cac..4565b71c758 100644 --- a/gcc/fortran/dump-parse-tree.cc +++ b/gcc/fortran/dump-parse-tree.cc @@ -2081,6 +2081,7 @@ show_omp_node (int level, gfc_code *c) case EXEC_OACC_CACHE: name = "CACHE"; is_oacc = true; break; case EXEC_OACC_ENTER_DATA: name = "ENTER DATA"; is_oacc = true; break; case EXEC_OACC_EXIT_DATA: name = "EXIT DATA"; is_oacc = true; break; +case EXEC_OMP_ALLOCATORS: name = "ALLOCATORS"; break; case EXEC_OMP_ASSUME: name = "ASSUME"; break; case EXEC_OMP_ATOMIC: name = "ATOMIC"; break; case EXEC_OMP_BARRIER: name = "BARRIER"; break; @@ -3409,6 +3410,7 @@ show_code_node (int level, gfc_code *c) case EXEC_OACC_CACHE: case EXEC_OACC_ENTER_DATA: case EXEC_OACC_EXIT_DATA: +case EXEC_OMP_ALLOCATORS: case EXEC_OMP_ASSUME: case EXEC_OMP_ATOMIC: case EXEC_OMP_CANCEL: diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 5f8a81ae4a1..63f38d2 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -318,6 +318,7 @@ enum gfc_statement ST_OMP_END_MASKED_TASKLOOP, ST_OMP_MASKED_TASKLOOP_SIMD, ST_OMP_END_MASKED_TASKLOOP_SIMD, ST_OMP_SCOPE, ST_OMP_END_SCOPE, ST_OMP_ERROR, ST_OMP_ASSUME, ST_OMP_END_ASSUME, ST_OMP_ASSUMES, + ST_OMP_ALLOCATORS, ST_OMP_END_ALLOCATORS, /* Note: gfc_match_omp_nothing returns ST_NONE. 
*/ ST_OMP_NOTHING, ST_NONE }; @@ -2959,7 +2960,7 @@ enum gfc_exec_op EXEC_OMP_TARGET_TEAMS_LOOP, EXEC_OMP_MASKED, EXEC_OMP_PARALLEL_MASKED, EXEC_OMP_PARALLEL_MASKED_TASKLOOP, EXEC_OMP_PARALLEL_MASKED_TASKLOOP_SIMD, EXEC_OMP_MASKED_TASKLOOP, EXEC_OMP_MASKED_TASKLOOP_SIMD, EXEC_OMP_SCOPE, - EXEC_OMP_ERROR + EXEC_OMP_ERROR, EXEC_OMP_ALLOCATORS }; typedef struct gfc_code diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index 2a805815d9c..b1f5db80125 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -149,6 +149,7 @@ match gfc_match_oacc_routine (void); /* OpenMP directive matchers. */ match gfc_match_omp_eos_error (void); +match gfc_match_omp_allocators (void); match gfc_match_omp_assume (void); match gfc_match_omp_assumes (void); match gfc_match_omp_atomic (void); diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index
[Patch] mklog: only do is_binary_file check if available
Ubuntu 20.04.5 LTS (focal) unfortunately has a too old unidiff.PatchSet for the feature added on Monday.

Solution: use is_binary_file only when it is available.

OK for mainline?

Tobias

#!/usr/bin/env python3

# Copyright (C) 2020-2022 Free Software Foundation, Inc.
#
# This file is part of GCC.
#
# GCC is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# GCC is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with GCC; see the file COPYING.  If not, write to
# the Free Software Foundation, 51 Franklin Street, Fifth Floor,
# Boston, MA 02110-1301, USA.

# This script parses a .diff file generated with 'diff -up' or 'diff -cp'
# and adds a skeleton ChangeLog file to the file. It does not try to be
# too smart when parsing function names, but it produces a reasonable
# approximation.
# # Author: Martin Liska import argparse import datetime import json import os import re import subprocess import sys from itertools import takewhile import requests from unidiff import PatchSet LINE_LIMIT = 100 TAB_WIDTH = 8 CO_AUTHORED_BY_PREFIX = 'co-authored-by: ' pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PPR [a-z+-]+\/[0-9]+)') prnum_regex = re.compile(r'PR (?P[a-z+-]+)/(?P[0-9]+)') dr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PDR [0-9]+)') dg_regex = re.compile(r'{\s+dg-(error|warning)') pr_filename_regex = re.compile(r'(^|[\W_])[Pp][Rr](?P\d{4,})') identifier_regex = re.compile(r'^([a-zA-Z0-9_#].*)') comment_regex = re.compile(r'^\/\*') struct_regex = re.compile(r'^(class|struct|union|enum)\s+' r'(GTY\(.*\)\s+)?([a-zA-Z0-9_]+)') macro_regex = re.compile(r'#\s*(define|undef)\s+([a-zA-Z0-9_]+)') super_macro_regex = re.compile(r'^DEF[A-Z0-9_]+\s*\(([a-zA-Z0-9_]+)') fn_regex = re.compile(r'([a-zA-Z_][^()\s]*)\s*\([^*]') template_and_param_regex = re.compile(r'<[^<>]*>') md_def_regex = re.compile(r'\(define.*\s+"(.*)"') bugzilla_url = 'https://gcc.gnu.org/bugzilla/rest.cgi/bug?id=%s;' \ 'include_fields=summary,component' function_extensions = {'.c', '.cpp', '.C', '.cc', '.h', '.inc', '.def', '.md'} # NB: Makefile.in isn't listed as it's not always generated. generated_files = {'aclocal.m4', 'config.h.in', 'configure'} help_message = """\ Generate ChangeLog template for PATCH. PATCH must be generated using diff(1)'s -up or -cp options (or their equivalent in git). 
""" script_folder = os.path.realpath(__file__) root = os.path.dirname(os.path.dirname(script_folder)) def find_changelog(path): folder = os.path.split(path)[0] while True: if os.path.exists(os.path.join(root, folder, 'ChangeLog')): return folder folder = os.path.dirname(folder) if folder == '': return folder raise AssertionError() def extract_function_name(line): if comment_regex.match(line): return None m = struct_regex.search(line) if m: # Struct declaration return m.group(1) + ' ' + m.group(3) m = macro_regex.search(line) if m: # Macro definition return m.group(2) m = super_macro_regex.search(line) if m: # Supermacro return m.group(1) m = fn_regex.search(line) if m: # Discard template and function parameters. fn = m.group(1) fn = re.sub(template_and_param_regex, '', fn) return fn.rstrip() return None def try_add_function(functions, line): fn = extract_function_name(line) if fn and fn not in functions: functions.append(fn) return bool(fn) def sort_changelog_files(changed_file): return (changed_file.is_added_file, changed_file.is_removed_file) def get_pr_titles(prs): output = [] for idx, pr in enumerate(prs): pr_id = pr.split('/')[-1] r = requests.get(bugzilla_url % pr_id) bugs = r.json()['bugs'] if len(bugs) == 1: prs[idx] = 'PR %s/%s' % (bugs[0]['component'], pr_id) out = '%s - %s\n' % (prs[idx], bugs[0]['summary']) if out not in output: output.append(out) if output: output.append('') return '\n'.join(output) def append_changelog_line(out, relative_path, text): line = f'\t* {relative_path}:' if len(line.replace('\t', ' ' * TAB_WIDTH) + ' ' + text) <= LINE_LIMIT: out += f'{line} {text}\n'
Re: [Patch, Fortran] libgfortran's ISO_Fortran_binding.c: Use GCC11 version for backward-only code [PR108056]
Hi Harald,

On 13.12.22 23:27, Harald Anlauf wrote:

On 13.12.22 at 22:41, Tobias Burnus wrote:

Back to differences: 'diff -U0 -p -w' against the last GCC 11 branch shows:
...
@@ -35,0 +37,2 @@ export_proto(cfi_desc_to_gfc_desc);
+/* NOTE: Since GCC 12, the FE generates code to do the conversion
+   directly without calling this function.  */
@@ -63 +66 @@ cfi_desc_to_gfc_desc (gfc_array_void *d,
-  d->dtype.version = s->version;
+  d->dtype.version = 0;

I was wondering what the significance of "version" is. In ISO_Fortran_binding.h we seem to always have

#define CFI_VERSION 1

and it did not change with gcc-12.

The version is 1 for CFI but it is 0 for GFC. However, as we do not check the GFC version anywhere and it is not publicly exposed, it does not really matter. Still, "d->dtype.version = 0;" matches what the compiler itself produces - and for consistency, setting it to 0 is better than setting it to 1 (via CFI's version field).

Actually, 'dtype.version' is not really set anywhere; at least gfc_get_dtype_rank_type(...) does not set it; zero initialization is most common but it could also be some random value.

In libgfortran, GFC_DTYPE_CLEAR explicitly sets it to 0.

@@ -100,2 +110,2 @@ gfc_desc_to_cfi_desc (CFI_cdesc_t **d_pt
-    d = malloc (sizeof (CFI_cdesc_t)
-                + (CFI_type_t)(CFI_MAX_RANK * sizeof (CFI_dim_t)));
+    d = calloc (1, (sizeof (CFI_cdesc_t)
+                    + (CFI_type_t)(CFI_MAX_RANK * sizeof (CFI_dim_t))));
@@ -107 +117 @@ gfc_desc_to_cfi_desc (CFI_cdesc_t **d_pt
-  d->version = s->dtype.version;
+  d->version = CFI_VERSION;

This treatment of "version" was the equivalent to the above that confused me. Assuming we were to change CFI_VERSION in gcc-13+, is this the right choice here regarding backward compatibility?

I don't think we will change the CFI version any time soon as we rather closely follow the Fortran standard and I do not see any changes which are required there.
NOTE: As s->dtype.version is either 0 or some random value, setting version in the CFI / ISO C descriptor to 1, be it as literal or as macro constant, makes it the same as CFI_VERSION.

And: I don't think we will change CFI_VERSION or the structure of the CFI array descriptor any time soon; there does not seem to be any need for it, it matches the Fortran standard one well (and no changes seem to be planned on that side) and, finally, changing an array descriptor is painful!

However, using '1; /* CFI_VERSION in GCC 11 and at time of writing. */' would also work - but I would expect that we will go through all CFI users if we ever change the descriptor (and bump the version), possibly adding version-number dependent code.

So besides the "version" question ok from my side.

I hope I could answer the latter.

Tobias
Re: [Patch, Fortran] libgfortran's ISO_Fortran_binding.c: Use GCC11 version for backward-only code [PR108056]
Hi Harald,

On 13.12.22 21:53, Harald Anlauf via Gcc-patches wrote:

I now did so - except for three fixes (cf. changelog). See also PR: https://gcc.gnu.org/PR108056 There is no testcase as it needs to be compiled by GCC <= 11 and then run with linking (dynamically) to a GCC 12 or 13 libgfortran.

I've looked at the resulting ISO_Fortran_binding.c vs. the 11-branch version and am still trying to understand the resulting differences in the code, in what respect they might be relevant or not.

Hmm, if I run a diff, I do not see many differences. Note: We only talk about those two functions; the other functions are used by both GCC <= 11 and GCC >= 12.

Fortunately, these functions matter most as they map GFC internals to CFI internals or vice versa. Most other functions are user callable; there, incompatibilities are less likely to show up, and GCC 11 users could also profit from fixes there.

It looks as if CFI_section and CFI_select_part had some larger changes, likewise CFI_setpointer.

Back to differences: 'diff -U0 -p -w' against the last GCC 11 branch shows:
...
@@ -35,0 +37,2 @@ export_proto(cfi_desc_to_gfc_desc);
+/* NOTE: Since GCC 12, the FE generates code to do the conversion
+   directly without calling this function.  */
@@ -63 +66 @@ cfi_desc_to_gfc_desc (gfc_array_void *d,
-  d->dtype.version = s->version;
+  d->dtype.version = 0;
@@ -76,0 +80 @@ cfi_desc_to_gfc_desc (gfc_array_void *d,
+  if (GFC_DESCRIPTOR_DATA (d))
@@ -79,3 +83,7 @@ cfi_desc_to_gfc_desc (gfc_array_void *d,
-      GFC_DESCRIPTOR_LBOUND(d, n) = (index_type)s->dim[n].lower_bound;
-      GFC_DESCRIPTOR_UBOUND(d, n) = (index_type)(s->dim[n].extent
-                                                + s->dim[n].lower_bound - 1);
+        CFI_index_t lb = 1;
+
+        if (s->attribute != CFI_attribute_other)
+          lb = s->dim[n].lower_bound;
+
+        GFC_DESCRIPTOR_LBOUND(d, n) = (index_type)lb;
+        GFC_DESCRIPTOR_UBOUND(d, n) = (index_type)(s->dim[n].extent + lb - 1);
@@ -89,0 +98,2 @@ export_proto(gfc_desc_to_cfi_desc);
+/* NOTE: Since GCC 12, the FE generates code to do the conversion
+   directly without calling this function.  */
@@ -100,2 +110,2 @@ gfc_desc_to_cfi_desc (CFI_cdesc_t **d_pt
-    d = malloc (sizeof (CFI_cdesc_t)
-                + (CFI_type_t)(CFI_MAX_RANK * sizeof (CFI_dim_t)));
+    d = calloc (1, (sizeof (CFI_cdesc_t)
+                    + (CFI_type_t)(CFI_MAX_RANK * sizeof (CFI_dim_t))));
@@ -107 +117 @@ gfc_desc_to_cfi_desc (CFI_cdesc_t **d_pt
-  d->version = s->dtype.version;
+  d->version = CFI_VERSION;
@@ -153 +163 @@ void *CFI_address (const CFI_cdesc_t *dv
...

Given that this is a somewhat delicate situation we're in, is there a set of tests that I could run *manually* (i.e. compile with gcc-11 and link with gcc-12/13) to verify that this best-effort fix should be good enough for the common user? Just a suggestion of a few "randomly" chosen tests?

Probably yes. I don't have a better suggestion. The problem is that it usually only matters in some corner cases, like in the PR, where an argument is not passed directly to the GFC→CFI conversion; instead, a Fortran function with a TYPE(*) argument is called first and only then is it passed on. Such cases are usually not in the testsuite. (With GCC 12 we have a rather complex testsuite, but obviously it also does not cover everything.)

Note: It is strongly recommended to use GCC 12 (or 13) with array-descriptor C interop as many issues were fixed. [...]

Well, in the real world there are larger installations with large software stacks, and it is easier said to "compile each component with the same compiler version" than done...

I concur - but there were really many fixes for the array descriptor / TS29113 in GCC 12. It is simply not possible to fix tons of bugs and be 100% compatible with the working bits of the old version, especially if they only work if one does not look sharply at the result. (Like here, where 'type' is wrong, which does not matter except in programs which use it.)

Thanks,

Tobias
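
To make the bounds change in the diff above concrete, here is a small sketch of the arithmetic, written in Python purely for illustration. The function name gfc_bounds and the string constant standing in for CFI_attribute_other are hypothetical; only the formula (lower bound forced to 1 for CFI_attribute_other, upper bound = extent + lb - 1) is taken from the hunk.

```python
# Hypothetical stand-in for the CFI_attribute_other constant.
CFI_ATTRIBUTE_OTHER = "other"


def gfc_bounds(extent, lower_bound, attribute):
    """Return (lbound, ubound) as the revised conversion would compute them."""
    lb = 1
    # Only pointer/allocatable descriptors keep their own lower bound.
    if attribute != CFI_ATTRIBUTE_OTHER:
        lb = lower_bound
    ub = extent + lb - 1
    return lb, ub


if __name__ == "__main__":
    print(gfc_bounds(10, 0, CFI_ATTRIBUTE_OTHER))
    print(gfc_bounds(10, 0, "pointer"))
```

With an extent of 10 and a C-side lower bound of 0, the nonallocatable, nonpointer case maps to bounds 1..10, while a pointer keeps 0..9.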
[Patch] OpenMP: Parse align clause in allocate directive in C/C++
We have working parsing support for the 'allocate' directive (failing immediately with a sorry after parsing).

To be in line with the rest of the allocat(e,or) etc. handling, it makes sense to take care of 'align' as well, which is what this patch does - it still fails with a 'sorry' after parsing.

OK for mainline?

Tobias

OpenMP: Parse align clause in allocate directive in C/C++

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_allocate): Parse align
	clause and check for restrictions.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_allocate): Parse align
	clause.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/allocate-5.c: Extend for align clause.

 gcc/c/c-parser.cc                            | 88
 gcc/cp/parser.cc                             | 58
 gcc/testsuite/c-c++-common/gomp/allocate-5.c | 36
 3 files changed, 144 insertions(+), 38 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 1bbb39f9b08..62c302748dd 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18819,32 +18819,71 @@ c_parser_oacc_wait (location_t loc, c_parser *parser, char *p_name)
   return stmt;
 }

-/* OpenMP 5.0:
-   # pragma omp allocate (list)  [allocator(allocator)]  */
+/* OpenMP 5.x:
+   # pragma omp allocate (list)  clauses
+
+   OpenMP 5.0 clause:
+   allocator (omp_allocator_handle_t expression)
+
+   OpenMP 5.1 additional clause:
+   align (int expression)] */

 static void
 c_parser_omp_allocate (location_t loc, c_parser *parser)
 {
+  tree alignment = NULL_TREE;
   tree allocator = NULL_TREE;
   tree nl = c_parser_omp_var_list_parens (parser, OMP_CLAUSE_ALLOCATE, NULL_TREE);
-  if (c_parser_next_token_is (parser, CPP_COMMA)
-      && c_parser_peek_2nd_token (parser)->type == CPP_NAME)
-    c_parser_consume_token (parser);
-  if (c_parser_next_token_is (parser, CPP_NAME))
+  do
     {
+      if (c_parser_next_token_is (parser, CPP_COMMA)
+          && c_parser_peek_2nd_token (parser)->type == CPP_NAME)
+        c_parser_consume_token (parser);
+      if (!c_parser_next_token_is (parser, CPP_NAME))
+        break;
       matching_parens parens;
       const char *p = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
       c_parser_consume_token (parser);
-      if (strcmp ("allocator", p) != 0)
-        error_at (c_parser_peek_token (parser)->location,
-                  "expected %<allocator%>");
-      else if (parens.require_open (parser))
+      location_t expr_loc = c_parser_peek_token (parser)->location;
+      if (strcmp ("align", p) != 0 && strcmp ("allocator", p) != 0)
         {
-          location_t expr_loc = c_parser_peek_token (parser)->location;
-          c_expr expr = c_parser_expr_no_commas (parser, NULL);
-          expr = convert_lvalue_to_rvalue (expr_loc, expr, false, true);
-          allocator = expr.value;
-          allocator = c_fully_fold (allocator, false, NULL);
+          error_at (c_parser_peek_token (parser)->location,
+                    "expected %<allocator%> or %<align%>");
+          break;
+        }
+      if (!parens.require_open (parser))
+        break;
+
+      c_expr expr = c_parser_expr_no_commas (parser, NULL);
+      expr = convert_lvalue_to_rvalue (expr_loc, expr, false, true);
+      expr_loc = c_parser_peek_token (parser)->location;
+      if (p[2] == 'i' && alignment)
+        {
+          error_at (expr_loc, "too many %qs clauses", "align");
+          break;
+        }
+      else if (p[2] == 'i')
+        {
+          alignment = c_fully_fold (expr.value, false, NULL);
+          if (TREE_CODE (alignment) != INTEGER_CST
+              || !INTEGRAL_TYPE_P (TREE_TYPE (alignment))
+              || tree_int_cst_sgn (alignment) != 1
+              || !integer_pow2p (alignment))
+            {
+              error_at (expr_loc, "%<align%> clause argument needs to be "
+                                  "positive constant power of two integer "
+                                  "expression");
+              alignment = NULL_TREE;
+            }
+        }
+      else if (allocator)
+        {
+          error_at (expr_loc, "too many %qs clauses", "allocator");
+          break;
+        }
+      else
+        {
+          allocator = c_fully_fold (expr.value, false, NULL);
           tree orig_type = expr.original_type ?
                            expr.original_type : TREE_TYPE (allocator);
           orig_type = TYPE_MAIN_VARIANT (orig_type);
@@ -18853,20 +18892,23 @@ c_parser_omp_allocate (location_t loc, c_parser *parser)
               || TYPE_NAME (orig_type)
                  != get_identifier ("omp_allocator_handle_t"))
             {
-              error_at (expr_loc, "%<allocator%> clause allocator expression "
-                                  "has type %qT rather than "
-                                  "%<omp_allocator_handle_t%>",
-                                  TREE_TYPE (allocator));
+              error_at (expr_loc,
+                        "%<allocator%> clause allocator expression has type "
+                        "%qT rather than %<omp_allocator_handle_t%>",
+                        TREE_TYPE (allocator));
               allocator = NULL_TREE;
             }
-          parens.skip_until_found_close (parser);
         }
-    }
+      parens.skip_until_found_close (parser);
+    } while (true);
   c_parser_skip_to_pragma_eol (parser);

-  if (allocator)
+  if (allocator ||
[Patch] Fortran: Extend align-clause checks of OpenMP's allocate clause
I missed that 'align' needs to be a power of 2 - contrary to 'aligned', which does not have this restriction for some odd reason.

OK for mainline?

Tobias

Fortran: Extend align-clause checks of OpenMP's allocate directive

gcc/fortran/ChangeLog:

	* openmp.cc (resolve_omp_clauses): Check also for
	power of two.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/allocate-3.f90: Fix ALIGN
	usage, remove unused -fdump-tree-original.
	* testsuite/libgomp.fortran/allocate-4.f90: New.

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 686f924b47a..5468cc97969 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7315,11 +7315,12 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
                         || n->u.align->ts.type != BT_INTEGER
                         || n->u.align->rank != 0
                         || gfc_extract_int (n->u.align, &alignment)
-                        || alignment <= 0)
+                        || alignment <= 0
+                        || !pow2p_hwi (alignment))
                       {
-                        gfc_error ("ALIGN modifier requires a scalar positive "
-                                   "constant integer alignment expression at %L",
-                                   &n->u.align->where);
+                        gfc_error ("ALIGN modifier requires at %L a scalar positive "
+                                   "constant integer alignment expression that is a "
+                                   "power of two", &n->u.align->where);
                         break;
                       }
                   }
diff --git a/libgomp/testsuite/libgomp.fortran/allocate-3.f90 b/libgomp/testsuite/libgomp.fortran/allocate-3.f90
index a39819164d6..1fa0bb932c3 100644
--- a/libgomp/testsuite/libgomp.fortran/allocate-3.f90
+++ b/libgomp/testsuite/libgomp.fortran/allocate-3.f90
@@ -1,5 +1,4 @@
 ! { dg-do compile }
-! { dg-additional-options "-fdump-tree-original" }

 use omp_lib
 implicit none
@@ -23,6 +22,7 @@ integer :: q, x,y,z
 ! { dg-error "Object 'omp_high_bw_mem_alloc' is not a variable" "" { target *-*-* } .-1 }
 !$omp end parallel

-!$omp parallel allocate( align(q) : x) firstprivate(x) ! { dg-error "31:ALIGN modifier requires a scalar positive constant integer alignment expression at" }
+!$omp parallel allocate( align(128) : x) firstprivate(x) ! OK
 !$omp end parallel
+
 end
diff --git a/libgomp/testsuite/libgomp.fortran/allocate-4.f90 b/libgomp/testsuite/libgomp.fortran/allocate-4.f90
new file mode 100644
index 000..ddb507ba8e4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/allocate-4.f90
@@ -0,0 +1,42 @@
+! { dg-do compile }
+
+
+subroutine test()
+use iso_c_binding, only: c_intptr_t
+implicit none
+integer, parameter :: omp_allocator_handle_kind = 1 !! <<<
+integer (kind=omp_allocator_handle_kind), &
+                 parameter :: omp_high_bw_mem_alloc = 4
+integer :: q, x,y,z
+integer, parameter :: cnst(2) = [64, 101]
+
+!$omp parallel allocate( omp_high_bw_mem_alloc : x) firstprivate(x) ! { dg-error "Expected integer expression of the 'omp_allocator_handle_kind' kind" }
+!$omp end parallel
+
+!$omp parallel allocate( allocator (omp_high_bw_mem_alloc) : x) firstprivate(x) ! { dg-error "Expected integer expression of the 'omp_allocator_handle_kind' kind" }
+!$omp end parallel
+
+!$omp parallel allocate( align (q) : x) firstprivate(x) ! { dg-error "32:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+
+!$omp parallel allocate( align (32) : x) firstprivate(x) ! OK
+!$omp end parallel
+
+!$omp parallel allocate( align(q) : x) firstprivate(x) ! { dg-error "31:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+
+!$omp parallel allocate( align(cnst(1)) : x ) firstprivate(x) ! OK
+!$omp end parallel
+
+!$omp parallel allocate( align(cnst(2)) : x) firstprivate(x) ! { dg-error "31:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+
+!$omp parallel allocate( align( 31) :x) firstprivate(x) ! { dg-error "32:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+
+!$omp parallel allocate( align (32.0): x) firstprivate(x) ! { dg-error "32:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+
+!$omp parallel allocate( align(cnst ) : x ) firstprivate(x) ! { dg-error "31:ALIGN modifier requires at \\(1\\) a scalar positive constant integer alignment expression that is a power of two" }
+!$omp end parallel
+end
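
As a quick illustration of the rule the new check enforces (a positive integer constant that is a power of two), here is a small sketch in Python. It is not the compiler's code: is_valid_align is a hypothetical helper, and the bit trick merely mirrors what a power-of-two test such as pow2p_hwi is expected to decide for positive values.

```python
def is_valid_align(value):
    """Return True if value is a positive integer power of two."""
    # Reject non-integers such as 32.0 (bool is excluded although it
    # is an int subclass in Python).
    if not isinstance(value, int) or isinstance(value, bool):
        return False
    # A positive power of two has exactly one bit set, so clearing the
    # lowest set bit with value & (value - 1) must yield zero.
    return value > 0 and (value & (value - 1)) == 0


if __name__ == "__main__":
    for candidate in (32, 64, 128, 31, 101, 0, -8, 32.0):
        print(candidate, is_valid_align(candidate))
```

The candidates are the values used in the test cases above plus a few edge cases; 32, 64 and 128 pass, while 31, 101, 0, negative values and the real constant 32.0 are rejected.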
[Patch, Fortran] libgfortran's ISO_Fortran_binding.c: Use GCC11 version for backward-only code [PR108056]
This is a 12/13 regression: some of the changes to fix the GFC/CFI descriptor handling that went into GCC 12 fail with the (bogus) descriptor passed by a GCC-11-compiled program. As later GCC 12 changes moved the descriptor conversion to the front end, those functions remain in libgfortran.so only to cater for old programs.

Richard suggested in the PR that the best way is to move back to the GCC 11 version, such that libgfortran.so won't regress. I now did so - except for three fixes (cf. changelog). See also the PR: https://gcc.gnu.org/PR108056

There is no testcase as it needs to be compiled by GCC <= 11 and then run while linking (dynamically) to a GCC 12 or 13 libgfortran.

OK for mainline and GCC 12?

* * *

Note: It is strongly recommended to use GCC 12 (or 13) with array-descriptor C interop as many issues were fixed. Take the testcase in the PR: in GCC 11 the type arriving in libgfortran is BT_ASSUMED ('type(*)'). But as the effective argument is passed as array descriptor throughout, the 'float' (real(4)) type info is actually preservable (as GCC 12 does; cf. testcase of comment 0 and my comment in the PR for the C part of the testcase).(*)

Tobias

((*) This is not possible if using a scalar 'type(*)' or a non-array-descriptor array in between. I think GCC 12 uses 'CFI_other' in the information-is-lost case.)

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

libgfortran's ISO_Fortran_binding.c: Use GCC11 version for backward-only code [PR108056]

Since GCC 12, the conversion between the array descriptor formats - the internal (GFC) and the C binding one (CFI) - moved to the compiler itself such that the cfi_desc_to_gfc_desc/gfc_desc_to_cfi_desc functions are only used with older code (GCC 9 to 11). The newly added checks caused asserts as older code did not pass the proper values (e.g. real(4) as effective argument arrived as BT_ASSUMED type as the effective type got lost in between).

As proposed in the PR, revert to the GCC 11 version - known bugs are better than some fixes and new issues. Still, GCC 12 is much better in terms of TS29113 support and should really be used.

This patch uses the current libgfortran version of the GCC 11 branch, except it fixes the GFC version number (which is 0), uses calloc instead of malloc, and sets the lower bound to 1 instead of keeping it as is for CFI_attribute_other.

libgfortran/ChangeLog:

	PR libfortran/108056
	* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc,
	gfc_desc_to_cfi_desc): Mostly revert to GCC 11 version for
	those backward-compatibility-only functions.

diff --git a/libgfortran/runtime/ISO_Fortran_binding.c b/libgfortran/runtime/ISO_Fortran_binding.c
index 342df4275b9..e63a717a69b 100644
--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -39,60 +39,31 @@ export_proto(cfi_desc_to_gfc_desc);
 void
 cfi_desc_to_gfc_desc (gfc_array_void *d, CFI_cdesc_t **s_ptr)
 {
-  signed char type;
-  size_t size;
   int n;
+  index_type kind;
   CFI_cdesc_t *s = *s_ptr;

   if (!s)
     return;

-  /* Verify descriptor.  */
-  switch (s->attribute)
-    {
-    case CFI_attribute_pointer:
-    case CFI_attribute_allocatable:
-      break;
-    case CFI_attribute_other:
-      if (s->base_addr)
-        break;
-      runtime_error ("Nonallocatable, nonpointer actual argument to BIND(C) "
-                     "dummy argument where the effective argument is either "
-                     "not allocated or not associated");
-      break;
-    default:
-      runtime_error ("Invalid attribute type %d in CFI_cdesc_t descriptor",
-                     (int) s->attribute);
-      break;
-    }
   GFC_DESCRIPTOR_DATA (d) = s->base_addr;
+  GFC_DESCRIPTOR_TYPE (d) = (signed char)(s->type & CFI_type_mask);
+  kind = (index_type)((s->type - (s->type & CFI_type_mask)) >> CFI_type_kind_shift);

   /* Correct the unfortunate difference in order with types.  */
-  type = (signed char)(s->type & CFI_type_mask);
-  switch (type)
-    {
-    case CFI_type_Character:
-      type = BT_CHARACTER;
-      break;
-    case CFI_type_struct:
-      type = BT_DERIVED;
-      break;
-    case CFI_type_cptr:
-      /* FIXME: PR 100915.  GFC descriptors do not distinguish between
-         CFI_type_cptr and CFI_type_cfunptr.  */
-      type = BT_VOID;
-      break;
-    default:
-      break;
-    }
-
-  GFC_DESCRIPTOR_TYPE (d) = type;
-  GFC_DESCRIPTOR_SIZE (d) = s->elem_len;
+  if (GFC_DESCRIPTOR_TYPE (d) == BT_CHARACTER)
+    GFC_DESCRIPTOR_TYPE (d) = BT_DERIVED;
+  else if (GFC_DESCRIPTOR_TYPE (d) == BT_DERIVED)
+    GFC_DESCRIPTOR_TYPE (d) = BT_CHARACTER;
+
+  if (!s->rank || s->dim[0].sm == (CFI_index_t)s->elem_len)
+    GFC_DESCRIPTOR_SIZE (d) = s->elem_len;
+  else if (GFC_DESCRIPTOR_TYPE (d) != BT_DERIVED)
+    GFC_DESCRIPTOR_SIZE (d) = kind;
+  else
+    GFC_DESCRIPTOR_SIZE (d) = s->elem_len;

   d->dtype.version =
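For readers who have not looked at the CFI type encoding: the restored GCC 11 code splits the CFI 'type' field into a base type (low bits) and a kind (the bits above). The following is only a minimal sketch of that split. The constants SKETCH_TYPE_MASK and SKETCH_KIND_SHIFT and all function names are assumptions made for illustration; the real CFI_type_mask and CFI_type_kind_shift come from ISO_Fortran_binding.h.

```c
#include <stdint.h>

/* Assumed layout, for illustration only: the low 8 bits hold the base
   type, the bits above hold the kind.  */
#define SKETCH_TYPE_MASK 0xFF
#define SKETCH_KIND_SHIFT 8

/* Hypothetical helper mirroring 's->type & CFI_type_mask'.  */
int
sketch_base_type (int16_t cfi_type)
{
  return cfi_type & SKETCH_TYPE_MASK;
}

/* Hypothetical helper mirroring the 'kind = ...' expression of the patch:
   subtract the base type, then shift the remainder down.  */
int
sketch_kind (int16_t cfi_type)
{
  return (cfi_type - (cfi_type & SKETCH_TYPE_MASK)) >> SKETCH_KIND_SHIFT;
}

/* Pack a base type and a kind the same way, to allow a round-trip check.  */
int16_t
sketch_pack (int base_type, int kind)
{
  return (int16_t) (base_type + (kind << SKETCH_KIND_SHIFT));
}
```

With these helpers, packing base type 3 with kind 4 and decoding again yields 3 and 4, which is the property the descriptor conversion relies on.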
[committed] fortran/openmp.cc: Remove 's' that slipped in during %<..%> replacement (was: [Patch] Fortran: Replace simple '.' quotes by %<.%>)
On 09.12.22 22:12, Tobias Burnus wrote:
> Found when working on the just submitted/committed patch.

Committed as r13-4590 – but it required a follow-up that I somehow missed :-/ but that is now committed as well (as r13-4597).

Tobias

commit 045592f665bcb67b75dc6b86badbe2fd44aed3e6
Author: Tobias Burnus
Date:   Sun Dec 11 11:47:55 2022 +0100

    fortran/openmp.cc: Remove 's' that slipped in during %<..%> replacement

    Seemingly, 's' (in VI that's the 's'ubstitute command) appeared verbatim
    in a gfc_error message when doing the '...' to %<...%> replacements in
    commit r13-4590-g84f6f8a2a97f88be01e223c9c9dbab801a4f501f

    gcc/fortran/
            * openmp.cc (gfc_match_omp_context_selector_specification): Remove
            spurious 's' in an error message.
---
 gcc/fortran/openmp.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 7edc78ad0cb..686f924b47a 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -5568,7 +5568,7 @@ gfc_match_omp_context_selector_specification (gfc_omp_declare_variant *odv)
       if (m != MATCH_YES || i == selector_set_count)
         {
-          gfc_error ("expected %, %, % "
+          gfc_error ("expected %, %, % "
                      "or % at %C");
           return MATCH_ERROR;
         }
Re: [PATCH 2/2] OpenMP: Duplicate checking for map clauses in Fortran (PR107214)
Hi Julian,

On 10.12.22 13:10, Julian Brown wrote:
> On Thu, 8 Dec 2022 13:04:20 +0100 Tobias Burnus wrote:
>> All in all, I am fine with the patch - but I spotted some issues.
> ...
> I believe this patch covers all the above cases (hopefully appropriately generalised), at least for Fortran. I haven't attempted to fix any missing cases for C, for now.
>
> Re-tested with offloading to NVPTX (with a few supporting patches, as before). Does this look OK now?

Yes, LGTM.

Thanks!

Tobias
Re: [Patch] libgomp.texi: Reverse-offload updates (was: [Patch] libgomp: Handle OpenMP's reverse offloads)
Now that the reverse-offload patch is (nearly) in:

On 07.12.22 09:08, Tobias Burnus wrote:
> On 06.12.22 08:45, Tobias Burnus wrote:
>> * As follow-up, libgomp.texi must be updated

Slight update to that uncommitted patch: I extended the nvptx entry to state that only one reverse-offload region runs at a given time.

OK?

Tobias

libgomp.texi: Reverse-offload updates

libgomp/
	* libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'.
	(GCN): Add item about 'omp requires'.
	(nvptx): Likewise; add item about reverse offload.

 libgomp/libgomp.texi | 20
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index b6c1ed714ce..f95e82fc8aa 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -192,8 +192,8 @@ The OpenMP 4.5 specification is fully supported.
       env variable @tab Y @tab
 @item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
 @item @code{requires} directive @tab P
-      @tab complete but no non-host devices provides @code{unified_address},
-      @code{unified_shared_memory} or @code{reverse_offload}
+      @tab complete but no non-host devices provides @code{unified_address} or
+      @code{unified_shared_memory}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab Y @tab
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@@ -228,7 +228,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{allocate} clause @tab P @tab Initial support
 @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
 @item @code{ancestor} modifier on @code{device} clause
-      @tab Y @tab See comment for @code{requires}
+      @tab Y @tab Host fallback with GCN devices
 @item Implicit declare target directive @tab Y @tab
 @item Discontiguous array section with @code{target update} construct
       @tab N @tab
@@ -288,7 +288,7 @@ The OpenMP 4.5 specification is fully supported.
       @code{append_args} @tab N @tab
 @item @code{dispatch} construct @tab N @tab
 @item device-specific ICV settings with environment variables @tab Y @tab
-@item @code{assume} directive @tab Y @tab
+@item @code{assume} and @code{assumes} directives @tab Y @tab
 @item @code{nothing} directive @tab Y @tab
 @item @code{error} directive @tab Y @tab
 @item @code{masked} construct @tab Y @tab
@@ -4456,6 +4456,9 @@ The implementation remark:
 @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
       using the C library @code{printf} functions and the Fortran
       @code{print}/@code{write} statements.
+@item OpenMP code that has a requires directive with @code{unified_address},
+      @code{unified_shared_memory} or @code{reverse_offload} will remove
+      any GCN device from the list of available devices (``host fallback'').
 @end itemize
@@ -4507,6 +4510,15 @@ The implementation remark:
 @item Compilation OpenMP code that contains @code{requires reverse_offload}
       requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}
       is not supported.
+@item For code containing reverse offload (i.e. @code{target} regions with
+      @code{device(ancestor:1)}), there is a slight performance penalty
+      for @emph{all} target regions, consisting mostly of shutdown delay.
+      Per device, reverse offload regions are processed serially such that
+      the next reverse offload region is only executed after the previous
+      one returns.
+@item OpenMP code that has a requires directive with @code{unified_address}
+      or @code{unified_shared_memory} will remove any nvptx device from the
+      list of available devices (``host fallback'').
 @end itemize
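To make the documented terms concrete, here is a minimal C sketch of what user code with reverse offload looks like: a 'requires reverse_offload' directive at file scope and a nested 'target' region with 'device(ancestor: 1)' that runs back on the host. The function and variable names are made up for illustration. The sketch assumes a compiler with reverse-offload support when built with OpenMP enabled; without OpenMP the pragmas are ignored and the code simply runs on the host.

```c
/* Must appear before any target region that relies on it.  */
#pragma omp requires reverse_offload

/* Hypothetical example: hand 'value' from the offloaded region back to
   a host-side region and report what the host saw.  */
int
sketch_reverse_offload (int value)
{
  int seen = -1;

  #pragma omp target map(tofrom: seen) firstprivate(value)
  {
    /* This inner region executes on the parent (host) device.  */
    #pragma omp target device(ancestor: 1) map(tofrom: seen) firstprivate(value)
    {
      seen = value;
    }
  }
  return seen;
}
```

Per the documentation change above, such code leads to host fallback for GCN devices, and on nvptx only one reverse-offload region per device is processed at a time.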
Re: [Patch] libgomp: Handle OpenMP's reverse offloads
On 09.12.22 15:44, Jakub Jelinek wrote:
> On Tue, Dec 06, 2022 at 08:45:07AM +0100, Tobias Burnus wrote:
> [...]
> I think we just shouldn't support libgomp plugins for 32-bit libgomp, only host fallback. If you want offloading, use 64-bit host...

(I concur.)

>> libgomp: Handle OpenMP's reverse offloads
>> + /* Likeverse for the reverse lookup device->host for reverse offload. */
> Likewise
>
>> + reverse_splay_tree_node rev_array;
> Do we need reverse_splay_tree* stuff in libgomp.h? As splay_tree_node is just a pointer, perhaps just
>   struct reverse_splay_tree_node_s;
> early and
>   struct reverse_splay_tree_node_s *rev_array;
> in libgomp.h and include the extra splay-tree.h only in target.c? Unless one needs it anywhere else...

It is used as 'typedef struct reverse_splay_tree_node_s *reverse_splay_tree_node;' in

  struct target_mem_desc {
    reverse_splay_tree_node rev_array;
  }

but also as

  struct gomp_device_descr {
    ...
    struct reverse_splay_tree_s mem_map_rev;
  }

The latter is

  struct reverse_splay_tree_key_s {
    /* Address of the device object. */
    uint64_t dev;
    splay_tree_key k;
  };

which in turn needs 'splay_tree_key'.

Thus, I could either commit it as is – or turn the latter also into a pointer and malloc it. Currently, it is accessed as mem_map.k.root = NULL for init and later through the splay-tree functions indirectly.

Thoughts? Unless there are further comments, I will later commit it as is.

Tobias
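The trade-off discussed here is the usual C one between a pointer member, for which a forward declaration is enough, and an embedded struct member, which needs the full definition in the header. A small self-contained sketch of the pointer variant follows; all 'sketch_' names are hypothetical and do not match the libgomp declarations.

```c
#include <stdint.h>
#include <stdlib.h>

/* "Header" part: a forward declaration suffices for a pointer member,
   so the full node type need not be visible here.  */
struct sketch_rev_node_s;

struct sketch_mem_desc
{
  struct sketch_rev_node_s *rev_array;  /* incomplete type is fine */
  size_t count;
};

/* "Implementation" part: the full definition is only needed where the
   nodes are allocated or dereferenced.  */
struct sketch_rev_node_s
{
  uint64_t dev;    /* address of the device object */
  void *host_key;  /* stand-in for the host-side key */
};

/* Allocate a descriptor with N zero-initialized nodes; NULL on failure.  */
struct sketch_mem_desc *
sketch_mem_desc_new (size_t n)
{
  struct sketch_mem_desc *d = malloc (sizeof *d);
  if (!d)
    return NULL;
  d->rev_array = calloc (n ? n : 1, sizeof *d->rev_array);
  if (!d->rev_array)
    {
      free (d);
      return NULL;
    }
  d->count = n;
  return d;
}

void
sketch_mem_desc_free (struct sketch_mem_desc *d)
{
  if (d)
    {
      free (d->rev_array);
      free (d);
    }
}
```

Embedding the struct by value instead, as done for 'mem_map_rev', avoids the extra allocation but forces the header to see the complete type, which is the cost weighed in the mail above.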
[Patch] Fortran: Replace simple '.' quotes by %<.%>
Found when working on the just submitted/committed patch.

I intend to commit it to mainline as obvious tomorrow (or Sun or Mon), unless there are comments.

Tobias

Fortran: Replace simple '.' quotes by %<.%>

Using %qs instead of '%s' or %<=%> instead of '=' looks nicer: the output gets nicer quotes and bold text if the terminal supports it; otherwise, plain quotes are used.

gcc/fortran/ChangeLog:

	* match.cc (gfc_match_member_sep): Use %<...%> in gfc_error.
	* openmp.cc (gfc_match_oacc_routine, gfc_match_omp_context_selector,
	gfc_match_omp_context_selector_specification,
	gfc_match_omp_declare_variant, resolve_omp_clauses): Likewise;
	use %qs instead of '%s'.
	* primary.cc (match_real_constant, gfc_match_varspec): Likewise.
	* resolve.cc (gfc_resolve_formal_arglist, resolve_operator,
	resolve_ordinary_assign): Likewise.

diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index 7ba0f349993..89fb115c0f6 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -195,3 +195,3 @@ gfc_match_member_sep(gfc_symbol *sym)
       gfc_error ("Expected structure component or operator name "
-                 "after '.' at %C");
+                 "after %<.%> at %C");
       goto error;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 4b4e6ac6947..7edc78ad0cb 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -4061,3 +4061,3 @@ gfc_match_oacc_routine (void)
           gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, expecting"
-                     " ')' after NAME");
+                     " %<)%> after NAME");
           gfc_current_locus = old_loc;
@@ -5350,4 +5350,4 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
         {
-          gfc_error ("selector '%s' not allowed for context selector "
-                     "set '%s' at %C",
+          gfc_error ("selector %qs not allowed for context selector "
+                     "set %qs at %C",
                      selector, oss->trait_set_selector_name);
@@ -5370,3 +5370,3 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
         {
-          gfc_error ("selector '%s' does not accept any properties at %C",
+          gfc_error ("selector %qs does not accept any properties at %C",
                      selector);
@@ -5379,3 +5379,3 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
         {
-          gfc_error ("expected '(' at %C");
+          gfc_error ("expected %<(%> at %C");
           return MATCH_ERROR;
@@ -5401,3 +5401,3 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
         {
-          gfc_error ("expected ')' at %C");
+          gfc_error ("expected %<)%> at %C");
           return MATCH_ERROR;
@@ -5514,3 +5514,3 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
         {
-          gfc_error ("expected ')' at %C");
+          gfc_error ("expected %<)%> at %C");
           return MATCH_ERROR;
@@ -5524,3 +5524,3 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
         {
-          gfc_error ("expected '(' at %C");
+          gfc_error ("expected %<(%> at %C");
           return MATCH_ERROR;
@@ -5570,4 +5570,4 @@ gfc_match_omp_context_selector_specification (gfc_omp_declare_variant *odv)
         {
-          gfc_error ("expected 'construct', 'device', 'implementation' or "
-                     "'user' at %C");
+          gfc_error ("expected %, %, % "
+                     "or % at %C");
           return MATCH_ERROR;
@@ -5578,3 +5578,3 @@ gfc_match_omp_context_selector_specification (gfc_omp_declare_variant *odv)
         {
-          gfc_error ("expected '=' at %C");
+          gfc_error ("expected %<=%> at %C");
           return MATCH_ERROR;
@@ -5585,3 +5585,3 @@ gfc_match_omp_context_selector_specification (gfc_omp_declare_variant *odv)
         {
-          gfc_error ("expected '{' at %C");
+          gfc_error ("expected %<{%> at %C");
           return MATCH_ERROR;
@@ -5600,3 +5600,3 @@ gfc_match_omp_context_selector_specification (gfc_omp_declare_variant *odv)
         {
-          gfc_error ("expected '}' at %C");
+          gfc_error ("expected %<}%> at %C");
           return MATCH_ERROR;
@@ -5622,3 +5622,3 @@ gfc_match_omp_declare_variant (void)
     {
-      gfc_error ("expected '(' at %C");
+      gfc_error ("expected %<(%> at %C");
       return MATCH_ERROR;
@@ -5670,3 +5670,3 @@ gfc_match_omp_declare_variant (void)
     {
-      gfc_error ("expected ')' at %C");
+      gfc_error ("expected %<)%> at %C");
       return MATCH_ERROR;
@@ -5680,3 +5680,3 @@ gfc_match_omp_declare_variant (void)
     {
-      gfc_error ("expected 'match' at %C");
+      gfc_error ("expected %<match%> at %C");
       return MATCH_ERROR;
@@ -5689,3 +5689,3 @@ gfc_match_omp_declare_variant (void)
     {
-      gfc_error ("expected '(' at %C");
+      gfc_error ("expected %<(%> at %C");
       return MATCH_ERROR;
@@ -5698,3 +5698,3 @@ gfc_match_omp_declare_variant (void)
     {
-      gfc_error ("expected ')' at %C");
+      gfc_error ("expected %<)%> at %C");
       return MATCH_ERROR;
@@ -7380,3 +7380,3 @@ resolve_omp_clauses (gfc_code
[Patch] Fortran/OpenMP: align/allocator modifiers to the allocate clause
Implementing the 5.1 syntax inside the 'allocate' clause. That's a fallout of working on something else...

OK for mainline?

Tobias

Fortran/OpenMP: align/allocator modifiers to the allocate clause

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_namelist): Improve OMP_LIST_ALLOCATE
	output.
	* gfortran.h (struct gfc_omp_namelist): Add 'align' to 'u'.
	(gfc_free_omp_namelist): Add bool arg.
	* match.cc (gfc_free_omp_namelist): Likewise; free 'u.align'.
	* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_clause_reduction,
	gfc_match_omp_flush): Update call.
	(gfc_match_omp_clauses): Match 'align'/'allocator' modifiers in
	'allocate' clause.
	(resolve_omp_clauses): Resolve align.
	* st.cc (gfc_free_statement): Update call.
	* trans-openmp.cc (gfc_trans_omp_clauses): Handle 'align'.

libgomp/ChangeLog:

	* libgomp.texi (5.1 Impl. Status): Split allocate clause/directive
	item about 'align'; mark clause as 'Y' and directive as 'N'.
	* testsuite/libgomp.fortran/allocate-2.f90: New test.
	* testsuite/libgomp.fortran/allocate-3.f90: New test.
gcc/fortran/dump-parse-tree.cc | 23 + gcc/fortran/gfortran.h | 3 +- gcc/fortran/match.cc | 4 +- gcc/fortran/openmp.cc| 106 +++ gcc/fortran/st.cc| 2 +- gcc/fortran/trans-openmp.cc | 8 ++ libgomp/libgomp.texi | 4 +- libgomp/testsuite/libgomp.fortran/allocate-2.f90 | 25 ++ libgomp/testsuite/libgomp.fortran/allocate-3.f90 | 28 ++ 9 files changed, 163 insertions(+), 40 deletions(-) diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc index 2f042ab5142..5ae72dc1cac 100644 --- a/gcc/fortran/dump-parse-tree.cc +++ b/gcc/fortran/dump-parse-tree.cc @@ -1357,6 +1357,29 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n) } ns_iter = n->u2.ns; } + if (list_type == OMP_LIST_ALLOCATE) + { + if (n->expr) + { + fputs ("allocator(", dumpfile); + show_expr (n->expr); + fputc (')', dumpfile); + } + if (n->expr && n->u.align) + fputc (',', dumpfile); + if (n->u.align) + { + fputs ("allocator(", dumpfile); + show_expr (n->u.align); + fputc (')', dumpfile); + } + if (n->expr || n->u.align) + fputc (':', dumpfile); + fputs (n->sym->name, dumpfile); + if (n->next) + fputs (") ALLOCATE(", dumpfile); + continue; + } if (list_type == OMP_LIST_REDUCTION) switch (n->u.reduction_op) { diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index b541a07e2c7..5f8a81ae4a1 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -1349,6 +1349,7 @@ typedef struct gfc_omp_namelist gfc_omp_reduction_op reduction_op; gfc_omp_depend_doacross_op depend_doacross_op; gfc_omp_map_op map_op; + gfc_expr *align; struct { ENUM_BITFIELD (gfc_omp_linear_op) op:4; @@ -3572,7 +3573,7 @@ void gfc_free_iterator (gfc_iterator *, int); void gfc_free_forall_iterator (gfc_forall_iterator *); void gfc_free_alloc_list (gfc_alloc *); void gfc_free_namelist (gfc_namelist *); -void gfc_free_omp_namelist (gfc_omp_namelist *, bool); +void gfc_free_omp_namelist (gfc_omp_namelist *, bool, bool); void gfc_free_equiv (gfc_equiv *); void gfc_free_equiv_until (gfc_equiv *, gfc_equiv *); 
void gfc_free_data (gfc_data *); diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc index 8b8b6e79c8b..7ba0f349993 100644 --- a/gcc/fortran/match.cc +++ b/gcc/fortran/match.cc @@ -5524,13 +5524,15 @@ gfc_free_namelist (gfc_namelist *name) /* Free an OpenMP namelist structure. */ void -gfc_free_omp_namelist (gfc_omp_namelist *name, bool free_ns) +gfc_free_omp_namelist (gfc_omp_namelist *name, bool free_ns, bool free_align) { gfc_omp_namelist *n; for (; name; name = n) { gfc_free_expr (name->expr); + if (free_align) + gfc_free_expr (name->u.align); if (free_ns) gfc_free_namespace (name->u2.ns); else if (name->u2.udr) diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 862c649b0b6..4b4e6ac6947 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -187,7 +187,8 @@ gfc_free_omp_clauses (gfc_omp_clauses *c) gfc_free_expr (c->vector_length_expr); for (i = 0; i < OMP_LIST_NUM; i++) gfc_free_omp_namelist (c->lists[i], - i == OMP_LIST_AFFINITY || i == OMP_LIST_DEPEND); + i == OMP_LIST_AFFINITY || i == OMP_LIST_DEPEND, + i == OMP_LIST_ALLOCATE); gfc_free_expr_list (c->wait_list); gfc_free_expr_list (c->tile_list); free (CONST_CAST (char *, c->critical_name)); @@ -542,7
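As an illustration of the OpenMP 5.1 clause syntax handled by this patch, 'allocate([allocator(...)][,][align(...)] : list)', the following self-contained C sketch formats one list item in that shape. It is not the gfortran dump code: the function name and its string-based interface are made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: write "allocator(A),align(N):name" into BUF,
   omitting the modifiers that are NULL.  Returns what snprintf returns.  */
int
sketch_format_allocate_item (char *buf, size_t len, const char *allocator,
                             const char *align, const char *name)
{
  return snprintf (buf, len, "%s%s%s%s%s%s%s%s%s",
                   allocator ? "allocator(" : "",
                   allocator ? allocator : "",
                   allocator ? ")" : "",
                   (allocator && align) ? "," : "",
                   align ? "align(" : "",
                   align ? align : "",
                   align ? ")" : "",
                   (allocator || align) ? ":" : "",
                   name);
}
```

For example, passing "omp_high_bw_mem_alloc", "64" and "x" gives "allocator(omp_high_bw_mem_alloc),align(64):x", while passing no modifiers gives just "x".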
Re: [PATCH 02/17] libgomp: pinned memory
On 08.12.22 15:35, Andrew Stubbs wrote:
> On 08/12/2022 14:02, Tobias Burnus wrote:
>> With available, I assume that nvptx is an 'available device' (per OpenMP definition, finally added in TR11), i.e. there is an image for nvptx and - after omp_requires filtering - there remains at least one nvptx device.
>
> If plugin-nvptx has been loaded then the function will be available. Do we need to get fancier than that?

I think it does not really make sense to use CUDA if there is not a single device.

In terms of loading, the code does:

  gomp_target_init (void)
  {
    ...
    cur = OFFLOAD_PLUGINS; /* This is a comma-separated string with the supported plugins. */
    ...
        if (gomp_load_plugin_for_device (&current_device, plugin_name))
          {
            int omp_req = omp_requires_mask & ~GOMP_REQUIRES_TARGET_USED;
            new_num_devs = current_device.get_num_devices_func (omp_req);

Thus, CUDA is loaded at the 'gomp_load_plugin_for_device' line and, at the 'new_num_devs =' line, it has been filtered for OpenMP's 'requires' demands.* Thus, 'new_num_devs' contains the number of 'accessible devices' (OpenMP definition), filtered for the 'requires'* (which is part of the 'supported devices' requirements).

(* With some caveats related to late loading of offloading code from (shared) libraries.)

* * *

Admittedly, this does not yet cover the last suggested feature:

  GOMP_offload_register_ver (...)
  {
    gomp_load_image_to_device (devicep, version,

which is relevant for the first part of: 'supported devices' - '... supported by the implementation for execution of target code ... requires directive are fulfilled'.

(available = (intersection of 'accessible devices' and 'supported devices'), possibly filtered + reordered via the OMP_AVAILABLE_DEVICES env var.)

I am not sure how strictly it is required and when we know that all offload_register calls are done; I do note that OpenMP TR 11 has an over-engineered OMP_AVAILABLE_DEVICES environment variable which permits filtering the list of available devices – which also requires early access to the initial 'available devices' list.

But it might be sufficient to rely on the device-is-accessible + requires filtering and ignore whether an actual image is available.

Tobias
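The filtering described above can be pictured as a simple predicate over the accessible devices. The sketch below is not libgomp code: the requirement bits, the 'sketch_device' struct and the function are assumptions made up for illustration (libgomp has its own mask, cf. GOMP_REQUIRES_TARGET_USED above). It counts the devices that fulfil all 'requires' bits and, optionally, also have an offload image.

```c
#include <stddef.h>

/* Hypothetical requirement bits, for illustration only.  */
enum
{
  SKETCH_REQ_UNIFIED_ADDRESS = 1 << 0,
  SKETCH_REQ_UNIFIED_SHARED_MEMORY = 1 << 1,
  SKETCH_REQ_REVERSE_OFFLOAD = 1 << 2
};

struct sketch_device
{
  unsigned supports;  /* requirement bits this device can fulfil */
  int has_image;      /* nonzero if offload code was registered for it */
};

/* Count devices that satisfy REQUIRES_MASK; if CHECK_IMAGE is nonzero,
   additionally demand that an image is present.  */
size_t
sketch_count_available (const struct sketch_device *devs, size_t n,
                        unsigned requires_mask, int check_image)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      if ((devs[i].supports & requires_mask) != requires_mask)
        continue;  /* filtered out by 'requires' */
      if (check_image && !devs[i].has_image)
        continue;  /* accessible, but no code to run on it */
      count++;
    }
  return count;
}
```

Calling it with 'check_image' set to 0 corresponds to the simpler option from the last paragraph, i.e. relying on accessibility plus 'requires' filtering only.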
Re: [PATCH 02/17] libgomp: pinned memory
On 08.12.22 13:51, Andrew Stubbs wrote:
> On 08/12/2022 12:11, Jakub Jelinek wrote:
>> On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
>>> Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed.
>>
>> As I said before, I think the pinned memory is too precious to waste it this way, we should handle the -> pinned case through memkind_create_fixed on mmap + mlock area, that way we can create even quite small pinned allocations.
>
> This has been delayed due to other priorities, but our current plan is to switch to using cudaHostAlloc, when available, but we can certainly use memkind_create_fixed for the fallback case (including amdgcn).

With 'available', I assume that nvptx is an 'available device' (per OpenMP definition, finally added in TR11), i.e. there is an image for nvptx and - after omp_requires filtering - there remains at least one nvptx device.

* * *

For completeness, I want to note that OpenMP TR11 adds support for creating memory spaces that are accessible from multiple devices, e.g. host + one/all devices, and adds some convenience functions for the latter (all devices, host and a specific device etc.)
→ https://openmp.org/specifications/ TR11 (see Appendix B.2 for the release notes, esp. for Section 6.2).

I think it makes sense to keep those additions in mind when doing the actual implementation to avoid incompatibilities.

Side note regarding the ompx_ additions proposed in
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597979.html (adds ompx_pinned_mem_alloc),
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597983.html (ompx_unified_shared_mem_alloc and ompx_host_mem_alloc; ompx_unified_shared_mem_space and ompx_host_mem_space):

While TR11 does not add any predefined allocators or new memory spaces, using e.g. omp_get_devices_all_allocator(memspace) returns a unified-shared-memory allocator.

I note that LLVM does not seem to have any ompx_ in this regard (yet?). (It has some ompx_ – but related to assumptions.)

> Using Cuda might be trickier to implement because there's a layering violation inherent in routing target independent allocations through the nvptx plugin, but benchmarking shows that that's the only way to get the faster path through the Cuda black box; being pinned is good because it avoids page faults, but apparently if Cuda *knows* it is pinned then you get a speed boost even when there would be *no* faults (i.e. on a quiet machine).

Additionally, Cuda somehow ignores the OS-defined limits.

I wonder whether for a NUMA machine (and non-offloading access), using memkind_create_fixed will have an advantage over cuHostAlloc or not. (BTW, I find cuHostAlloc vs. cuAllocHost confusing.)

And if so, whether we should provide a means (GOMP_... env var?) to toggle the preference.

My feeling is that, on most systems, it does not matter - except (a) possibly for large NUMA systems, where the memkind tuning will probably make a difference and (b) we know that CUDA's cu(HostAlloc/AllocHost) is faster with nvptx offloading. (cu(HostAlloc/AllocHost) also permits DMA from the device, if unified-shared address is supported, but that's the case [cf. comment + assert in plugin-nvptx.c].)

Tobias
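For reference, the plain mmap + mlock approach from the quoted patch description can be sketched in a few lines of C. This is not the libgomp implementation, only an illustration for Linux/POSIX hosts with made-up function names; it deliberately returns NULL when locking fails (for example because of the locked-memory resource limit) instead of falling back to unpinned memory.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: return SIZE bytes of page-locked memory, or NULL.
   mmap is used instead of malloc so that the whole mapping can be
   released, and thereby unpinned, safely later on.  */
void *
sketch_pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)
    {
      /* Could not pin, e.g. limit exceeded: give the pages back.  */
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Release memory obtained from sketch_pinned_alloc; unmapping the
   region also drops the lock.  */
void
sketch_pinned_free (void *p, size_t size)
{
  if (p)
    munmap (p, size);
}
```

Because every allocation occupies at least one locked page, small allocations are expensive in this scheme, which is the concern behind the memkind_create_fixed suggestion above.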
Re: [PATCH 2/2] OpenMP: Duplicate checking for map clauses in Fortran (PR107214)
Hi Julian:

On 07.12.22 20:13, Julian Brown wrote:
>> I know that this was the case before, but can you move the mark:1 etc. after 'tlink'? In that case all bitfields are grouped together.

Thanks for doing so.

>> I wonder whether that also rejects the following – which seems to be valid. The 'map' goes to 'target' and the 'firstprivate' to 'parallel', cf. OpenMP 5.2, "17.2 Clauses on Combined and Composite Constructs", [340:3-4 & 12-14]. (BTW: While some fixes went into 5.1 regarding this section, a similar wording is already in 5.0.) (Testing showed: it gives an ICE without the patch and an error with.)
>
> ...and this patch avoids the error for combined directives, and reorders the gfc_symbol bitfields.

All in all, I am fine with the patch - but I spotted some issues.

First, I think that for some error cases you need to set mark = 0 to avoid duplicated errors. Namely:

  ! Outputs the error twice ('Symbol ‘y’ present on multiple clauses')
  !$omp target has_device_addr(y) firstprivate(y)
  block; end block

* * *

Additionally, I think it would be good to have, besides 'target' + map/firstprivate (→ error), also a testcase for 'target simd' + map/firstprivate → error.

And I think gives-no-error checks for all combined 'target ...' constructs that take firstprivate should also be added, cf. your own patch - possibly looking at the original dump (scan-tree-dump) to see that the clause is attached correctly.

Example for 'target teams':

  !$omp target teams map(x) firstprivate(x)
  block; end block

(Works but no testcase.)

* * *

The following is not diagnosed and gives an ICE:

  !$omp target in_reduction(+: x) private(x)
  block; end block
  end

The C testcase properly has:

  error: ‘x’ appears more than once in data-sharing clauses

Note: Using 'firstprivate' instead of 'private' shows the proper error also in Fortran.

The following does not ICE but does not make sense (and is rejected in C):

  4 | #pragma omp target private(x) map(x)

vs.

  !$omp target map(x) private(x)
  block; end block

(The latter produces "#pragma omp target private(x.0) map(tofrom:*x.0)", oops!)

* * *

I also note that 'simd' accepts private such that

  #pragma omp target simd private(x) map(x)
  for (int i=0; i < 0; i++)
    ;

  !$omp target simd map(x) private(x)
  do i = 1, 0; end do

is valid. (It is accepted by gcc and gfortran, i.e. it just needs to be added as testcase.)

* * *

I note that C rejects {map(x),firstprivate(x)} + {has_device_addr(x),is_device_ptr(x)}, but gfortran + your patch accepts:

  !$omp target map(x) has_device_addr(x)
  !$omp target map(x) is_device_ptr(x)

while

  !$omp target firstprivate(x) has_device_addr(x)
  !$omp target firstprivate(x) is_device_ptr(x)

is rejected – showing the error message twice.

Expected: I think it should show an error in all four cases - but only once.

> 2022-12-06 Julian Brown
>
> gcc/fortran/
> 	PR fortran/107214
> 	* gfortran.h (gfc_symbol): Add data_mark, dev_mark, gen_mark and
> 	reduc_mark bitfields.
> 	* openmp.cc (resolve_omp_clauses): Use above bitfields to improve
> 	duplicate clause detection.
>
> gcc/testsuite/
> 	PR fortran/107214
> 	* gfortran.dg/gomp/pr107214.f90: New test.

Thanks,

Tobias
Re: [PATCH 1/2] OpenMP/Fortran: Combined directives with map/firstprivate of same symbol
On 07.12.22 20:09, Julian Brown wrote:
> On Wed, 26 Oct 2022 12:39:39 +0200 Tobias Burnus wrote:
>> The ICE seems to be because gcc/fortran/trans-openmp.cc's gfc_split_omp_clauses mishandles this as the dump shows the following:
>>
>>   #pragma omp target firstprivate(a) map(tofrom:a)
>>   #pragma omp parallel firstprivate(a)
>>
>> In contrast, for the C testcase:
>>
>>   #pragma omp target parallel for simd map(x) firstprivate(x)
>>
>> the dump is as follows, which seems to be sensible:
>>
>>   #pragma omp target map(tofrom:x)
>>   #pragma omp parallel firstprivate(x)
>
> This patch fixes a case where a combined directive (e.g. "!$omp target parallel ...") contains both a map and a firstprivate clause for the same variable. When the combined directive is split into two nested directives, the outer "target" gets the "map" clause, and the inner "parallel" gets the "firstprivate" clause, like so:
> ...
> This is not a recent regression, but appears to fix a long-standing ICE.
> ...
> gcc/fortran/
> 	* trans-openmp.cc (gfc_add_firstprivate_if_unmapped): New function.
> 	(gfc_split_omp_clauses): Call above.
>
> libgomp/
> 	* testsuite/libgomp.fortran/combined-directive-splitting-1.f90: New test.

LGTM – thanks!

Tobias
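The splitting described above can also be written out by hand. The C sketch below uses the explicit two-directive form that the "sensible" dump shows, with 'map' on the outer 'target' and 'firstprivate' on the inner 'parallel'; the combined spelling is only mentioned in a comment. Function and variable names are made up for illustration, and without OpenMP enabled the pragmas are ignored and the loop runs serially.

```c
/* Hypothetical example.  The combined form would be
     #pragma omp target parallel for map(x) firstprivate(x) ...
   and is expected to be split like the two directives below.  */
void
sketch_split_directive (int x, int *out, int n)
{
  /* Outer directive: data mapping only.  */
  #pragma omp target map(to: x) map(tofrom: out[0:n])
  /* Inner directive: every thread gets its own initialized copy of x.  */
  #pragma omp parallel for firstprivate(x)
  for (int i = 0; i < n; i++)
    out[i] = x + i;
}
```

The body only reads 'x', so the result is the same whether 'x' is privatized or not; the point is merely which directive each clause ends up on.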
Re: [PATCH v5 3/4] OpenMP: Pointers and member mappings
Hi Julian,

I think this patch is OK; however, at least for gimplify.cc Jakub needs to have a second look.

As remarked for the 2/4 patch, I believe mapping 'map(tofrom: var%f(2:3))' should work without explicitly mapping 'map(tofrom: var%f)'
(→ [TR11 157:21-26] (approx. [5.2 154:22-27], [5.1 352:17-22], [5.0 320:22-27]);
→ https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608100.html (+ previously in the thread)).

Testing the patch, that seems to work fine (i.e. contrary to C/C++, cf. 2/4), which matches the dump and, if I understood correctly, also your (Julian's) expectation. Thus, no need to modify the code part.

Regarding the testcases:

* I would prefer if you don't modify the existing libgomp.fortran/struct-elem-map-1.f90 testcase; however, you could add your version as another variant ('subroutine nine()', 'four_var()' or whatever the next free name is, possibly with a comment telling that it is 'four()' but with an added explicit basepointer mapping).

* As the new version should map *less*, I wonder whether some -fdump-tree-{original,gimple,omplower} scan-tree-dump checks would be useful besides testing whether it works at run time. (Your decision regarding which tree, which testcases and whether at all.)

* Likewise, maybe a 'target enter/exit data' check? However, you might very well run into my 'omp target exit data' issue, cf. https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604887.html (needs to be revised based on Jakub's comments; I think those were on IRC only – the problem is that not only 'alloc' is affected but also 'from' etc.)

On 18.10.22 12:39, Julian Brown wrote:
> Implementing the "omp declare mapper" functionality, I noticed some cases where handling of derived type members that are pointers doesn't seem to be quite right. At present, a type such as this:
> ...
>   map(to: tvar%arrptr) map(tofrom: tvar%arrptr(3:8))
> and then instead we should follow (OpenMP 5.2, 5.8.3 "map Clause"):
> ...
> 2) map(tofrom: tvar%arrptr(3:8)) -->
>    GOMP_MAP_TOFROM *tvar%arrptr%data(3) (size 8-3+1, etc.)
>    GOMP_MAP_TO_PSET tvar%arrptr
>    GOMP_MAP_ATTACH_DETACH tvar%arrptr%data (bias 3, etc.)
> ...
> Additionally, the next patch in the series adds a runtime diagnostic for the (illegal) case where 'i' and 'j' are different.
>
> 2022-10-18 Julian Brown
>
> gcc/fortran/
> 	* dependency.cc (gfc_omp_expr_prefix_same): New function.
> 	* dependency.h (gfc_omp_expr_prefix_same): Add prototype.
> 	* gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2" union.
> 	* trans-openmp.cc (dependency.h): Include.
> 	(gfc_trans_omp_array_section): Use GOMP_MAP_TO_PSET unconditionally for
> 	mapping array descriptors.
> 	(gfc_symbol_rooted_namelist): New function.
> 	(gfc_trans_omp_clauses): Check subcomponent and subarray/element
> 	accesses elsewhere in the clause list for pointers to derived types or
> 	array descriptors, and adjust or drop mapping nodes appropriately.
>
> gcc/
> 	* gimplify.cc (omp_tsort_mapping_groups): Process nodes that have
> 	OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P set after those that don't.
> 	(omp_accumulate_sibling_list): Adjust GOMP_MAP_TO_PSET handling.
> 	Remove GOMP_MAP_ALWAYS_POINTER handling.
>
> libgomp/
> 	* testsuite/libgomp.fortran/map-subarray.f90: New test.
> 	* testsuite/libgomp.fortran/map-subarray-2.f90: New test.
> 	* testsuite/libgomp.fortran/map-subarray-3.f90: New test.
> 	* testsuite/libgomp.fortran/map-subarray-4.f90: New test.
> 	* testsuite/libgomp.fortran/map-subarray-6.f90: New test.
> 	* testsuite/libgomp.fortran/map-subarray-7.f90: New test.
> 	* testsuite/libgomp.fortran/map-subcomponents.f90: New test.
> 	* testsuite/libgomp.fortran/struct-elem-map-1.f90: Adjust for
> 	descriptor-mapping changes. Remove XFAIL.
> ...
> --- a/libgomp/testsuite/libgomp.fortran/struct-elem-map-1.f90
> +++ b/libgomp/testsuite/libgomp.fortran/struct-elem-map-1.f90
> @@ -229,7 +229,8 @@ contains
>  ! !$omp target map(tofrom: var%d(4:7), var%f(2:3), var%str2(2:3)) &
>  ! !$omp& map(tofrom: var%str4(2:2), var%uni2(2:3), var%uni4(2:2))
> -!$omp target map(tofrom: var%d(4:7), var%f(2:3), var%str2(2:3), var%uni2(2:3))
> +!$omp target map(to: var%f) map(tofrom: var%d(4:7), var%f(2:3), &
> +!$omp& var%str2(2:3), var%uni2(2:3))

This adds 'to: var%f' (to the existing 'var%f(2:3)') – where 'f' is a POINTER. As discussed at the top, I prefer to leave it as is – and possibly just add another test function, replicating this function and only there adding the basepointer as additional list item.

> -!$omp target map(tofrom: var%f(2:3))
> +!$omp target map(to: var%f) map(tofrom: var%f(2:3))

likewise.

> -!$omp target map(tofrom: var%d(5), var%f(3), var%str2(3), var%uni2(3))
> +!$omp target map(to: var%f) map(tofrom: var%d(5), var%f(3), &
> +!$omp& var%str2(3), var%uni2(3))

likewise.

> -!$omp target map(tofrom:
Re: [PATCH v5 2/4] OpenMP/OpenACC: Rework clause expansion and nested struct handling
Hi Julian, On 07.12.22 16:16, Julian Brown wrote: On Wed, 7 Dec 2022 15:54:42 +0100 Tobias Burnus wrote: If I understand Deepak's comment (on OpenMP.org's omp-lang list, sorry it is a nonpublic list) correctly, the following wording implies that a 'from: s.w[z:4]' for a pointer 's.w' also implies a mapping of 's.w' - if 's' is used inside the target region and, thus, gets implicitly mapped. [TR11 157:21-26] (approx. [5.2 154:22-27], [5.1 352:17-22], [5.0 320:22-27]) "If a list item with an implicit data-mapping attribute does not have any corresponding storage in the device data environment prior to a task encountering the construct associated with the map clause, and one or more contiguous parts of the original storage are either list items or base pointers to list items that are explicitly mapped on the construct, only those parts of the original storage will have corresponding storage in the device data environment as a result of the map clauses on the construct." Hmmm... IIRC that is a different conclusion than the one we have understood previously, leading to e.g. the patch here (Chung-Lin CC'ed): https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html This seems to be the "Target directive struct mapping question" omp-lang thread, started on 2021-03-22. I think we need to distinguish: #pragma omp target enter data map(to: s.w[:10]) from #pragma omp target map(tofrom: s.arr[:20]) s.arr[0] = 5; As in the latter case 's' gets implicitly mapped and then applies to the base pointer 's.arr' of 's.arr[:20]'. While in the former case, only the pointee gets mapped without the pointer 's.arr' (and, hence, there is also no pointer attachment). At least that's what I get from the wording above and reading Deepak's last email - and it does not seem to clash with the discussion in the lengthy omp-lang thread. (Maybe there are other threads – or I completely misread them.) 
I think it makes sense to have a clarifying example in OpenMP; hence, I filed the OpenMP.org example issue #342, starting with essentially what I wrote above: 'target enter data' needs more work to get the pointer handling done, 'target' + accessing 's' works as is. I hope it makes sense.

Follow-on discussion then questioned whether the change was really the intention of the spec, but we thought it was. Has that changed now?

No idea – I find it difficult to track all the language changes and find mapping complex and unclear. However, it does seem to make sense in the way written above without contradicting the previous discussions, minus the common confusion. (At least as far as I gathered from browsing both omp-lang and gcc-patches.)

(I think actually changing the behaviour is a matter of flipping a switch, but let's make sure we choose the right setting!)

That sounds great!

Tobias

- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
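The two cases distinguished in this exchange can be written out as a small C sketch. This is only an illustration of the reading given above, not a statement of what the spec or GCC guarantees: the struct layout and the function names map_on_target and map_on_enter_data are made up for the example, and without an offload device (or without -fopenmp) both functions simply run on the host.

```c
#include <stdlib.h>

/* Hypothetical struct with pointer members, mirroring 's.w' and 's.arr'.  */
struct S { int *w; int *arr; };

/* Case 1: 'target' with an explicit array-section map.  's' is referenced
   inside the region and is therefore implicitly mapped; per the reading
   above, that also covers the base pointer 's.arr'.  */
int
map_on_target (void)
{
  struct S s = { NULL, NULL };
  s.arr = malloc (20 * sizeof (int));
  if (s.arr == NULL)
    return -1;
  for (int i = 0; i < 20; i++)
    s.arr[i] = 0;

  #pragma omp target map(tofrom: s.arr[:20])
  {
    s.arr[0] = 5;
  }

  int result = s.arr[0];
  free (s.arr);
  return result;
}

/* Case 2: 'target enter data' maps only the pointee.  The pointer 's.w'
   itself is not mapped here, so no pointer attachment takes place; a later
   target region would have to take care of the pointer separately.  */
int
map_on_enter_data (void)
{
  struct S s = { NULL, NULL };
  s.w = malloc (10 * sizeof (int));
  if (s.w == NULL)
    return -1;
  for (int i = 0; i < 10; i++)
    s.w[i] = i;

  #pragma omp target enter data map(to: s.w[:10])
  #pragma omp target exit data map(release: s.w[:10])

  int result = s.w[9];   /* The host copy is untouched.  */
  free (s.w);
  return result;
}
```

Calling map_on_target () is expected to yield 5 and map_on_enter_data () to yield 9; the second function only demonstrates the directive form, since nothing on the device modifies the data.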
Re: [PATCH v5 2/4] OpenMP/OpenACC: Rework clause expansion and nested struct handling
Hi Julian,

If I understand Deepak's comment (on OpenMP.org's omp-lang list, sorry it is a nonpublic list) correctly, the following wording implies that a 'from: s.w[z:4]' for a pointer 's.w' also implies a mapping of 's.w' - if 's' is used inside the target region and, thus, gets implicitly mapped.

[TR11 157:21-26] (approx. [5.2 154:22-27], [5.1 352:17-22], [5.0 320:22-27])
"If a list item with an implicit data-mapping attribute does not have any corresponding storage in the device data environment prior to a task encountering the construct associated with the map clause, and one or more contiguous parts of the original storage are either list items or base pointers to list items that are explicitly mapped on the construct, only those parts of the original storage will have corresponding storage in the device data environment as a result of the map clauses on the construct."

Thus, the following change should not be required – but if I undo it, I see a libgomp runtime error. Hence, it looks as if you need to fix this:

On 18.10.22 12:39, Julian Brown wrote:

--- a/libgomp/testsuite/libgomp.c/target-22.c
+++ b/libgomp/testsuite/libgomp.c/target-22.c
@@ -21,7 +21,8 @@ main ()
   s.v.b = a + 16;
   s.w = c + 3;
   int err = 0;
-  #pragma omp target map (to:s.v.b[0:z + 7], s.u[z + 1:z + 4]) \
+  #pragma omp target map (to: s.w, s.v.b, s.u, s.s) \
+                     map (to:s.v.b[0:z + 7], s.u[z + 1:z + 4]) \
                      map (tofrom:s.s[3:3]) \
                      map (from: s.w[z:4], err) private (i)

Thanks,

Tobias
[Patch] libgomp.texi: Reverse-offload updates (was: [Patch] libgomp: Handle OpenMP's reverse offloads)
On 06.12.22 08:45, Tobias Burnus wrote: * As follow-up, libgomp.texi must be updated That is what the attached patch does – obviously, it is depending on the main patch. OK (once the main patch is in)? Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 libgomp.texi: Reverse-offload updates libgomp/ * libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'. (GCN): Add item about 'omp requires'. (nvptx): Likewise; add item about reverse offload. diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index efa7d956a33..e9ab079ecf5 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -192,8 +192,8 @@ The OpenMP 4.5 specification is fully supported. env variable @tab Y @tab @item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab @item @code{requires} directive @tab P - @tab complete but no non-host devices provides @code{unified_address}, - @code{unified_shared_memory} or @code{reverse_offload} + @tab complete but no non-host devices provides @code{unified_address} or + @code{unified_shared_memory} @item @code{teams} construct outside an enclosing target region @tab Y @tab @item Non-rectangular loop nests @tab Y @tab @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab @@ -228,7 +228,7 @@ The OpenMP 4.5 specification is fully supported. @item @code{allocate} clause @tab P @tab Initial support @item @code{use_device_addr} clause on @code{target data} @tab Y @tab @item @code{ancestor} modifier on @code{device} clause - @tab Y @tab See comment for @code{requires} + @tab Y @tab Host fallback with GCN devices @item Implicit declare target directive @tab Y @tab @item Discontiguous array section with @code{target update} construct @tab N @tab @@ -288,7 +288,7 @@ The OpenMP 4.5 specification is fully supported. 
@code{append_args} @tab N @tab @item @code{dispatch} construct @tab N @tab @item device-specific ICV settings with environment variables @tab Y @tab -@item @code{assume} directive @tab Y @tab +@item @code{assume} and @code{assumes} directives @tab Y @tab @item @code{nothing} directive @tab Y @tab @item @code{error} directive @tab Y @tab @item @code{masked} construct @tab Y @tab @@ -4455,6 +4455,9 @@ The implementation remark: @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported using the C library @code{printf} functions and the Fortran @code{print}/@code{write} statements. +@item OpenMP code that has a requires directive with @code{unified_address}, + @code{unified_shared_memory} or @code{reverse_offload} will remove + any GCN device from the list of available devices (``host fallback''). @end itemize @@ -4504,6 +4507,13 @@ The implementation remark: @item Compilation OpenMP code that contains @code{requires reverse_offload} requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30} is not supported. +@item For code containing reverse offload (i.e. @code{target} regions with + @code{device(ancestor:1)}), there is a slight performance penality + for @emph{all} target regions, consisting mostly of shutdown delay + between zero to one microsecond and a tiny device querying overhead. +@item OpenMP code that has a requires directive with @code{unified_address} + or @code{unified_shared_memory} will remove any nvptx device from the + list of available devices (``host fallback''). @end itemize
Re: [wwwdocs] gcc-13/changes.html + projects/gomp: OpenMP GCC 13 update
On 06.12.22 10:15, Jakub Jelinek wrote: On Tue, Dec 06, 2022 at 09:59:17AM +0100, Tobias Burnus wrote: This patch updates the OpenMP implementation status, based on libgomp.texi. For the release notes, it also moves 'non-rectangular loop nests' up as that's a 5.0 not a 5.1 feature. And in line with libgomp.texi, it adds to projects/gomp/ the items for TR11, a OpenMP 6.0 preview. (Hence, the id="omp6.0" to have a fixed id even when the list is updated to TR12 and later OpenMP 6.0.) The posted patch is certainly good, but doesn't do what you wrote above. Next try – how about this one? Tobias PS: There will be surely more updates before GCC 13 is released; I hope/assume the next change will be for nvptx reverse offload... - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 gcc-13/changes.html + projects/gomp: OpenMP GCC 13 update htdocs/gcc-13/changes.html | 21 ++-- htdocs/projects/gomp/index.html | 227 2 files changed, 223 insertions(+), 25 deletions(-) diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index 689178f9..59cb7a8d 100644 --- a/htdocs/gcc-13/changes.html +++ b/htdocs/gcc-13/changes.html @@ -46,14 +46,15 @@ a work-in-progress. General Improvements - https://gcc.gnu.org/projects/gomp/;>OpenMP + https://gcc.gnu.org/projects/gomp/;>OpenMP Reverse offload is now supported and the all clauses to the requires directive are now accepted; however, the requires_offload, unified_address and unified_shared_memory clauses cause that the - only available device is the initial device (the host). + only available device is the initial device (the host). Fortran now + supports non-rectangular loop nests, which were added for C/C++ in GCC 11. The following OpenMP 5.1 features have been added: the @@ -62,9 +63,10 @@ a work-in-progress. 
clause for the taskwait directive and the omp_target_is_accessible, omp_target_memcpy_async, omp_target_memcpy_rect_async and - omp_get_mapped_ptr API routines. Fortran now supports - non-rectangular loop nests, which were added for C/C++ in GCC 11. - + omp_get_mapped_ptr API routines. The assume and assumes + directives, the begin/end declare target syntax in C/C++ + and device-specific ICV settings with environment variables are now + supported. Initial support for OpenMP 5.2 features have been added: Support for firstprivate and allocate clauses on the @@ -73,7 +75,14 @@ a work-in-progress. omp_initial_device and omp_invalid_device; and optionally omitting the map-type in target enter/exit data. The enter clause (as alias for to) has been added - to the declare target directive. + to the declare target directive. Also added has been the + omp_in_explicit_task routine and the doacross + clause as alias for depend with source/sink + modifier. + + + The _ALL suffix to the device-scope environment variables, + added in Technical Report (TR11) is already handled. For user defined allocators requesting high bandwidth or large capacity diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html index 87903289..114bcde6 100644 --- a/htdocs/projects/gomp/index.html +++ b/htdocs/projects/gomp/index.html @@ -28,7 +28,8 @@ OpenMP and OpenACC are supported with GCC's C, C++ and Fortran compilers. 2.5 · 3.0 · 3.1 · 4.0 · 4.5 · 5.0 · - 5.1 · 5.2 + 5.1 · 5.2 · + TR 11 OpenMP Releases and Status @@ -620,6 +621,16 @@ than listed, depending on resolved corner cases and optimizations. GCC12 + +device-specific ICV settings with environment variables +GCC13 + + + +assume directive +GCC13 + + inoutset argument to the depend clause GCC13 @@ -650,6 +661,11 @@ than listed, depending on resolved corner cases and optimizations. 
GCC13 + +Support begin/end declare target syntax in C/C++ +GCC13 + + target_device trait in OpenMP Context No @@ -675,16 +691,6 @@ than listed, depending on resolved corner cases and optimizations. No - -device-specific ICV settings with environment variables -GCC13 - - - -assume directive -No - - Loop transformation constructs No @@ -727,27 +733,28 @@ than listed, depending on resolved corner cases and optimizations. -ompt_sync_region_t enum additions +For Fortran, diagnose placing declarative before/between USE, + IMPORT, and IMPLICIT as invalid No -ompt_state_t enum: ompt_state_wait_barrie
[wwwdocs] gcc-13/changes.html + projects/gomp: OpenMP GCC 13 update
This patch updates the OpenMP implementation status, based on libgomp.texi. For the release notes, it also moves 'non-rectangular loop nests' up as that's a 5.0 not a 5.1 feature. And in line with libgomp.texi, it adds to projects/gomp/ the items for TR11, a OpenMP 6.0 preview. (Hence, the id="omp6.0" to have a fixed id even when the list is updated to TR12 and later OpenMP 6.0.) Comments? Suggestions? OK? Tobias PS: There will be surely more updates before GCC 13 is released; I hope/assume the next change will be for nvptx reverse offload... - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 commit 9f80367e539839fff1df2c85fc2640638199fc9e Author: Tobias Burnus Date: Tue Dec 6 09:49:30 2022 +0100 libgomp.texi: Fix a OpenMP 5.2 and a TR11 impl-status item libgomp/ * libgomp.texi (OpenMP 5.2): Add missing 'the'. (TR11): Add missing '@tab N @tab'. --- libgomp/libgomp.texi | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 4caac497506..efa7d956a33 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -406,7 +406,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab @item @code{allocate} and @code{firstprivate} clauses on @code{scope} @tab Y @tab @item @code{ompt_callback_work} @tab N @tab -@item Default map-type for @code{map} clause in @code{target enter/exit data} +@item Default map-type for the @code{map} clause in @code{target enter/exit data} @tab Y @tab @item New @code{doacross} clause as alias for @code{depend} with @code{source}/@code{sink} modifier @tab Y @tab @@ -463,6 +463,7 @@ Technical Report (TR) 11 is the first preview for OpenMP 6.0. 
@item @code{access} allocator trait changes @tab N @tab @item Extension of @code{interop} operation of @code{append_args}, allowing all modifiers of the @code{init} clause + @tab N @tab @item @code{interop} clause to @code{dispatch} @tab N @tab @item @code{apply} code to loop-transforming constructs @tab N @tab @item @code{omp_curr_progress_width} identifier @tab N @tab
[committed] libgomp.texi: Fix a OpenMP 5.2 and a TR11 impl-status item
Found when updating the wwwdocs files. Committed as obvious as https://gcc.gnu.org/r13-4500 Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 commit 9f80367e539839fff1df2c85fc2640638199fc9e Author: Tobias Burnus Date: Tue Dec 6 09:49:30 2022 +0100 libgomp.texi: Fix a OpenMP 5.2 and a TR11 impl-status item libgomp/ * libgomp.texi (OpenMP 5.2): Add missing 'the'. (TR11): Add missing '@tab N @tab'. --- libgomp/libgomp.texi | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 4caac497506..efa7d956a33 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -406,7 +406,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab @item @code{allocate} and @code{firstprivate} clauses on @code{scope} @tab Y @tab @item @code{ompt_callback_work} @tab N @tab -@item Default map-type for @code{map} clause in @code{target enter/exit data} +@item Default map-type for the @code{map} clause in @code{target enter/exit data} @tab Y @tab @item New @code{doacross} clause as alias for @code{depend} with @code{source}/@code{sink} modifier @tab Y @tab @@ -463,6 +463,7 @@ Technical Report (TR) 11 is the first preview for OpenMP 6.0. @item @code{access} allocator trait changes @tab N @tab @item Extension of @code{interop} operation of @code{append_args}, allowing all modifiers of the @code{init} clause + @tab N @tab @item @code{interop} clause to @code{dispatch} @tab N @tab @item @code{apply} code to loop-transforming constructs @tab N @tab @item @code{omp_curr_progress_width} identifier @tab N @tab
[Patch] libgomp: Handle OpenMP's reverse offloads
This patch finally handles reverse offload. Due to the prep work, it essentially only adds content to libgomp/target.c's gomp_target_rev(), except that it additionally saves the reverse-offload-function table in gomp_load_image_to_device.

In the comment to "[Patch] libgomp: Add reverse-offload splay tree", https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601368.html , it was suggested not to keep track of all the variable mappings and to reconstruct the mapping from the normal splay tree, which this patch does. (Albeit in the very slow walk-everything way. Given that reverse-offload target regions likely have only few map items, and that programs should use only few reverse-offload regions and not expect them to be fast, that might be okay.)

Specification references:
- For pointer attachment, I assume that the pointer is already fine on the host (if it existed on the host before) and it does not need to get updated. I think the spec lacks wording for this; cf. OpenMP Spec Issue #3424.
- There are plans to permit 'nowait'. I think it wouldn't change anything except for not spin waiting for the result - and (only for shared memory), the argument lists (addr, kinds, sizes) need to be copied to have a sufficient lifetime. (To be implemented in future; cf. OpenMP Spec Pull Req. 3423 for Issue 2038.)

* * *

32bit vs. 64bit: libgomp itself is compiled with both -m32 and -m64; however, nvptx and gcn require -m64 on the device side and assume that the device pointers are representable on the host (i.e. all are 64bit). The new code tries to be in principle compatible with uint32_t pointers and uses uint64_t to represent them consistently. – The code should be mostly fine, except that one called function requires an array of void* and size_t. Instead of handling that case, I added some code to permit optimizing away the function content without offloading - and a run-time assert if it should ever happen that this function gets called on a 32bit host from the target side.
It is a run-time fail as '#if TARGET_OFFLOAD == ""' does not work (string comparison by the C preprocessor is not supported, unfortunately).

Comments, suggestions, OK for mainline, ... ?

Tobias

PS:
* As follow-up, libgomp.texi must be updated
* For GCN, it currently does not work until stack variables are accessible from the host. (Prep work for this is in newlib + GCC 13.) Once done, a similar one-line change to plugin-gcn.c's GOMP_OFFLOAD_get_num_devices is required.

PPS: (Off topic remark to 32bit host) While a 32bit host with a 32bit device will mostly work, having a 32bit host with a 64bit device becomes interesting as the 'void *' returned by omp_target_alloc(...) can't represent a device pointer. The solution is a 32bit pointer pointing to a 64bit variable, e.g.

  uint64_t *devptr = malloc (sizeof (uint64_t));
  *devptr = internal_device_alloc ();
  return devptr;

with all the fun to translate this correctly with {use,has}_device_ptr etc. To actually support this will require some larger changes to libgomp, which I do not see happening unless a device system with sizeof(void*) > 64 bit shows up. Or some compelling reason to use 32bit on the host; but not for x86-64 or arm64 (or PowerPC). (There exist 128bit pointer systems, which use the upper bits for extra purposes - but for unified-shared address purposes, it seems to be unlikely that accelerator devices head in this direction.)

libgomp: Handle OpenMP's reverse offloads

This commit enabled reverse offload for nvptx such that gomp_target_rev actually gets called.
And it fills the latter function to do all of the following: finding the host function to the device func ptr and copying the arguments to the host, processing the mapping/firstprivate, calling the host function, copying back the data and freeing as needed. The data handling is made easier by assuming that all host variables either existed before (and are in the mapping) or that those are devices variables not yet available on the host. Thus, the reverse mapping can do without refcounts etc. Note that the spec disallows inside a target region device-affecting constructs other than target plus ancestor device-modifier and it also limits the clauses permitted on this construct. For the function addresses, an additional splay tree is used; for the lookup of mapped variables, the existing splay-tree is used. Unfortunately, its data structure requires a full walk of the tree; Additionally, the just mapped variables are recorded in a separate data structure an extra lookup. While the lookup is slow, assuming that only few variables get mapped in each reverse offload construct and that reverse offload is the exception and not performance critical, this
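The first step described in this commit message, finding the host function that belongs to a device function pointer and then calling it, can be sketched as follows. This is a simplified illustration only: libgomp uses a splay tree for this lookup, whereas the sketch swaps in a linear table scan, and every name in it (rev_entry, rev_lookup, rev_call, add_one) is hypothetical rather than libgomp API. Copying arguments and mapped data between device and host is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature assumed for a host-side function of a reverse-offload region.  */
typedef void (*host_fn_t) (void *args);

/* Hypothetical table entry: device function address -> host function.
   Device addresses are kept as uint64_t regardless of the host pointer
   size, as the cover letter suggests.  */
struct rev_entry
{
  uint64_t dev_addr;
  host_fn_t host_fn;
};

/* Linear search standing in for the splay-tree lookup.  */
static const struct rev_entry *
rev_lookup (const struct rev_entry *table, size_t n, uint64_t dev_addr)
{
  for (size_t i = 0; i < n; i++)
    if (table[i].dev_addr == dev_addr)
      return &table[i];
  return NULL;
}

/* Run the host function registered for DEV_ADDR.
   Returns 0 on success and -1 if no host function is known.  */
int
rev_call (const struct rev_entry *table, size_t n, uint64_t dev_addr,
          void *args)
{
  const struct rev_entry *e = rev_lookup (table, n, dev_addr);
  if (e == NULL)
    return -1;
  e->host_fn (args);
  return 0;
}

/* Demo host function: increments the int that ARGS points to.  */
void
add_one (void *args)
{
  *(int *) args += 1;
}
```

A caller would register add_one under some device address in a rev_entry table and then invoke rev_call with that address; an unknown address is reported with -1 instead of calling anything.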
Re: [Patch] libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors (was: amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors)
On 30.11.22 10:43, Andrew Stubbs wrote:

On 29/11/2022 18:26, Tobias Burnus wrote:

On 29.11.22 16:56, Paul-Antoine Arras wrote:

This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP context selectors, [...]

PA committed that patch as https://gcc.gnu.org/r13-4403-g1fd508744eccda9ad9c6d6fcce5b2ea9c568818d (thanks!)

I think this should be documented somewhere. We have https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html

The wording is a little odd. How about "Additionally, gfx803 is supported as an alias for fiji"?

Committed with the suggested wording: https://gcc.gnu.org/r13-4404-ge0b95c2e8b771b53876321a6a0a9497619af73cd

Thanks,

Tobias

PS: It does not help with finding a good wording if that's the last task before calling it a day...
[Patch] libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors (was: amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors)
Hi PA, hi Andrew, hi Jakub, hi all,

On 29.11.22 16:56, Paul-Antoine Arras wrote:

This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP context selectors, [...]

I think this should be documented somewhere. We have https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html

For GCN and ISA, it refers to -march= and gfx803 is only a context selector. Hence: How about the attached patch?

Tobias

libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Context Selectors): Add 'gfx803' to gcn's isa.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 85cae742cd4..0066d41fdc5 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4378,5 +4378,6 @@ offloading devices (it's not clear if they should be):
 @item @code{amdgcn}, @code{gcn}
       @tab @code{gpu}
-      @tab See @code{-march=} in ``AMD GCN Options''
+      @tab See @code{-march=} in ``AMD GCN Options''@footnote{Additionally
+      supported is @code{gfx803} as an alias for @code{fiji}.}
 @item @code{nvptx}
       @tab @code{gpu}
[Patch] gcn: Fix __builtin_gcn_first_call_this_thread_p
It turned out that cprop cleverly propagated the unspec_volatile to the preceding (pseudo)register, which permitted removing the 'set (s0) (pseudoregister)' at -O2. Unfortunately, it does matter whether the assignment is done to 's2' (previously: pseudoregister) or to s1. – Just having a hard register is not enough ...

Solution: Use USE (alias gen_rtx_USE) instead.

Additionally, I removed the s0 modification (that should lead to the unchanged result) by adding 'gcn_operand_part (DImode, reg, 1)' and then working with SImode.

Result:

  if (__builtin_gcn_first_call_this_thread_p())
    x = 42;

becomes now (with -O2) the following; the builtin code is up to (and including) '.L2', the rest is the 'if' and 'x=42':

        s_lshr_b32      s2, s1, 16
        s_cmpk_lg_u32   s2, 12345
        s_mov_b32       s12, scc
        s_mov_b32       vcc_lo, scc
        s_mov_b32       vcc_hi, 0
        s_cbranch_vccz  .L2
        s_and_b32       s2, s1, 65535        (= 0xffff)
        s_or_b32        s1, s2, 809041920    (= 0x30390000 = (12345 << 16))
.L2:
        s_getpc_b64     s[2:3]
        s_add_u32       s2, s2, x@rel32@lo+4
        s_addc_u32      s3, s3, x@rel32@hi+4
        s_mov_b32       vcc_lo, s12
        s_mov_b32       vcc_hi, 0
        s_cbranch_vccz  .L3
        s_mov_b32       s12, 42
        v_writelane_b32 v0, s12, 0
        s_mov_b64       exec, 1
        global_store_dword      v1, v0, s[2:3]
.L3:

OK for mainline?

Tobias

gcn: Fix __builtin_gcn_first_call_this_thread_p

Contrary to naive expectation, unspec_volatile (via prologue_use) did not prevent the cprop pass (at -O2) from removing the access to the s[0:1] (PRIVATE_SEGMENT_BUFFER_ARG) register, as the volatile just got put on the preceding pseudoregister. Solution: Use gen_rtx_USE instead. Additionally, this patch removes (gen_)prologue_use_di as it is then no longer used.
Finally, as we already do bit manipulation, instead of using the full 64bit side - and then just keeping the value of 's0', just move directly to use only s1 of s[0:1] and do the bit manipulations there, generating more readable assembly code and better matching the '#else' branch.

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_expand_builtin_1): Work on s1 instead of
	s[0:1] and use USE to prevent removal of setting that register.
	* config/gcn/gcn.md (prologue_use_di): Remove.

 gcc/config/gcn/gcn.cc | 16 ++++++++--------
 gcc/config/gcn/gcn.md | 13 -------------
 2 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 6fb261318c4..c74fa007a21 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4556,8 +4556,9 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 	rtx not_first = gen_label_rtx ();
 	rtx reg = gen_rtx_REG (DImode,
 			cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
-	rtx cmp = force_reg (DImode,
-			     gen_rtx_LSHIFTRT (DImode, reg, GEN_INT (48)));
+	reg = gcn_operand_part (DImode, reg, 1);
+	rtx cmp = force_reg (SImode,
+			     gen_rtx_LSHIFTRT (SImode, reg, GEN_INT (16)));
 	emit_insn (gen_cstoresi4 (result, gen_rtx_NE (BImode, cmp,
 						      GEN_INT(12345)),
 				  cmp, GEN_INT(12345)));
@@ -4565,12 +4566,11 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 						      const0_rtx),
 				   result));
 	emit_move_insn (reg,
-			force_reg (DImode,
-				   gen_rtx_IOR (DImode,
-						gen_rtx_AND (DImode, reg,
-							     GEN_INT (0xffffffffffffL)),
-						GEN_INT (12345L << 48))));
-	emit_insn (gen_prologue_use (reg));
+			force_reg (SImode,
+				   gen_rtx_IOR (SImode,
+						gen_rtx_AND (SImode, reg, GEN_INT (0xffff)),
+						GEN_INT (12345L << 16))));
+	emit_insn (gen_rtx_USE (VOIDmode, reg));
 	emit_label (not_first);
       }
     return result;
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index a8b9c28d115..92e9892c4f7 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -697,19 +697,6 @@
   ""
   [(set_attr "length" "0")])
-(define_insn_and_split "prologue_use_di"
-  [(unspec_volatile
[(match_operand:DI 0 "register_operand")] UNSPECV_PROLOGUE_USE)] - "" - "#" - "reload_completed" - [(unspec_volatile [(match_dup 0)] UNSPECV_PROLOGUE_USE) - (unspec_volatile [(match_dup 1)] UNSPECV_PROLOGUE_USE)] - { -operands[1] = gcn_operand_part (DImode, operands[0], 1); -operands[0] = gcn_operand_part (DImode, operands[0], 0); - } - [(set_attr "length" "0")]) - (define_expand "prologue" [(const_int 0)] ""
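The logic that the patched builtin expands to can be modelled in plain C, which may make the assembly above easier to follow. This is only a host-side model under stated assumptions: the real builtin operates on the hardware register s1, whereas here it is a uint32_t passed by pointer, and the names first_call_this_thread and FIRST_CALL_MAGIC are invented for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Marker stored in the upper 16 bits; 12345 is the value used in the patch.  */
#define FIRST_CALL_MAGIC 12345u

/* Model of __builtin_gcn_first_call_this_thread_p acting on a 32-bit word
   that stands in for register s1.  Returns true only on the first call for
   a given word and marks the word so that later calls return false.  */
bool
first_call_this_thread (uint32_t *s1)
{
  /* s_lshr_b32 + s_cmpk_lg_u32: is the marker already present?  */
  if ((*s1 >> 16) == FIRST_CALL_MAGIC)
    return false;

  /* s_and_b32 + s_or_b32: keep the low 16 bits, set the marker above.  */
  *s1 = (*s1 & 0xffffu) | (FIRST_CALL_MAGIC << 16);
  return true;
}
```

Starting from a word such as 0xabcd1234, the first call returns true and leaves 0x30391234 behind; every later call on the same word returns false and leaves it unchanged.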
Re: [Patch] OpenMP/Fortran: Permit end-clause on directive
Updated patch – taking the comments below into account – and the remark by Harald, seconded by Jakub.

Namely: I have now split the pre-existing nowait-2.f90 into nowait-2.f90 (with only valid usage) and nowait-4.f90 (with the dg-error tests). In the previous version of the patch, nowait-4.f90 was a variant of nowait-2.f90 that used 'nowait' on the directive line. - And Harald suggested to split the latter, which I now did – into nowait-{5,6}.f90.

Cf. Harald's email at https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600539.html and two emails by Jakub ("Otherwise LGTM"), first at https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601304.html + the next email in the thread.

I intend to commit the attached patch tomorrow, unless there are further comments.

Thanks for the reviews (and I know that the follow up is very belated)!

Tobias

On 08.09.22 17:21, Jakub Jelinek via Fortran wrote:

On Fri, Aug 26, 2022 at 08:21:26PM +0200, Tobias Burnus wrote:

I did run into some issues related to this; those turned out to be unrelated, but I ended up implementing this feature.

Side remark: 'omp parallel workshare' seems to actually permit 'nowait' now, but I guess that's an unintended change due to the syntax-representation change. Hence, it is now tracked as Spec Issue 3338 and I do not permit it.

OK for mainline?

Tobias

OpenMP/Fortran: Permit end-clause on directive

gcc/fortran/ChangeLog:

	* openmp.cc (OMP_DO_CLAUSES, OMP_SCOPE_CLAUSES,
	OMP_SECTIONS_CLAUSES, OMP_SINGLE_CLAUSES): Add 'nowait'.

This doesn't describe what the patch actually does: "Add 'nowait'." is only true for the first 3; for OMP_SINGLE_CLAUSES IMHO you want a separate "(OMP_SINGLE_CLAUSES): Add 'nowait' and 'copyprivate'." entry.
@@ -3855,7 +3857,7 @@ cleanup:
    | OMP_CLAUSE_ORDER | OMP_CLAUSE_ALLOCATE)
 #define OMP_SINGLE_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE \
-   | OMP_CLAUSE_ALLOCATE)
+   | OMP_CLAUSE_ALLOCATE | OMP_CLAUSE_NOWAIT | OMP_CLAUSE_COPYPRIVATE)
 #define OMP_ORDERED_CLAUSES \
   (omp_mask (OMP_CLAUSE_THREADS) | OMP_CLAUSE_SIMD)
 #define OMP_DECLARE_TARGET_CLAUSES \
@@ -5909,13 +5915,11 @@ gfc_match_omp_teams_distribute_simd (void)
 match
 gfc_match_omp_workshare (void)
 {
-  if (gfc_match_omp_eos () != MATCH_YES)
-    {
-      gfc_error ("Unexpected junk after $OMP WORKSHARE statement at %C");
-      return MATCH_ERROR;
-    }
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (&c, omp_mask (OMP_CLAUSE_NOWAIT)) != MATCH_YES)
+    return MATCH_ERROR;
   new_st.op = EXEC_OMP_WORKSHARE;
-  new_st.ext.omp_clauses = gfc_get_omp_clauses ();
+  new_st.ext.omp_clauses = c;
   return MATCH_YES;
 }

I think it would be better to introduce OMP_WORKSHARE_CLAUSES and use it in both gfc_match_omp_workshare and just use return match_omp (EXEC_OMP_WORKSHARE, OMP_WORKSHARE_CLAUSES); ?
@@ -6954,6 +6952,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
           }
         break;
       case OMP_LIST_COPYPRIVATE:
+        if (omp_clauses->nowait)
+          gfc_error ("NOWAIT clause must not be be used with COPYPRIVATE "

s/be be/be/

+                     "clause at %L", &n->where);
         for (; n != NULL; n = n->next)
           {
             if (n->sym->as && n->sym->as->type == AS_ASSUMED_SIZE)
@@ -5284,7 +5285,13 @@ parse_omp_do (gfc_statement omp_st)
   if (st == omp_end_st)
     {
       if (new_st.op == EXEC_OMP_END_NOWAIT)
-        cp->ext.omp_clauses->nowait |= new_st.ext.omp_bool;
+        {
+          if (cp->ext.omp_clauses->nowait && new_st.ext.omp_bool)
+            gfc_error_now ("Duplicated NOWAIT clause on %s and %s at %C",
+                           gfc_ascii_statement (omp_st),
+                           gfc_ascii_statement (omp_end_st));
+          cp->ext.omp_clauses->nowait |= new_st.ext.omp_bool;
+        }
       else
         gcc_assert (new_st.op == EXEC_NOP);
       gfc_clear_new_st ();

Not sure if the standard is clear enough that unique clauses can't be repeated on both directive and corresponding end directive. But let's assume that is the case.

--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/copyprivate-2.f90
@@ -0,0 +1,69 @@
+  FUNCTION t()
+    INTEGER :: a, b, t
+    a = 0
+    t = b
+    b = 0
+    !$OMP PARALLEL REDUCTION(+:b)
+      !$OMP SINGLE COPYPRIVATE (b) NOWAIT ! { dg-error "NOWAIT clause must not be be used with COPYPRIVATE clause" }

Here too (several times).

+        !$OMP ATOMIC WRITE
+        b = 6
+      !$OMP END SINGLE
+    !$OMP END PARALLEL
+    t = t + b
+  END FUNCTION
+
+  FUNCTION t2()
+    INTEGER :: a, b, t2
+    a
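The two checks under review (a NOWAIT given on both the directive and its end directive, and NOWAIT combined with COPYPRIVATE) can be sketched outside of gfortran roughly as follows. This is a standalone C illustration and not GCC code: struct clause_state, merge_nowait and nowait_copyprivate_conflict are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the clause state the patch tracks;
   this is not a GCC data structure.  */
struct clause_state
{
  bool nowait;       /* NOWAIT seen so far.  */
  bool copyprivate;  /* COPYPRIVATE list is non-empty.  */
};

/* Fold the NOWAIT flag of the end directive into the state of the
   directive.  Returns false if the clause was given in both places,
   the case the patch diagnoses as "Duplicated NOWAIT clause".  The
   flag is merged either way, mirroring the '|=' in parse_omp_do.  */
bool
merge_nowait (struct clause_state *dir, bool end_nowait)
{
  assert (dir != NULL);
  bool duplicated = dir->nowait && end_nowait;
  dir->nowait |= end_nowait;
  return !duplicated;
}

/* True if NOWAIT and COPYPRIVATE are combined, which the patch
   rejects in resolve_omp_clauses.  */
bool
nowait_copyprivate_conflict (const struct clause_state *s)
{
  assert (s != NULL);
  return s->nowait && s->copyprivate;
}
```

A caller would run merge_nowait when the end directive is parsed and nowait_copyprivate_conflict later during resolution, which matches the order of the two hunks above.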
Re: [Patch] libgomp.texi: OpenMP Impl Status 5.1 additions + TR11
On 25.11.22 11:38, Jakub Jelinek wrote:
On Fri, Nov 25, 2022 at 11:34:35AM +0100, Tobias Burnus wrote:

It also adds TR11. I don't think we will work any time soon on TR11 – possibly except for clarifications.

OK for mainline?

Ok (but I hope that once 6.0 is out, we just keep OpenMP 6.0 entries and don't mention any TRs).

Yes, the idea was to update it for TR12 next year and for 6.0 in two years. That also matches the spec itself, which gets replaced by a newer TR and then by the final spec, keeping the old TR only in some more hidden links on the spec page.

Pushed as https://gcc.gnu.org/r13-4301-gc16e85d726a7793c05209af031dac0bebf035ab9

Tobias
[Patch] libgomp.texi: OpenMP Impl Status 5.1 additions + TR11
Update libgomp.texi's OpenMP implementation status. The 5.1 changes are taken from Jakub's comment at https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602639.html (sorry for taking so long to incorporate those).

It also adds TR11. I don't think we will work any time soon on TR11 – possibly except for clarifications.

OK for mainline?

Tobias

PS: Albeit sometimes there is a fine line between a clarification and a larger new feature. For instance,
* implicitly declared reduction identifiers for arbitrary C++ classes - or
* how to handle implicit 'declare target' with declare variant and (no)host selectors.
(The TR11 wording implies that the former is an old feature, while the latter is implied by the OpenMP 5.2 examples document, albeit an issue to clarify this in TR12 exists. For the latter: https://gcc.gnu.org/PR106316 + OpenMP Spec Issue 3416.)

libgomp.texi: OpenMP Impl Status 5.1 additions + TR11

libgomp/ChangeLog:

* libgomp.texi (OpenMP Implementation Status): Add three 5.1 items and status for Technical Report (TR) 11.

libgomp/libgomp.texi | 68
1 file changed, 68 insertions(+)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 10fefa97922..584af45bd67 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -162,6 +162,7 @@ See also @ref{OpenMP Implementation Status}.
 * OpenMP 5.0:: Feature completion status to 5.0 specification
 * OpenMP 5.1:: Feature completion status to 5.1 specification
 * OpenMP 5.2:: Feature completion status to 5.2 specification
+* OpenMP Technical Report 11:: Feature completion status to first 6.0 preview
 @end menu

 The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
@@ -350,6 +351,9 @@ The OpenMP 4.5 specification is fully supported.
 to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @item For Fortran, diagnose placing declarative before/between @code{USE},
       @code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
+@item Optional comma beween directive and clause in the @code{#pragma} form @tab Y @tab
+@item @code{indirect} clause in @code{declare target} @tab N @tab
+@item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
 @end multitable

@@ -425,6 +429,70 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @end multitable

+@node OpenMP Technical Report 11
+@section OpenMP Technical Report 11
+
+Technical Report (TR) 11 is the first preview for OpenMP 6.0.
+
+@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
+@multitable @columnfractions .60 .10 .25
+@item Features deprecated in versions 5.2, 5.1 and 5.0 were removed
+      @tab N/A @tab Backward compatibility
+@item The @code{decl} attribute was added to the C++ attribute syntax
+      @tab N @tab
+@item @code{_ALL} suffix to the device-scope environment variables
+      @tab P @tab Host device number wrongly accepted
+@item For Fortran, @emph{locator list} can be also function reference with
+      data pointer result @tab N @tab
+@item Ref-count change for @code{use_device_ptr}/@code{use_device_addr}
+      @tab N @tab
+@item Implicit reduction identifiers of C++ classes
+      @tab N @tab
+@item Change of the @emph{map-type} property from @emph{ultimate} to
+      @emph{default} @tab N @tab
+@item Concept of @emph{assumed-size arrays} in C and C++
+      @tab N @tab
+@item Mapping of @emph{assumed-size arrays} in C, C++ and Fortran
+      @tab N @tab
+@item @code{groupprivate} directive @tab N @tab
+@item @code{local} clause to declare target directive @tab N @tab
+@item @code{part_size} allocator trait @tab N @tab
+@item @code{pin_device}, @code{preferred_device} and @code{target_access}
+      allocator traits
+      @tab N @tab
+@item @code{access} allocator trait changes @tab N @tab
+@item Extension of @code{interop} operation of @code{append_args}, allowing all
+      modifiers of the @code{init} clause
+@item @code{interop} clause to @code{dispatch} @tab N @tab
+@item @code{apply} code to loop-transforming constructs @tab N @tab
+@item @code{omp_curr_progress_width} identifier @tab N @tab
+@item @code{safesync} clause to the @code{parallel} construct @tab N @tab
+@item @code{omp_get_max_progress_width} runtime routine @tab N @tab
+@item @code{strict} modifier keyword to @code{num_threads}, @code{num_tasks}
+      and @code{grainsize} @tab N @tab
+@item @code{memscope} clause to @code{atomic} and @code{flush} @tab N @tab
+@item Routines for obtaining memory spaces/allocators for shared/device memory
+      @tab N @tab
+@item @code{omp_get_memspace_num_resources} routine @tab N @tab
+@item @code{omp_get_submemspace}
[Patch] libgomp: Add no-target-region rev offload test + fix plugin-nvptx
The nvptx reverse-offload code mishandled the case where a reverse-offload function is not called inside a target region. In that case, the linker did not include GOMP_target_ext and the global variable it uses. But the plugin-nvptx.c code expected the latter to be present.

Found via sollve_vv's tests/5.0/requires/test_requires_reverse_offload.c, which is similar to the new testcase. (Albeit the 'if' and comments imply that the sollve_vv author did not intend this.)

Solution: Handle gracefully that the global variable does not exist; do this check first and allocate dev->rev_data only when it succeeds. If not, deallocate rev_fn_table to disable reverse-offload handling.

OK for mainline?

Tobias

PS: Admittedly, the nvptx code is not yet exercised as I still have to submit the libgomp/target.c code handling the reverse offload (+ enabling requires reverse_offload in plugin-nvptx.c). As is obvious from this patch, the target.c patch is nearly but not yet completely ready. That patch passes the three sollve_vv testcases and also the existing libgomp testcases, but some corner cases and more testcases are missing.

libgomp: Add no-target-region rev offload test + fix plugin-nvptx

OpenMP permits that a 'target device(ancestor:1)' is called without being enclosed in a target region - using the current device (i.e. the host) in that case. This commit adds a testcase for this.

In case of nvptx, the missing on-device 'GOMP_target_ext' call has the effect that it and the associated on-device GOMP_REV_OFFLOAD_VAR variable are not linked in from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal gracefully by disabling reverse offload and assuming that the failure is fine.
libgomp/ChangeLog:

* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int for 'i' to match 'fn_entries'; regard absent GOMP_REV_OFFLOAD_VAR as valid and the code having no reverse-offload code.
* testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test.

libgomp/plugin/plugin-nvptx.c | 36 ++--
.../libgomp.c-c++-common/reverse-offload-2.c | 49 ++
2 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 0768fca350b..e803f083591 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1390,7 +1390,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   else if (rev_fn_table)
     {
       CUdeviceptr var;
-      size_t bytes, i;
+      size_t bytes;
+      unsigned int i;
       r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
                              "$offload_func_table");
       if (r != CUDA_SUCCESS)
@@ -1413,12 +1414,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   if (rev_fn_table && *rev_fn_table && dev->rev_data == NULL)
     {
-      /* cuMemHostAlloc memory is accessible on the device, if unified-shared
-         address is supported; this is assumed - see comment in
-         nvptx_open_device for CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. */
-      CUDA_CALL_ASSERT (cuMemHostAlloc, (void **) &dev->rev_data,
-                        sizeof (*dev->rev_data), CU_MEMHOSTALLOC_DEVICEMAP);
-      CUdeviceptr dp = (CUdeviceptr) dev->rev_data;
+      /* Get the on-device GOMP_REV_OFFLOAD_VAR variable. It should be
+         available but it might be not. One reason could be: if the user code
+         has 'omp target device(ancestor:1)' in pure hostcode, GOMP_target_ext
+         is not called on the device and, hence, it and GOMP_REV_OFFLOAD_VAR
+         are not linked in. */
       CUdeviceptr device_rev_offload_var;
       size_t device_rev_offload_size;
       CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal,
@@ -1426,11 +1426,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
                                       &device_rev_offload_size, module,
                                       XSTRING (GOMP_REV_OFFLOAD_VAR));
       if (r != CUDA_SUCCESS)
-        GOMP_PLUGIN_fatal ("cuModuleGetGlobal error - GOMP_REV_OFFLOAD_VAR: %s", cuda_error (r));
-      r = CUDA_CALL_NOCHECK (cuMemcpyHtoD, device_rev_offload_var, &dp,
-                             sizeof (dp));
-      if (r != CUDA_SUCCESS)
-        GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
+        {
+          free (*rev_fn_table);
+          *rev_fn_table = NULL;
+        }
+      else
+        {
+          /* cuMemHostAlloc memory is accessible on the device, if
+             unified-shared address is supported; this is assumed - see comment
+             in nvptx_open_device for CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. */
+          CUDA_CALL_ASSERT (cuMemHostAlloc, (void **) &dev->rev_data,
+                            sizeof (*dev->rev_data), CU_MEMHOSTALLOC_DEVICEMAP);
+          CUdeviceptr dp = (CUdeviceptr) dev->rev_data;
+          r =
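The control flow of this fix (look up the on-device variable first, allocate only on success, otherwise drop the function table) might be sketched as below. This is a simplified model and not the plugin code: struct device, lookup_fn, setup_reverse_offload and the two demo lookups are hypothetical, the lookup callback merely stands in for the cuModuleGetGlobal call, calloc stands in for the pinned host allocation, and the symbol string is a placeholder for the real macro expansion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified device state; not the libgomp plugin type.  */
struct device
{
  void *rev_data;  /* Buffer used for reverse offload, NULL if unset.  */
};

/* Hypothetical stand-in for the module symbol lookup: reports whether
   the loaded image contains the given symbol.  */
typedef bool (*lookup_fn) (const char *symbol);

/* Set up reverse offload only if the on-device variable exists.
   Otherwise free the function table, which disables reverse-offload
   handling instead of treating the missing symbol as a fatal error.
   Returns true if reverse offload is set up afterwards.  */
bool
setup_reverse_offload (struct device *dev, void **rev_fn_table,
                       lookup_fn lookup, size_t rev_data_size)
{
  assert (dev != NULL && rev_fn_table != NULL && lookup != NULL);
  if (*rev_fn_table == NULL || dev->rev_data != NULL)
    return dev->rev_data != NULL;

  /* Check for the symbol first ...  */
  if (!lookup ("GOMP_REV_OFFLOAD_VAR"))  /* Placeholder symbol name.  */
    {
      free (*rev_fn_table);
      *rev_fn_table = NULL;
      return false;
    }

  /* ... and allocate only once the lookup has succeeded.  */
  dev->rev_data = calloc (1, rev_data_size);
  return dev->rev_data != NULL;
}

/* Two trivial lookups for demonstration.  */
bool symbol_present (const char *symbol) { (void) symbol; return true; }
bool symbol_absent (const char *symbol) { (void) symbol; return false; }
```

Doing the lookup before the allocation means that the failure path has nothing to undo except the function table, which is the ordering the patch description asks for.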
OpenMP Patch Ping
Updated list as follow-up to the last ping at https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601162.html

Recent patches:

Sandra's
(Tue Nov 15 04:46:15 GMT 2022) [PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target"
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606218.html

Julian's patches - I hope I got it right as I lost track a bit:
(Tue Nov 8 14:36:17 GMT 2022) [PATCH v2 06/11] OpenMP: lvalue parsing for map clauses (C++)
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605367.html
(Fri Sep 30 13:30:22 GMT 2022) [PATCH v3 06/11] OpenMP: Pointers and member mappings
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602609.html
(Tue Oct 18 10:39:01 GMT 2022) [PATCH v5 0/4] OpenMP/OpenACC: Fortran array descriptor mappings
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/thread.html#603790
(I think it is partially my task to review those.)

Approved but waiting for the Fortran patches (v5) to get approved.
[PATCH v3 08/11] OpenMP/OpenACC: Rework clause expansion and nested struct handling
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602010.html

Possibly requiring a second look/review despite my initial comment (which might require revisions on the patch side as well):
OpenMP: Duplicate checking for map clauses in Fortran (PR107214)
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604033.html

Older patches:

* [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
  * Unified-Shared Memory & Pinned Memory

Depending on those:

* [PATCH] OpenMP, libgomp: Handle unified shared memory in omp_target_is_accessible.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594187.html
* [PATCH, OpenMP, Fortran] requires unified_shared_memory 1/2: adjust libgfortran memory allocators
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599703.html
  (Fortran part, required for ...)
* Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran
  https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601059.html

And finally:

* [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599332.html
  (Side remark: some other debugging support like showing the mapping being done as stderr output or ... would be nice as well; might depend on a libgomp-debug.so and/or -f...(sanitize=openmp or ...); the other open-source compiler has something similar.)

* * *

Pending libgomp/nvptx patches:

(Wed Sep 21 07:45:36 GMT 2022) [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601922.html
(Wed Sep 21 07:45:54 GMT 2022) [PATCH, nvptx, 2/2] Reimplement libgomp barriers for nvptx: bar.red instruction support in GCC
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601925.html
Those were pinged 4 times :-(

Hopefully, I have not missed any patch.

Tobias

PS: The following list covers pending patches which have been reviewed but need to be updated before being ready - hopefully, this list is also up to date:

* (No pending patch, but wwwdocs' changes-13.html + projects/gomp/ need an update before GCC 13)
* [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
  Should be re-submitted any time soon (today, next few days)
* [Patch] OpenMP/Fortran: Permit end-clause on directive
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600433.html
  Trivial patch modifications required - mostly LGTM already.
* [PATCH] libgomp: fix hang on fatal error
  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603616.html
  (Patch rejected but alternative solutions were suggested.)
* Re: [Patch] OpenMP/Fortran: Use firstprivat not alloc for ptr attach for arrays
  (Committed but failing occasionally:)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605854.html
* "[PATCH 3/3] vect: inbranch SIMD clones"
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599490.html
  Review comments to be addressed.
* [PATCH 0/5] [gfortran] Support for allocate directive (OpenMP 5.0)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588367.html
* [PATCH] openmp: fix max_vf setting for amdgcn offloading
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598265.html
  → To be updated for review comments.
  (Side note: we should at some point find a way to improve target-specific handling; similar to the are-exceptions-supported issue of PR101544 but there are more.)
* [PATCH, OpenMP, v4] Implement uses_allocators clause for target regions
  https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596587.html
  * Needs to be revised according to review comments
  * Fortran allocatable components handling (needs to be split into separate pieces and submitted separately)
*PING* - [wwwdocs] projects/gomp: TR11 + GCC13 update
On 11.11.22 16:13, Tobias Burnus wrote:

This patch adds TR11 to the history of OpenMP releases – and it does an update of the implementation status.

OK?

Tobias

PS: The implementation-status changes were lying around in that file for a while. I think both the GCC 13 release notes and this file need some update for more recent changes. Nonetheless, while incomplete, the changes themselves should be fine.
Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}
On 19.11.22 11:46, Tobias Burnus wrote:

+ stacklimit = stackbase + seg_size*64;

(this should be '*seg_size' not 'seg_size' and the name should be s/seg_size/seg_size_ptr/.)

I have updated the comment and ...

(Reading it, I think it should be '..._MEM(SImode,' and '..._MULT(SImode' instead of DImode.)

Additionally, there was a problem of bytes vs. bits in:

My understanding is that dispatch_ptr->private_segment_size == *((char*)dispatch_ptr + 192)

which is wrong - it's 192 bits but only 24 bytes!

Finally, in the first_call_this_thread_p() call, I mixed up EQ vs. NE at one place.

BTW: It seems as if there is no problem with zero extension, if I look at the assembler result.

Updated version. It consists of: the GCC patch adding the builtins, the newlib patch using those (unchanged; used for testing + to be submitted), and a 'test.c' using the builtins and its dump produced with amdgcn's 'cc1 -O2' to show the resulting assembly.

Tested with libgomp on gfx908 offloading and getting only the known fails: (libgomp.c-c++-common/teams-2.c, libgomp.fortran/async_io_*.f90, libgomp.oacc-c-c++-common/{deep-copy-10.c,static-variable-1.c,vprop.c})

OK for mainline?

Tobias

gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

The new builtins have been added for newlib to reduce dependency on compiler-internal implementation choices of GCC in newlib's getreent.c.

gcc/ChangeLog:

* config/gcn/gcn-builtins.def (FIRST_CALL_THIS_THREAD_P, GET_STACK_LIMIT): Add new builtins.
* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand them.
* config/gcn/gcn.md (prologue_use): Add "register_operand" as arg to match_operand.
(prologue_use_di): New; DI insn_and_split variant of the former.
Co-Authored-By: Andrew Stubbs

gcc/config/gcn/gcn-builtins.def | 4 +++
gcc/config/gcn/gcn.cc | 70 -
gcc/config/gcn/gcn.md | 15 -
3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index eeeaebf9013..f1cf30bbc94 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -160,8 +160,12 @@ DEF_BUILTIN (ACC_BARRIER, -1, "acc_barrier", B_INSN, _A1 (GCN_BTI_VOID),

 /* Kernel inputs. */

+DEF_BUILTIN (FIRST_CALL_THIS_THREAD_P, -1, "first_call_this_thread_p", B_INSN,
+             _A1 (GCN_BTI_BOOL), gcn_expand_builtin_1)
 DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
              gcn_expand_builtin_1)
+DEF_BUILTIN (GET_STACK_LIMIT, -1, "get_stack_limit", B_INSN,
+             _A1 (GCN_BTI_VOIDPTR), gcn_expand_builtin_1)

 #undef _A1
 #undef _A2
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index b3814c2e7c6..ea9631e8823 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4493,6 +4493,45 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
       emit_insn (gen_gcn_wavefront_barrier ());
       return target;
+    case GCN_BUILTIN_GET_STACK_LIMIT:
+      {
+        /* stackbase = (stack_segment_decr & 0x)
+                       + stack_wave_offset);
+           seg_size = dispatch_ptr->private_segment_size;
+           stacklimit = stackbase + seg_size*64;
+           with segsize = *(uint32_t *) ((char *) dispatch_ptr
+                                         + 6*sizeof(int16_t) + 3*sizeof(int32_t));
+           cf. struct hsa_kernel_dispatch_packet_s in the HSA doc. */
+        rtx ptr;
+        if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+            && cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+          {
+            rtx size_rtx = gen_rtx_REG (DImode,
+                                        cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+            size_rtx = gen_rtx_MEM (SImode,
+                                    gen_rtx_PLUS (DImode, size_rtx,
+                                                  GEN_INT (6*2 + 3*4)));
+            size_rtx = gen_rtx_MULT (SImode, size_rtx, GEN_INT (64));
+
+            ptr = gen_rtx_REG (DImode,
+                               cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
+            ptr = gen_rtx_AND (DImode, ptr, GEN_INT (0x));
+            ptr = gen_rtx_PLUS (DImode, ptr, size_rtx);
+            if (cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG] >= 0)
+              {
+                rtx off;
+                off = gen_rtx_REG (SImode,
+                                   cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG]);
+                ptr = gen_rtx_PLUS (DImode, ptr, off);
+              }
+          }
+        else
+          {
+            ptr = gen_reg_rtx (DImode);
+            emit_move_insn (ptr, const0_rtx);
+          }
+        return ptr;
+      }
     case GCN_BUILTIN_KERNARG_PTR:
       {
         rtx ptr;
@@ -4506,7 +4545,36 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
           }
         return ptr;
       }
-
+    case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P:
+      {
+        /* Stas
[Patch] libgomp/gcn: fix/improve struct output (was: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling)
Working on the builtins, I realized that I mixed up (again) bits and bytes. While 'uint64_t var[2]' has a size of 128 bits, 'char var[128]' has a size of 128 bytes. Thus, there is sufficient space for 16 pointer-size/uint64_t values but I only need 6.

This patch now makes use of the available space, avoiding one device-to-host memory copy; additionally, it avoids a 32bit vs 64bit alignment issue which I somehow missed :-(

Tested with libgomp on gfx908 offloading and getting only the known fails: (libgomp.c-c++-common/teams-2.c, libgomp.fortran/async_io_*.f90, libgomp.oacc-c-c++-common/{deep-copy-10.c,static-variable-1.c,vprop.c})

OK for mainline?

Tobias

libgomp/gcn: fix/improve struct output

output.printf_data.(value union) contains text[128], which has a size of 128 bytes, sufficient for 16 uint64_t variables; hence value_u64[2] could be extended to value_u64[6] - sufficient for all required arguments to gomp_target_rev. Additionally, next_output.printf_data.(msg union) contained msg_u64, which then is no longer needed and also caused 32bit vs 64bit alignment issues.

libgomp/
* config/gcn/libgomp-gcn.h (struct output): Remove 'msg_u64' from the union, change value_u64[2] to value_u64[6].
* config/gcn/target.c (GOMP_target_ext): Update accordingly.
* plugin/plugin-gcn.c (process_reverse_offload, console_output): Likewise.
libgomp/config/gcn/libgomp-gcn.h | 7 ++-
libgomp/config/gcn/target.c | 12 ++--
libgomp/plugin/plugin-gcn.c | 17 +++--
3 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h
index 91560be787f..3933e846a86 100644
--- a/libgomp/config/gcn/libgomp-gcn.h
+++ b/libgomp/config/gcn/libgomp-gcn.h
@@ -37,16 +37,13 @@ struct output
   unsigned int next_output;
   struct printf_data {
     int written;
-    union {
-      char msg[128];
-      uint64_t msg_u64[2];
-    };
+    char msg[128];
     int type;
     union {
       int64_t ivalue;
       double dvalue;
       char text[128];
-      uint64_t value_u64[2];
+      uint64_t value_u64[6];
     };
   } queue[1024];
   unsigned int consumed;
diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index 27854565d40..11ae6ec9833 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -102,12 +102,12 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
     asm ("s_sleep 64");

   unsigned int slot = index % 1024;
-  uint64_t addrs_sizes_kind[3] = {(uint64_t) hostaddrs, (uint64_t) sizes,
-                                  (uint64_t) kinds};
-  data->queue[slot].msg_u64[0] = (uint64_t) fn;
-  data->queue[slot].msg_u64[1] = (uint64_t) mapnum;
-  data->queue[slot].value_u64[0] = (uint64_t) &addrs_sizes_kind[0];
-  data->queue[slot].value_u64[1] = (uint64_t) GOMP_ADDITIONAL_ICVS.device_num;
+  data->queue[slot].value_u64[0] = (uint64_t) fn;
+  data->queue[slot].value_u64[1] = (uint64_t) mapnum;
+  data->queue[slot].value_u64[2] = (uint64_t) hostaddrs;
+  data->queue[slot].value_u64[3] = (uint64_t) sizes;
+  data->queue[slot].value_u64[4] = (uint64_t) kinds;
+  data->queue[slot].value_u64[5] = (uint64_t) GOMP_ADDITIONAL_ICVS.device_num;

   data->queue[slot].type = 4; /* Reverse offload. */
   __atomic_store_n (&data->queue[slot].written, 1, __ATOMIC_RELEASE);
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index ffe5cf5af2c..388e87b7765 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1919,16 +1919,12 @@ create_kernel_dispatch (struct kernel_info *kernel, int num_teams)
 }

 static void
-process_reverse_offload (uint64_t fn, uint64_t mapnum, uint64_t rev_data,
-                         uint64_t dev_num64)
+process_reverse_offload (uint64_t fn, uint64_t mapnum, uint64_t hostaddrs,
+                         uint64_t sizes, uint64_t kinds, uint64_t dev_num64)
 {
   int dev_num = dev_num64;
-  uint64_t addrs_sizes_kinds[3];
-  GOMP_OFFLOAD_host2dev (dev_num, &addrs_sizes_kinds, (void *) rev_data,
-                         sizeof (addrs_sizes_kinds));
-  GOMP_PLUGIN_target_rev (fn, mapnum, addrs_sizes_kinds[0],
-                          addrs_sizes_kinds[1], addrs_sizes_kinds[2],
-                          dev_num, NULL, NULL, NULL);
+  GOMP_PLUGIN_target_rev (fn, mapnum, hostaddrs, sizes, kinds, dev_num,
+                          NULL, NULL, NULL);
 }

 /* Output any data written to console output from the kernel. It is expected
@@ -1976,8 +1972,9 @@ console_output (struct kernel_info *kernel, struct kernargs *kernargs,
     case 2: printf ("%.128s%.128s\n", data->msg, data->text); break;
     case 3: printf ("%.128s%.128s", data->msg, data->text); break;
     case 4:
-      process_reverse_offload (data->msg_u64[0], data->msg_u64[1],
-                               data->value_u64[0],data->value_u64[1]);
+      process_reverse_offload (data->value_u64[0],
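The size argument behind this change can be checked with a few compile-time assertions. The sketch below uses a trimmed copy of the queue entry as it looks after the patch; value_slots_available and pack_reverse_offload are hypothetical helpers added only for illustration, and the type value 4 is the reverse-offload marker used in the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed copy of the queue entry after the patch (libgomp-gcn.h).  */
struct printf_data
{
  int written;
  char msg[128];
  int type;
  union
  {
    int64_t ivalue;
    double dvalue;
    char text[128];
    uint64_t value_u64[6];
  };
};

/* text[] is 128 bytes (not 128 bits), i.e. room for 16 uint64_t values,
   so six of them fit without enlarging the union.  */
_Static_assert (sizeof (((struct printf_data *) 0)->text) == 128,
                "text is 128 bytes");
_Static_assert (sizeof (((struct printf_data *) 0)->value_u64)
                <= sizeof (((struct printf_data *) 0)->text),
                "value_u64 fits into the space of text");

/* Number of uint64_t values that fit into the text buffer.  */
size_t
value_slots_available (void)
{
  return sizeof (((struct printf_data *) 0)->text) / sizeof (uint64_t);
}

/* Hypothetical helper: store the six reverse-offload arguments in one
   entry, in the same order as the updated GOMP_target_ext.  */
void
pack_reverse_offload (struct printf_data *slot, uint64_t fn, uint64_t mapnum,
                      uint64_t hostaddrs, uint64_t sizes, uint64_t kinds,
                      uint64_t dev_num)
{
  assert (slot != NULL);
  slot->value_u64[0] = fn;
  slot->value_u64[1] = mapnum;
  slot->value_u64[2] = hostaddrs;
  slot->value_u64[3] = sizes;
  slot->value_u64[4] = kinds;
  slot->value_u64[5] = dev_num;
  slot->type = 4;  /* Reverse offload.  */
}
```

With all six values in one entry, the host side no longer needs the extra copy of a three-element array from device memory, which is the saving the patch description mentions.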
Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}
On 18.11.22 18:49, Andrew Stubbs wrote:
On 18/11/2022 17:20, Tobias Burnus wrote:

This looks wrong:

+    /* stackbase = (stack_segment_decr & 0x)
+                   + stack_wave_offset);
+       seg_size = dispatch_ptr->private_segment_size;
+       stacklimit = stackbase + seg_size*64;

(this should be '*seg_size' not 'seg_size' and the name should be s/seg_size/seg_size_ptr/.)

+       with segsize = dispatch_ptr + 6*sizeof(int16_t) + 3*sizeof(int32_t);
+       cf. struct hsa_kernel_dispatch_packet_s in the HSA doc. */
+    rtx ptr;
+    if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+        && cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+      {
+        rtx size_rtx = gen_rtx_REG (DImode,
+                                    cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+        size_rtx = gen_rtx_MEM (DImode,
+                                gen_rtx_PLUS (DImode, size_rtx,
+                                              GEN_INT (6*16 + 3*32)));
+        size_rtx = gen_rtx_MULT (DImode, size_rtx, GEN_INT (64));
+

(Reading it, I think it should be '..._MEM(SImode,' and '..._MULT(SImode' instead of DImode.)

seg_size is calculated from the private_segment_size loaded from the dispatch_ptr, not calculated from the dispatch_ptr itself.

Isn't this what the code tries to do? Namely:

My understanding is that dispatch_ptr->private_segment_size == *((char*)dispatch_ptr + 192)

And the latter is what I attempt to do. I have very limited knowledge of insn/rtx/RTL and of GCN assembly; thus, I likely have done something stupid.

Having said this, here is what I get (where asm("s4") == dispatch_ptr):

s_add_u32 s2, s4, 192
s_addc_u32 s3, s5, 0
v_writelane_b32 v4, s2, 0
v_writelane_b32 v5, s3, 0
s_mov_b64 exec, 1
flat_load_dwordx2 v[4:5], v[4:5]
s_waitcnt 0
v_lshlrev_b64 v[4:5], 6, v[4:5]
v_readlane_b32 s2, v4, 0
v_readlane_b32 s3, v5, 0

Not that I really understand every line, but at a glance it looks okay.

The 192 is because of (quoting newlib/libc/machine/amdgcn/getreent.c):

typedef struct hsa_kernel_dispatch_packet_s {
  uint16_t header ;
  uint16_t setup;
  uint16_t workgroup_size_x ;
  uint16_t workgroup_size_y ;
  uint16_t workgroup_size_z;
  uint16_t reserved0;
  uint32_t grid_size_x ;
  uint32_t grid_size_y ;
  uint32_t grid_size_z;
  uint32_t private_segment_size;

i.e. 6*16 + 3*32 = 192 – and we want to read a 32bit unsigned int.

* * *

Admittedly, there is probably something not quite right as I see with gfx908

# of expected passes 27476
# of unexpected failures 317

where the 317 FAILs come from 88 testcase files. That's not a very high number, but it is more than the usual fails, which shows that something is not quite right.

* * *

I am pretty sure that I missed something - but the question is what. I hope you can help me pinpoint the place where it goes wrong.

Thanks,

Tobias
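The offset discussed in this message can be checked with offsetof. The sketch below assumes only the leading fields quoted from getreent.c (later fields of the packet are omitted); private_segment_size_offset, load_private_segment_size and stack_limit are hypothetical helper names. With this layout the field sits 192 bits, that is 24 bytes, into the packet, and a byte-addressed load has to use the latter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Leading fields of the dispatch packet as quoted from newlib's
   getreent.c; fields after private_segment_size are omitted.  */
typedef struct hsa_kernel_dispatch_packet_s
{
  uint16_t header;
  uint16_t setup;
  uint16_t workgroup_size_x;
  uint16_t workgroup_size_y;
  uint16_t workgroup_size_z;
  uint16_t reserved0;
  uint32_t grid_size_x;
  uint32_t grid_size_y;
  uint32_t grid_size_z;
  uint32_t private_segment_size;
} hsa_kernel_dispatch_packet_t;

/* offsetof counts bytes: six uint16_t plus three uint32_t fields.  */
_Static_assert (offsetof (hsa_kernel_dispatch_packet_t, private_segment_size)
                == 6 * sizeof (uint16_t) + 3 * sizeof (uint32_t),
                "offset is counted in bytes");

/* Byte offset of private_segment_size within the packet.  */
size_t
private_segment_size_offset (void)
{
  return offsetof (hsa_kernel_dispatch_packet_t, private_segment_size);
}

/* Load the 32-bit field through a byte offset from the packet pointer.  */
uint32_t
load_private_segment_size (const void *dispatch_ptr)
{
  uint32_t seg_size;
  assert (dispatch_ptr != NULL);
  memcpy (&seg_size,
          (const char *) dispatch_ptr + private_segment_size_offset (),
          sizeof seg_size);
  return seg_size;
}

/* stacklimit = stackbase + seg_size*64, as in the patch comment.  */
uint64_t
stack_limit (uint64_t stackbase, uint32_t seg_size)
{
  return stackbase + (uint64_t) seg_size * 64;
}
```

The load is 32 bits wide here, which corresponds to the SImode remark in the review comments above.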
Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling
Attached is the updated/rediffed version, which now uses the builtin instead of 'asm("s8")'. The code works in principle; that is: if no private stack variables are copied, it works. In other words: reverse-offload target regions that don't use firstprivate or mapping work, the rest would crash. That's avoided by not accepting reverse offload inside GOMP_OFFLOAD_get_num_devices for now.

To get it working, the manual stack allocation patch + the trivial update to that get_num_devices function is needed, but no change to the attached patch.

In order to reduce local patches, I would love to have it on mainline – otherwise, I have at least the current version in gcc-patches@.

Tobias

PS: Previous patch email quoted below. Note: there were two follow-up emails, one by Andrew and one by me; cf. your own mail archive (of this thread) or https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603383.html + the next two messages in the thread.

On 12.10.22 16:29, Tobias Burnus wrote:
On 29.09.22 18:24, Andrew Stubbs wrote:
On 27/09/2022 14:16, Tobias Burnus wrote:

Andrew did suggest a while back to piggyback on the console_output handling, avoiding another atomic access. - If this is still wanted, I would like to have some guidance regarding how to actually implement it.

[...] The point is that you can use the "msg" and "text" fields for whatever data you want, as long as you invent a new value for "type". [...] You can make "case 4" do whatever you want. There are enough bytes for 4 pointers, and you could use multiple packets (although it's not safe to assume they're contiguous or already arrived; maybe "case 4" for part 1, "case 5" for part 2). It's possible to change this structure, of course, but the target implementation is in newlib so versioning becomes a problem.

I think – also looking at the Newlib write.c implementation - that the data is contiguous: there is an atomic add, where instead of passing '1' for a single slot, I could also add '2' for two slots.

Attached is one variant – for the decl of the GOMP_OFFLOAD_target_rev, it needs the generic parts of the sister nvptx patch.*

2*128 bytes were not enough, I need 3*128 bytes. (Or rather 5*64 + 32.) As target_ext is blocking, I decided to use a stack local variable for the remaining arguments and pass it along. Alternatively, I could also use 2 slots - and process them together. This would avoid one device->host memory copy but would make console_output less clear.

OK for mainline?

Tobias

* https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603354.html

PS: Currently, device stack variables are private and cannot be accessed from the host; this will change in a separate patch. It not only affects the "rest" part as used in this patch but also the actual arrays behind addr, kinds, and sizes. And quite likely a lot of the map/firstprivate variables passed to addr. As num_devices() will return 0 or -1, this is for now a non-issue.

libgomp/gcn: Prepare for reverse-offload callback handling

libgomp/ChangeLog:

* config/gcn/libgomp-gcn.h: New file; contains struct output, declared previously in plugin-gcn.c.
* config/gcn/target.c: Include it.
(GOMP_ADDITIONAL_ICVS): Declare as extern var.
(GOMP_target_ext): Handle reverse offload.
* plugin/plugin-gcn.c: Include libgomp-gcn.h.
(struct kernargs): Replace struct def by the one from libgomp-gcn.h for output_data.
(process_reverse_offload): New.
(console_output): Call it.
libgomp/config/gcn/libgomp-gcn.h | 61 libgomp/config/gcn/target.c | 44 - libgomp/plugin/plugin-gcn.c | 34 -- 3 files changed, 117 insertions(+), 22 deletions(-) diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h new file mode 100644 index 000..91560be787f --- /dev/null +++ b/libgomp/config/gcn/libgomp-gcn.h @@ -0,0 +1,61 @@ +/* Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by Tobias Burnus . + + This file is part of the GNU Offloading and Multi Processing Library + (libgomp). + + Libgomp is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + Under Section 7 of GPL version 3, you are granted additio
[Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}
This patch adds two builtins (getting end-of-stack pointer and a Boolean answer whether it was the first call to the builtin on this thread).

The idea is to replace some hard-coded values in newlib, permitting a later move to a manually allocated stack on the compiler side without the need to modify newlib again. The GCC patch matches what newlib did in reent; I could imagine that we change this later on.

Lightly tested (especially by visual inspection). Currently doing a final regtest, OK when it passes?

Any comments to this patch - or the attached newlib patch?*

Tobias

(*) I also included a patch to newlib to see where we are heading + to actually use them for regtesting ...

gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

The new builtins have been added for newlib to reduce dependency on compiler-internal implementation choices of GCC in newlib's getreent.c.

gcc/ChangeLog:

	* config/gcn/gcn-builtins.def (FIRST_CALL_THIS_THREAD_P, GET_STACK_LIMIT): Add new builtins.
	* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand them.
	* config/gcn/gcn.md (prologue_use): Add "register_operand" as arg to match_operand.
	(prologue_use_di): New; DI insn_and_split variant of the former.

Co-Authored-By: Andrew Stubbs

 gcc/config/gcn/gcn-builtins.def | 4 +++
 gcc/config/gcn/gcn.cc | 70 -
 gcc/config/gcn/gcn.md | 15 -
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index eeeaebf9013..f1cf30bbc94 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -160,8 +160,12 @@ DEF_BUILTIN (ACC_BARRIER, -1, "acc_barrier", B_INSN, _A1 (GCN_BTI_VOID),
 /* Kernel inputs.
*/ +DEF_BUILTIN (FIRST_CALL_THIS_THREAD_P, -1, "first_call_this_thread_p", B_INSN, + _A1 (GCN_BTI_BOOL), gcn_expand_builtin_1) DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR), gcn_expand_builtin_1) +DEF_BUILTIN (GET_STACK_LIMIT, -1, "get_stack_limit", B_INSN, + _A1 (GCN_BTI_VOIDPTR), gcn_expand_builtin_1) #undef _A1 #undef _A2 diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index b3814c2e7c6..051eadee783 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -4493,6 +4493,44 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ , emit_insn (gen_gcn_wavefront_barrier ()); return target; +case GCN_BUILTIN_GET_STACK_LIMIT: + { + /* stackbase = (stack_segment_decr & 0x) + + stack_wave_offset); + seg_size = dispatch_ptr->private_segment_size; + stacklimit = stackbase + seg_size*64; + with segsize = dispatch_ptr + 6*sizeof(int16_t) + 3*sizeof(int32_t); + cf. struct hsa_kernel_dispatch_packet_s in the HSA doc. */ + rtx ptr; + if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0 + && cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0) + { + rtx size_rtx = gen_rtx_REG (DImode, + cfun->machine->args.reg[DISPATCH_PTR_ARG]); + size_rtx = gen_rtx_MEM (DImode, +gen_rtx_PLUS (DImode, size_rtx, + GEN_INT (6*16 + 3*32))); + size_rtx = gen_rtx_MULT (DImode, size_rtx, GEN_INT (64)); + + ptr = gen_rtx_REG (DImode, + cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]); + ptr = gen_rtx_AND (DImode, ptr, GEN_INT (0x)); + ptr = gen_rtx_PLUS (DImode, ptr, size_rtx); + if (cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG] >= 0) + { + rtx off; + off = gen_rtx_REG (SImode, + cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG]); + ptr = gen_rtx_PLUS (DImode, ptr, off); + } + } + else + { + ptr = gen_reg_rtx (DImode); + emit_move_insn (ptr, const0_rtx); + } + return ptr; + } case GCN_BUILTIN_KERNARG_PTR: { rtx ptr; @@ -4506,7 +4544,37 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ , } return ptr; } 
- +case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P: + { + /* Stash a marker in the unused upper 16 bits of s[0:1] to indicate + whether it was the first call. */ + rtx result = gen_reg_rtx (BImode); + emit_move_insn (result, const0_rtx); + if (cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0) + { + rtx not_first = gen_label_rtx (); + rtx reg = gen_rtx_REG (DImode, + cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]); + rtx cmp = force_reg (DImode, + gen_rtx_AND (DImode, reg, + GEN_INT (0xL))); + emit_insn (gen_cstoresi4 (result, gen_rtx_EQ (BImode, cmp, + GEN_INT(12345L << 48)), + cmp, GEN_INT(12345L << 48))); +
[patch] gcn: Add __builtin_gcn_kernarg_ptr
This is a part of a patch by Andrew (hi!) - namely the part that only adds the __builtin_gcn_kernarg_ptr. More is planned, see below.

The short-term benefit of this patch is to permit replacing hardcoded numbers by a builtin, like in libgomp (see patch) or in newlib (not submitted):

--- a/newlib/libc/sys/amdgcn/write.c
+++ b/newlib/libc/sys/amdgcn/write.c
@@ -59,1 +59,5 @@ _READ_WRITE_RETURN_TYPE write (int fd, const void *buf, size_t count)
+#if defined(__has_builtin) && __has_builtin(__builtin_gcn_kernarg_ptr)
+  register void **kernargs = __builtin_gcn_kernarg_ptr ();
+#else
   register void **kernargs asm("s8");
+#endif

It would also replace the 'asm("s8")' in the reverse offload (GCN) patch, i.e. https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602339.html

However, this patch is only the very first step. The next one is to add several additional builtins, namely those that are required for newlib, i.e. newlib/libc/machine/amdgcn/mlock.c (sbrk) and newlib/libc/machine/amdgcn/getreent.c (__getreent) use some additional hard-coded values for heap and stack memory.

And at some point - but only after newlib has been updated - we can think of making stack variables non-private. That's a general goal - and in any case required for reverse offload to be able to transfer between the host and on-device stack variables.

* * *

Regarding the patch: Besides the obvious change (addition of the builtin), the change to DEFAULT memory space is required to avoid a memory-space conversion ICE when using the new builtin. The gcn_oacc_dim_size change is mainly just picked from Andrew's patch as it seems to be reasonable.

In terms of the libgomp testsuite, I did not spot anything except that the -O2 run no longer fails with "libgomp: target function wasn't mapped" for libgomp.oacc-fortran/kernels-map-1.f90 - but I am not sure whether it is related or not. In any case, the libgomp testsuite shows no fails (except the usual ones) with the attached patch.

OK for mainline?
Tobias

PS: The plan is to have at least all builtins in GCC and use them in newlib by the end of this year (i.e. in newlib's end-of-year snapshot, aka annual release).

PPS: I wonder whether
[Patch] libgomp/gcn: Prepare for reverse-offload callback handling
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602339.html
would be okay after this patch - with the asm("s8") replaced by the builtin - or not. The code itself would be fine, but it is unreachable until GOMP_OFFLOAD_get_num_devices accepts reverse offload and the latter depends on the support for non-private stack variables.

gcn: Add __builtin_gcn_kernarg_ptr

Add __builtin_gcn_kernarg_ptr to avoid using hard-coded register values and permit future ABI changes while keeping the API.

gcc/ChangeLog:

	* config/gcn/gcn-builtins.def (KERNARG_PTR): Add.
	* config/gcn/gcn.cc (gcn_init_builtin_types): Change siptr_type_node, sfptr_type_node and voidptr_type_node from FLAT to ADDR_SPACE_DEFAULT.
	(gcn_expand_builtin_1): Handle GCN_BUILTIN_KERNARG_PTR.
	(gcn_oacc_dim_size): Return in ADDR_SPACE_FLAT.

libgomp/ChangeLog:

	* config/gcn/team.c (gomp_gcn_enter_kernel): Use __builtin_gcn_kernarg_ptr instead of asm ("s8").

Co-Authored-By: Andrew Stubbs

 gcc/config/gcn/gcn-builtins.def | 4
 gcc/config/gcn/gcn.cc | 24
 libgomp/config/gcn/team.c | 2 +-
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index c50777bd..eeeaebf 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -158,6 +158,10 @@ DEF_BUILTIN (ACC_SINGLE_COPY_END, -1, "single_copy_end", B_INSN,
 DEF_BUILTIN (ACC_BARRIER, -1, "acc_barrier", B_INSN, _A1 (GCN_BTI_VOID),
 	     gcn_expand_builtin_1)
+/* Kernel inputs.
*/ + +DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR), + gcn_expand_builtin_1) #undef _A1 #undef _A2 diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 5e6f3b8..b3814c2 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -4058,15 +4058,15 @@ gcn_init_builtin_types (void) (integer_type_node) */ , 64); tree tmp = build_distinct_type_copy (intSI_type_node); - TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_FLAT; + TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_DEFAULT; siptr_type_node = build_pointer_type (tmp); tmp = build_distinct_type_copy (float_type_node); - TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_FLAT; + TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_DEFAULT; sfptr_type_node = build_pointer_type (tmp); tmp = build_distinct_type_copy (void_type_node); - TYPE_ADDR_SPACE (tmp) =
[Patch] nvptx/mkoffload.cc: Fix "$nohost" check
Found when working on real reverse offload - as the reverse-offload stub function was added to the reverse-offload table.

Reason - as mentioned in the commit log: lhd_set_decl_assembler_name.

I intend to commit it tomorrow as obvious, unless there are further comments.

Tobias

nvptx/mkoffload.cc: Fix "$nohost" check

If lhd_set_decl_assembler_name is invoked - in particular if !TREE_PUBLIC (decl) && !DECL_FILE_SCOPE_P (decl) - the '.nohost' suffix might change to '.nohost.2'. This happens for the existing reverse offload testcases via cgraph_node::analyze and is a side effect of r13-3455-g178ac530fe67e4f2fc439cc4ce89bc19d571ca31 for some reason.

The solution is to not only check for a trailing '$nohost' but also for '$nohost$' in nvptx/mkoffload.cc.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (process): Recognize '$nohost$...' besides trailing '$nohost' as being for reverse offload.

 gcc/config/nvptx/mkoffload.cc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 854cd72f3c7..5d89ba8a788 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -364,7 +364,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	     Alternatively, besides searching for 'BEGIN FUNCTION DECL',
 	     checking for '.visible .entry ' + id->ptx_name would be required.  */
-	  if (!endswith (id->ptx_name, "$nohost"))
+	  if (!endswith (id->ptx_name, "$nohost")
+	      && !strstr (id->ptx_name, "$nohost$"))
 	    continue;
 	  fprintf (out, "\t\".extern ");
 	  const char *p = input + file_idx[fidx];
@@ -402,7 +403,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	   "$offload_func_table[] = {");
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
     fprintf (out, "%s\"\n\t\t\"%s", comma,
-	     endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+	     (endswith (id->ptx_name, "$nohost")
+	      || strstr (id->ptx_name, "$nohost$")) ? id->ptx_name : "0");
   fprintf (out, "};\\n\";\n\n");
 }
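The name test that both hunks apply can be sketched as a standalone predicate. This is only an illustration: `ends_with` and `is_reverse_offload_name` are hypothetical names made up here (mkoffload.cc itself uses GCC's own `endswith` helper and has no such predicate), and the sample names in the usage note are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the 'endswith' helper used in mkoffload.cc:
   true if STR ends with SUFFIX.  */
static bool
ends_with (const char *str, const char *suffix)
{
  size_t str_len = strlen (str);
  size_t suffix_len = strlen (suffix);
  return (str_len >= suffix_len
	  && strcmp (str + str_len - suffix_len, suffix) == 0);
}

/* Hypothetical helper mirroring the patched condition: a PTX name counts as
   a reverse-offload function if it ends in "$nohost" or contains
   "$nohost$" (i.e. a further suffix was appended after "$nohost").  */
static bool
is_reverse_offload_name (const char *ptx_name)
{
  return (ends_with (ptx_name, "$nohost")
	  || strstr (ptx_name, "$nohost$") != NULL);
}
```

With this sketch, a name such as `foo$nohost$2` is accepted in addition to `foo$nohost`, while `foo` is still rejected.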
[wwwdocs] projects/gomp: TR11 + GCC13 update
This patch adds TR11 to the history of OpenMP releases – and it does an update of the implementation status. OK? Tobias PS: The implementation-status changes were lying around in that file for a while. I think both the GCC 13 release notes and this file needs some update for more recent changes. Nonetheless, while incomplete, the changes themselves should be fine. projects/gomp: TR11 + GCC13 update htdocs/projects/gomp/index.html | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html index 713a4e16..46f393c8 100644 --- a/htdocs/projects/gomp/index.html +++ b/htdocs/projects/gomp/index.html @@ -677,7 +677,7 @@ than listed, depending on resolved corner cases and optimizations. device-specific ICV settings with environment variables -No +GCC13 @@ -771,10 +771,10 @@ than listed, depending on resolved corner cases and optimizations. No - + omp/ompx/omx sentinels and omp_/ompx_ namespaces N/A - +warning for ompx/omx sentinels (1) Clauses on end directive can be on directive @@ -888,7 +888,7 @@ than listed, depending on resolved corner cases and optimizations. New doacross clause as alias for depend with source/sink modifier -No +GCC13 @@ -898,7 +898,7 @@ than listed, depending on resolved corner cases and optimizations. omp_cur_iteration keyword -No +GCC13 @@ -924,9 +924,22 @@ than listed, depending on resolved corner cases and optimizations. +(1) The +ompx sentinel as C/C++ pragma and C++ attributes are warned for +with -Wunknown-pragmas (implied by -Wall) and +-Wattributes (enabled by default), respectively; for Fortran +free-source code, there is a warning enabled by default and, for fixed-source +code, the omx sentinel is warned for with -Wsurprising +(enabled by -Wall). Unknown clauses are always rejected with an +error. 
 OpenMP Releases and Status
+November 9, 2022
+https://www.openmp.org/wp-content/uploads/openmp-TR11.pdf OpenMP
+Technical Report 11 (first preview for the OpenMP API Version 6.0) has been
+released.
+
 November 9, 2021
 https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf OpenMP Version 5.2 has been released.
Re: old install to a different folder
Hi Richard,

On 11.11.22 11:18, Richard Biener wrote:
Note I think we can "remove" the install/ and onlinedocs/ _landing_ pages (index.html) but we should keep the actual content pages so old links keep working. We can also replace the landing pages with a pointer to the new documentation (or plain re-direct to that!).

For install, I think we should consider redirecting. Before the move to Sphinx, we had only:

binaries.html build.html configure.html download.html finalinstall.html gfdl.html index.html prerequisites.html specific.html test.html

Redirecting them to the new pages will work. There is a one-to-one correspondence for all but build/test, which are now split into 7* and 5 files, respectively. Still, linking to the outermost page should be ok as I do not think that there will be many links using '#...'.

(* The subdivision is also a bit pointless for Ada and D as it consists only of the texts "GNAT prerequisites." and "GDC prerequisites.", respectively (in the old doc). In the Sphinx docs, it is even shortened to "GNAT." and "GDC.".)

The only exception, where links to page anchors are likely used, is "Host/target specific installation notes for GCC". For them, some like '#avr' still work while others don't (like 'nvptx-*-none' as '#nvptx-x-none' changed to '#nvptx-none'). But the page is short enough and it is clear from the context what the user wants - there is also a table of contents on the right to click on. (IMHO that's sufficient.)

* * *

For /onlinedocs/, I concur that we want to have the old doc there as there are many deep links. Still, we should consider adding a disclaimer box to all former mainline documentation stating that this data is no longer updated + point to the new overview page + we could redirect access which goes directly to '//' and not a (sub)html page to the new site, as you proposed.
Tobias
Re: old install to a different folder
On 11.11.22 09:50, Martin Liška wrote:
I do support the Richi's idea about using a new URL for the new Sphinx documentation while keeping the older Texinfo documentation under /onlinedocs and /install

If we do so and those then become static files: Can we put some disclaimer at the top of all HTML files under /install/ and under /onlinedocs// that those are legacy files and the new documentation can be found under (not a deep link but directly to the install pages or the new overview page about the Sphinx docs)?

I think we really need such a hint; otherwise it is more confusing than helpful!

Additionally, we should add a "news" entry to the main page pointing out that it changed and linking to the new Sphinx doc.

Tobias
Re: old install to a different folder
Hi Gerald,

On 10.11.22 20:24, Gerald Pfeifer wrote:
On Thu, 10 Nov 2022, Martin Liška wrote:
We noticed we'll need the old /install to be available for redirect. Gerald, can you please put it somewhere under /install-prev, or something similar?

I'm afraid I am confused now. Based on your original request I had removed the original /install directory.

I think we just need to handle more. Namely:

* Links directly to https://gcc.gnu.org/install/: this works and shows the new page.

* Sublinks: those currently fail as the name has changed:
  https://gcc.gnu.org/install/configure.html (which is now https://gcc.gnu.org/install/configuration.html )
  https://gcc.gnu.org/install/build.html (now: https://gcc.gnu.org/install/building.html )
  https://gcc.gnu.org/install/specific.html#avr → https://gcc.gnu.org/install/host-target-specific-installation-notes-for-gcc.html#avr

My impression is that it is sufficient to handle those renamings and we do not need the old pages. However, others might have different ideas.

Note that this was discussed in the thread "Links to web pages are broken."

Tobias
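The renamings listed above amount to a small lookup table. The following is only a sketch of that idea, not how gcc.gnu.org is configured (that would more likely be done in the web server's redirect rules); `install_renames` and `redirect_install_page` are hypothetical names, the table holds just the three renamings mentioned in this mail, and '#...' anchors are assumed to be stripped off and re-appended by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One old-name -> new-name pair for a page below /install/.  */
struct page_rename
{
  const char *old_name;
  const char *new_name;
};

/* Only the renamings mentioned in the mail; the real list may be longer.  */
static const struct page_rename install_renames[] = {
  { "configure.html", "configuration.html" },
  { "build.html", "building.html" },
  { "specific.html", "host-target-specific-installation-notes-for-gcc.html" },
};

/* Hypothetical helper: return the new page name for PAGE, or PAGE itself
   if it was not renamed.  PAGE must not include a '#anchor' part.  */
static const char *
redirect_install_page (const char *page)
{
  size_t count = sizeof install_renames / sizeof install_renames[0];
  for (size_t i = 0; i < count; i++)
    if (strcmp (page, install_renames[i].old_name) == 0)
      return install_renames[i].new_name;
  return page;
}
```

Pages that kept their name, such as index.html, fall through unchanged, which matches the observation that the plain /install/ link already works.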
Re: [DOCS] sphinx: use new Sphinx links
Hi, On 10.11.22 11:03, Gerald Pfeifer wrote: On Thu, 10 Nov 2022, Martin Liška wrote: https://gcc.gnu.org/install/ is back with a new face. But it's not working properly due to some Content Security Policy: Hmm, it worked in my testing before and I just tried again: Firefox 106.0.1 (64-bit) Did you open the console (F12)? If I do, I see the errors: Content Security Policy: The page’s settings blocked the loading of a resource at inline (“default-src”). That's for line 18, which is
Re: [Patch] Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]
Hello,

On 06.11.22 21:32, Mikael Morin wrote:
On 05/11/2022 at 23:28, Tobias Burnus wrote:
OK for mainline?

The trans-array.c part looks good. A couple of nits for the trans-expr.cc part:

- /* Use the rhs string length and the lhs element size. */
- size = string_length;
- tmp = TREE_TYPE (gfc_typenode_for_spec (>ts));
- tmp = TYPE_SIZE_UNIT (tmp);
+ /* Use the rhs string length and the lhs element size. Note that 'size' is
+ used below for the string-length comparison, only. */
+ size = string_length,

s/,/;/ ?

+ tmp = TYPE_SIZE_UNIT (gfc_get_char_type (expr2->ts.kind));

Here you are using the rhs element size, which contradicts the comment, so there is certainly something to fix here (either the comment or the code).

I did remove it in between for testing, but obviously completely messed up when re-adding it :-/

However, testing indicates that expr1 vs. expr2 does not make a difference for the kind calculation:

character(len=:,kind=1), allocatable :: c1l
character(len=:,kind=4), allocatable :: c4l
c1l = c4l
c4l = c1l

as the code path is different and the result is in either case:

c1l = (character(kind=1)[1:.c1l] *) __builtin_realloc ((void *) c1l, MAX_EXPR <(sizetype) .c4l, 1>);
c4l = (character(kind=4)[1:.c4l] *) __builtin_realloc ((void *) c4l, MAX_EXPR <(sizetype) .c1l * 4, 1>);

Still, matching the comment makes sense.

As for the testcase, do you keep the code commented on purpose?

I think it happened when I did 'git add' after adding the PR to the testcase, missing the commented lines I added for the explaining dumps :-/

Can some of it be removed or uncommented?

It should be all uncommented, except for the 'print' line.

Updated patch attached; passed quick testing + I will fully regtest it. I will commit it, unless more comments come up.

Tobias

PS: Writing patches while being tired works, but writing clean patches obviously does not.
Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]

The check whether reallocation on assignment was required did not handle kind=4 characters correctly such that there was always a reallocation, implying issues with pointer addresses and lower bounds. Additionally, with all deferred strings, the old memory was not freed on reallocation. And, finally, inside the block which was only executed if string lengths or bounds or dynamic types changed, was a subcheck of the same, which was effectively a no op but still confusing and at least added with -O0 extra instructions to the binary.

	PR fortran/107508

gcc/fortran/ChangeLog:

	* trans-array.cc (gfc_alloc_allocatable_for_assignment): Fix string-length check, plug memory leak, and avoid generation of effectively no-op code.
	* trans-expr.cc (alloc_scalar_allocatable_for_assignment): Extend comment; minor cleanup.

gcc/testsuite/ChangeLog:

	* gfortran.dg/widechar_11.f90: New test.
gcc/fortran/trans-array.cc| 57 --- gcc/fortran/trans-expr.cc | 6 ++-- gcc/testsuite/gfortran.dg/widechar_11.f90 | 51 +++ 3 files changed, 60 insertions(+), 54 deletions(-) diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc index 514cb057afb..b7d4c41b5fe 100644 --- a/gcc/fortran/trans-array.cc +++ b/gcc/fortran/trans-array.cc @@ -10527,7 +10527,6 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop, tree offset; tree jump_label1; tree jump_label2; - tree neq_size; tree lbd; tree class_expr2 = NULL_TREE; int n; @@ -10607,6 +10606,11 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop, elemsize1 = expr1->ts.u.cl->backend_decl; else elemsize1 = lss->info->string_length; + tree unit_size = TYPE_SIZE_UNIT (gfc_get_char_type (expr1->ts.kind)); + elemsize1 = fold_build2_loc (input_location, MULT_EXPR, + TREE_TYPE (elemsize1), elemsize1, + fold_convert (TREE_TYPE (elemsize1), unit_size)); + } else if (expr1->ts.type == BT_CLASS) { @@ -10699,19 +10703,7 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop, /* Allocate if data is NULL. */ cond_null = fold_build2_loc (input_location, EQ_EXPR, logical_type_node, array1, build_int_cst (TREE_TYPE (array1), 0)); - - if (expr1->ts.type == BT_CHARACTER && expr1->ts.deferred) -{ - tmp = fold_build2_loc (input_location, NE_EXPR, - logical_type_node, - lss->info->string_length, - rss->info->string_length); - cond_null = fold_build2_loc (input_location, TRUTH_OR_EXPR, - logical_type_node, tmp, cond_null); - cond_null= gfc_evaluate_now
[Patch] Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]
Prior to the attached patch, there is a problem with realloc on assignment with kind=4 characters as the string length was compared with the byte size, a check which was always true.

I initially thought, looking at the code, that scalars have the same issues, but they don't; hence, I ended up with a comment and a cleanup.

For arrays: The issue shows up in the testcase (→ PR) because there was unnecessary reallocation on assignment, which changed the lower bound to 1.

The rest, I found looking at the dump:

(a) cond_null was:

D.4298 = .a4str != 7 || (character(kind=4)[0:][1:.a4str] *) a4str.data == 0B;
...
if (D.4298)
  a4str.data = __builtin_malloc (168);
else
  a4str.data = __builtin_realloc (a4str.data, 168);

which is the wrong condition. It should be just:

D.4298 = (character(kind=4)[0:][1:.a4str] *) a4str.data == 0B;

to avoid a memory leak.

(b) The rest was removing bogus code; I think it did not do any harm, but makes the code and the dump rather convoluted.

The dump (with and without the patch) starts with:

D.4295 = .a4str * 4;
.a4str = 7;
D.4298 = (character(kind=4)[0:][1:.a4str] *) a4str.data == 0B;
if (D.4298) goto L.6;
if (a4str.dim[0].lbound + 5 != a4str.dim[0].ubound) goto L.6;
if (D.4295 != 28) goto L.6;
goto L.7;
L.6:;
a4str.dim[0].lbound = 1;
if (D.4298)
  a4str.data = __builtin_malloc (168);
else
  a4str.data = __builtin_realloc (a4str.data, 168);
L.7:;

Thus, any code which reaches L.6 should be reallocated and any code which does not, shouldn't.

The deleted code did add directly after L.6 the following additional code:

if (D.4298) D.4282 = 0; else D.4282 = MAX_EXPR + 1;
D.4283 = D.4282 != 6;

and it changed the 'else' into an 'else if' in

if (D.4298)
  a4str.data = __builtin_malloc (168);
else if (D.4283)
  a4str.data = __builtin_realloc (a4str.data, 168);

Looking closely at the added condition and at the source code, it does essentially the same check as the code which guarded the L.6 to L.7 code. Thus, the condition should always evaluate as true.
Codewise, the 'D.4282 != 6' is the 'size1 != size2' array size comparison. I think the now-removed code was there before, but then someone realized the array-bounds problem, and the new code was added without actually removing the old one. The handling of deferred strings both in the bogus condition for cond_null and by setting 'D.4283' to always true is not only wrong but implies some early hack. However, I have not checked the history to confirm my suspicion.

OK for mainline?

Tobias

PS: I have the feeling that there might be an issue with finalization/derived-type handling in case of 'realloc' as I did not spot finalization code between the size check and the malloc/realloc. The malloc case should be fine, but if realloc shrinks the memory, elements beyond the new last element in storage order would access invalid memory. However, I have not checked whether there is indeed a problem as I concentrated on fixing this issue.

PPS: I lost track of pending patches. Are there any which I should review?

Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]

The check whether reallocation on assignment was required did not handle kind=4 characters correctly such that there was always a reallocation, implying issues with pointer addresses and lower bounds. Additionally, with all deferred strings, the old memory was not freed on reallocation. And, finally, inside the block which was only executed if string lengths or bounds or dynamic types changed, was a subcheck of the same, which was effectively a no op but still confusing and at least added with -O0 extra instructions to the binary.
PR fortran/107508 gcc/fortran/ChangeLog: * trans-array.cc (gfc_alloc_allocatable_for_assignment): Fix string-length check, plug memory leak, and avoid generation of effectively no-op code. * trans-expr.cc (alloc_scalar_allocatable_for_assignment): Extend comment; minor cleanup. gcc/testsuite/ChangeLog: * gfortran.dg/widechar_11.f90: New test. gcc/fortran/trans-array.cc| 57 --- gcc/fortran/trans-expr.cc | 8 ++--- gcc/testsuite/gfortran.dg/widechar_11.f90 | 52 3 files changed, 62 insertions(+), 55 deletions(-) diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc index 514cb057afb..b7d4c41b5fe 100644 --- a/gcc/fortran/trans-array.cc +++
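The unit mismatch described in this mail (a length in characters compared against a size in bytes) can be illustrated outside the compiler. This is a plain-C sketch of the comparison only, not the tree-building code in trans-array.cc; the function names are hypothetical, and the assumption that a kind=K character occupies K bytes matches the 'D.4295 = .a4str * 4' line in the dump above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Byte size of a string of LEN_CHARS characters of the given KIND,
   assuming KIND bytes per character (kind=1 -> 1, kind=4 -> 4).  */
static size_t
string_byte_size (size_t len_chars, size_t kind)
{
  return len_chars * kind;
}

/* Shape of the old check: a character count is compared with a byte size,
   so for kind=4 and a nonzero length the two never match.  */
static bool
needs_realloc_mixed_units (size_t old_len_chars, size_t new_len_chars,
			   size_t kind)
{
  return old_len_chars != string_byte_size (new_len_chars, kind);
}

/* Shape of the fixed check: both sides are converted to bytes first.  */
static bool
needs_realloc_same_units (size_t old_len_chars, size_t new_len_chars,
			  size_t kind)
{
  return (string_byte_size (old_len_chars, kind)
	  != string_byte_size (new_len_chars, kind));
}
```

For the length-7 kind=4 string of the dump, the mixed-unit check compares 7 with 28 and requests a reallocation although nothing changed, whereas the same-unit check compares 28 with 28 and does not.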
[Patch] OpenMP/Fortran: 'target update' with DT components (was: [Patch] OpenMP/Fortran: 'target update' with strides + DT components)
On 03.11.22 13:44, Jakub Jelinek wrote:
[...]
Otherwise LGTM, assuming it actually works correctly. I don't remember support for non-contiguous copying to/from devices being actually added, [...] And I think it is not ok to copy bytes that aren't requested to be copied.

I have now removed that stride support and only kept the bug fix and the DT component parts of the patch. The only code changes are removing the disabling of the stride check in openmp.cc and removing the stride part from one testcase.

I will commit it as attached, unless there are further comments (or the just started reg testing shows that something does not work).

Tobias

PS: For strides, I now filed: PR middle-end/107517 "[OpenMP][5.0] 'target update' with strides - for C/C++ and Fortran" https://gcc.gnu.org/PR107517

OpenMP/Fortran: 'target update' with DT components

OpenMP 5.0 permits using arrays with derived type components for the list items to the 'from'/'to' clauses of the 'target update' directive.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_clauses): Permit derived types for the 'to' and 'from' clauses of 'target update'.
	* trans-openmp.cc (gfc_trans_omp_clauses): Fixes for derived-type changes; fix size for scalars.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-11.f90: New test.
	* testsuite/libgomp.fortran/target-13.f90: New test.
gcc/fortran/openmp.cc | 10 +- gcc/fortran/trans-openmp.cc | 9 +- libgomp/testsuite/libgomp.fortran/target-11.f90 | 75 +++ libgomp/testsuite/libgomp.fortran/target-13.f90 | 159 4 files changed, 246 insertions(+), 7 deletions(-) diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 653c43f79ff..e0e3b52ad57 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -2499,9 +2499,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask, true) == MATCH_YES) continue; if ((mask & OMP_CLAUSE_FROM) - && gfc_match_omp_variable_list ("from (", + && (gfc_match_omp_variable_list ("from (", >lists[OMP_LIST_FROM], false, - NULL, , true) == MATCH_YES) + NULL, , true, true) + == MATCH_YES)) continue; break; case 'g': @@ -3436,9 +3437,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask, continue; } else if ((mask & OMP_CLAUSE_TO) - && gfc_match_omp_variable_list ("to (", + && (gfc_match_omp_variable_list ("to (", >lists[OMP_LIST_TO], false, - NULL, , true) == MATCH_YES) + NULL, , true, true) + == MATCH_YES)) continue; break; case 'u': diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 9bd4e6c7e1b..4bfdf85cd9b 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -3626,7 +3626,10 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, gcc_unreachable (); } tree node = build_omp_clause (input_location, clause_code); - if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL) + if (n->expr == NULL + || (n->expr->ref->type == REF_ARRAY + && n->expr->ref->u.ar.type == AR_FULL + && n->expr->ref->next == NULL)) { tree decl = gfc_trans_omp_variable (n->sym, false); if (gfc_omp_privatize_by_reference (decl)) @@ -3666,13 +3669,13 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, { tree ptr; gfc_init_se (, NULL); - if (n->expr->ref->u.ar.type == AR_ELEMENT) + if (n->expr->rank == 0) { gfc_conv_expr_reference (, n->expr); ptr = se.expr; 
gfc_add_block_to_block (block, ); OMP_CLAUSE_SIZE (node) - = TYPE_SIZE_UNIT (TREE_TYPE (ptr)); + = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (ptr))); } else { diff --git a/libgomp/testsuite/libgomp.fortran/target-11.f90 b/libgomp/testsuite/libgomp.fortran/target-11.f90 new file mode 100644 index 000..b0faa2e620d --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/target-11.f90 @@ -0,0 +1,75 @@ +! Based on libgomp.c/target-23.c + +! { dg-additional-options "-fdump-tree-original" } +! { dg-final { scan-tree-dump "omp target update to\\(xxs\\\[3\\\] \\\[len: 2\\\]\\)" "original" } } +! { dg-final { scan-tree-dump "omp target update to\\(s\\.s \\\[len: 4\\\]\\)" "original" } } +! { dg-final { scan-tree-dump "omp target update from\\(s\\.s \\\[len: 4\\\]\\)" "original" } } + +module m + implicit none + type S_type +integer s +integer, pointer :: u(:) => null() +integer :: v(0:4) + end type S_type + integer, volatile :: z +end module m + +program main + use m + implicit none +
[Patch] Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr
This fixes an issue with 'alloc:' found when working on the patch '[Patch] OpenMP/Fortran: 'target update' with strides + DT components' https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604687.html (BTW: This one is still pending review.)

OK for mainline?

* * *

I think the patch is a great improvement. However, again, by writing a testcase, more issues have been found:
* One generic Fortran one, worked around by adding '(:)', cf. https://gcc.gnu.org/PR107508 "Invalid bounds due to bogus reallocation on assignment with KIND=4 characters".
* Some other string issues, some of which might be generic Fortran issues.
* Some issue with pointers, where exit data gives an error as the 0x00 and 0x01 kinds are not known by target exit data.

Those also showed up with the 'target update' patch mentioned above. For the last two, I used '#if 0' followed by a comment with the current error message. I do intend to look into those, or at least file a PR. Likewise for the remaining issues mentioned in the 'target update' patch.

Tobias

Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr

When using 'map(alloc: var, dt%comp)', the array descriptor needs a 'to' mapping, as otherwise the bounds are not available in the target region. Likewise for character strings. This patch implements this; however, some additional issues are exposed by the testcase; those are '#if 0'ed and will be handled later.

gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Ensure DT struct-comp with array descriptor and 'alloc:' have the descriptor mapped with 'to:'.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-enter-data-3.f90: New test.
gcc/fortran/trans-openmp.cc |3 libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90 | 567 ++ 2 files changed, 569 insertions(+), 1 deletion(-) diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 4bfdf85cd9b..4eb9d4c9edc 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -3507,7 +3507,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, = gfc_full_array_size (block, inner, rank); tree elemsz = TYPE_SIZE_UNIT (gfc_get_element_type (type)); - if (GOMP_MAP_COPY_TO_P (OMP_CLAUSE_MAP_KIND (node))) + if (GOMP_MAP_COPY_TO_P (OMP_CLAUSE_MAP_KIND (node)) + || OMP_CLAUSE_MAP_KIND (node) == GOMP_MAP_ALLOC) map_kind = GOMP_MAP_TO; else if (n->u.map_op == OMP_MAP_RELEASE || n->u.map_op == OMP_MAP_DELETE) diff --git a/libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90 b/libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90 new file mode 100644 index 000..1fe3f03c7b8 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/target-enter-data-3.f90 @@ -0,0 +1,567 @@ +! { dg-additional-options "-cpp" } + +! FIXME: Some tests do not work yet. Those are for now in '#if 0' + +! Check that 'map(alloc:' properly works with +! - deferred-length character strings +! - arrays with array descriptors +! 
For those, the array descriptor / string length must be mapped with 'to:' + +program main +implicit none + +type t + integer :: ic(2:5), ic2 + character(len=11) :: ccstr(3:4), ccstr2 + character(len=11,kind=4) :: cc4str(3:7), cc4str2 + integer, pointer :: pc(:), pc2 + character(len=:), pointer :: pcstr(:), pcstr2 + character(len=:,kind=4), pointer :: pc4str(:), pc4str2 +end type t + +type(t) :: dt + +integer :: ii(5), ii2 +character(len=11) :: clstr(-1:1), clstr2 +character(len=11,kind=4) :: cl4str(0:3), cl4str2 +integer, pointer :: ip(:), ip2 +integer, allocatable :: ia(:), ia2 +character(len=:), pointer :: pstr(:), pstr2 +character(len=:), allocatable :: astr(:), astr2 +character(len=:,kind=4), pointer :: p4str(:), p4str2 +character(len=:,kind=4), allocatable :: a4str(:), a4str2 + + +allocate(dt%pc(5), dt%pc2) +allocate(character(len=2) :: dt%pcstr(2)) +allocate(character(len=4) :: dt%pcstr2) + +allocate(character(len=3,kind=4) :: dt%pc4str(2:3)) +allocate(character(len=5,kind=4) :: dt%pc4str2) + +allocate(ip(5), ip2, ia(8), ia2) +allocate(character(len=2) :: pstr(-2:0)) +allocate(character(len=4) :: pstr2) +allocate(character(len=6) :: astr(3:5)) +allocate(character(len=8) :: astr2) + +allocate(character(len=3,kind=4) :: p4str(2:4)) +allocate(character(len=5,kind=4) :: p4str2) +allocate(character(len=7,kind=4) :: a4str(-2:3)) +allocate(character(len=9,kind=4) :: a4str2) + + +! integer :: ic(2:5), ic2 + +!$omp target enter data map(alloc: dt%ic) +!$omp target map(alloc: dt%ic) + if (size(dt%ic) /= 4) error stop + if (lbound(dt%ic, 1) /= 2) error stop + if (ubound(dt%ic, 1) /= 5) error stop + dt%ic =
[Patch] OpenMP/Fortran: 'target update' with strides + DT components
I recently saw that gfortran does not support derived type components with 'target update', an OpenMP 5.0 feature. When adding it, I also found out that strides were not handled. There is probably some room for improvement regarding what to copy and what not, but copying too much should be fine.

Build + (reg)tested on x86_64-gnu-linux without offloading configured + libgomp tested on x86_64-gnu-linux with nvptx offloading.

OK for mainline?

* * *

PS: Follow-up work items:
* Strides: OpenMP seemingly also permits 'a%b([1,6,19,12])' as long as the first index has the lowest address, and 'a%b(:)%c' is permitted as well. Neither is handled in this patch (both are rejected with a compile-time error).
* There seem to be some problems with 'alloc' with pointers and allocatables in components, but I have not rechecked.
* For allocatables, 'target update' needs to do a deep mapping; I need to check whether that's the case.
Note for the last two: allocatable components only work on OG11/OG12 and I urgently need to clean up + (re)submit that patch to mainline. (It came too late for GCC 12.)
* There might also be some mapping/refcounting issue, which I have not investigated, affecting the 'target exit data' of target-11.f90.

PPS: I intend to file at least one/some PRs about those issues, unless I can fix them quickly.

OpenMP/Fortran: 'target update' with strides + DT components

OpenMP 5.0 permits using arrays with strides and derived type components for the list items to the 'from'/'to' clauses of the 'target update' directive.

gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_clauses): Permit derived types.
(resolve_omp_clauses): Accept noncontiguous arrays.
* trans-openmp.cc (gfc_trans_omp_clauses): Fixes for derived-type changes; fix size for scalars. libgomp/ChangeLog: * testsuite/libgomp.fortran/target-11.f90: New test. * testsuite/libgomp.fortran/target-13.f90: New test. gcc/fortran/openmp.cc | 19 ++- gcc/fortran/trans-openmp.cc | 9 +- libgomp/testsuite/libgomp.fortran/target-11.f90 | 75 +++ libgomp/testsuite/libgomp.fortran/target-13.f90 | 162 4 files changed, 256 insertions(+), 9 deletions(-) diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 653c43f79ff..2daed74be72 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -2499,9 +2499,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask, true) == MATCH_YES) continue; if ((mask & OMP_CLAUSE_FROM) - && gfc_match_omp_variable_list ("from (", + && (gfc_match_omp_variable_list ("from (", >lists[OMP_LIST_FROM], false, - NULL, , true) == MATCH_YES) + NULL, , true, true) + == MATCH_YES)) continue; break; case 'g': @@ -3436,9 +3437,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask, continue; } else if ((mask & OMP_CLAUSE_TO) - && gfc_match_omp_variable_list ("to (", + && (gfc_match_omp_variable_list ("to (", >lists[OMP_LIST_TO], false, - NULL, , true) == MATCH_YES) + NULL, , true, true) + == MATCH_YES)) continue; break; case 'u': @@ -7585,8 +7587,11 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses, Only raise an error here if we're really sure the array isn't contiguous. An expression such as arr(-n:n,-n:n) could be contiguous even if it looks - like it may not be. */ + like it may not be. + And OpenMP's 'target update' permits strides for + the to/from clause. 
*/ if (code->op != EXEC_OACC_UPDATE + && code->op != EXEC_OMP_TARGET_UPDATE && list != OMP_LIST_CACHE && list != OMP_LIST_DEPEND && !gfc_is_simply_contiguous (n->expr, false, true) @@ -7630,7 +7635,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses, int i; gfc_array_ref *ar = >u.ar; for (i = 0; i < ar->dimen; i++) - if (ar->stride[i] && code->op != EXEC_OACC_UPDATE) + if (ar->stride[i] + && code->op != EXEC_OACC_UPDATE + && code->op != EXEC_OMP_TARGET_UPDATE) { gfc_error ("Stride should not be specified for " "array section in %s clause at %L", diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 9bd4e6c7e1b..4bfdf85cd9b 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -3626,7 +3626,10 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, gcc_unreachable (); } tree node = build_omp_clause (input_location, clause_code); - if (n->expr == NULL || n->expr->ref->u.ar.type
Re: [PATCH] OpenMP: Duplicate checking for map clauses in Fortran (PR107214)
Hi Julian,

I had a first quick look at this patch; I should have a closer look later. However, I stumbled over the following:

On 20.10.22 18:14, Julian Brown wrote:

typedef struct gfc_symbol
{
  ...
  struct gfc_symbol *old_symbol;
  unsigned mark:1, comp_mark:1, data_mark:1, dev_mark:1, gen_mark:1;
  unsigned reduc_mark:1, gfc_new:1;
  struct gfc_symbol *tlink;
  unsigned equiv_built:1;
  ...

I know that this was the case before, but can you move the mark:1 etc. after 'tlink'? In that case all bitfields are grouped together. If I have not miscounted, we currently have 7 bits before and 9 bits after 'tlink', and grouping them together reduces pointless padding.

* * *

+ else if (n->sym->mark)
+   gfc_error ("Symbol %qs present on both data and map clauses "
+              "at %L", n->sym->name, &n->where);

I wonder whether that also rejects the following, which seems to be valid. The 'map' goes to 'target' and the 'firstprivate' to 'parallel', cf. OpenMP 5.2, "17.2 Clauses on Combined and Composite Constructs", [340:3-4 & 12-14]. (BTW: While some fixes went into 5.1 regarding this section, similar wording is already in 5.0.)

(Testing showed: it gives an ICE without the patch and an error with it.)
module m
  integer :: a = 1
end module m

module m2
contains
  subroutine bar()
    use m
    !$omp declare target
    a = a + 5
  end subroutine bar
end

program p
  use m
  !$omp target parallel do map(a) firstprivate(a)
  do i = 1, 1
    a = 7
    call bar()
    if (a /= 7) error stop 1
    a = a + 8
  end do
  if (a /= 6) error stop
end

* * *

The ICE seems to be because gcc/fortran/trans-openmp.cc's gfc_split_omp_clauses mishandles this as the dump shows the following:

#pragma omp target firstprivate(a) map(tofrom:a)
#pragma omp parallel firstprivate(a)

* * *

In contrast, for the C testcase:

void foo(int x) {
#pragma omp target parallel for simd map(x) firstprivate(x)
  for (int k = 0; k < 1; ++k)
    x = 1;
}

the dump is as follows, which seems to be sensible:

#pragma omp target map(tofrom:x)
#pragma omp parallel firstprivate(x)
#pragma omp for nowait
#pragma omp simd

Tobias
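The padding argument from the review above can be illustrated with a small standalone C sketch. The two struct types and their member names are hypothetical stand-ins, not the real gfc_symbol layout; bit-field layout is implementation-defined, so the exact sizes depend on the ABI, and the only claim made here is that grouping the bit-fields should not make the struct larger.

```c
#include <stddef.h>

/* Hypothetical layout mirroring the quoted snippet: 7 flag bits,
   then a pointer, then 9 more flag bits.  */
struct split_flags
{
  void *old_symbol;
  unsigned mark:1, comp_mark:1, data_mark:1, dev_mark:1, gen_mark:1;
  unsigned reduc_mark:1, is_new:1;
  void *tlink;
  unsigned equiv_built:1;
  unsigned other_flags:8;
};

/* Same members, but with all 16 flag bits placed after both pointers,
   so they can share a single 'unsigned' storage unit.  */
struct grouped_flags
{
  void *old_symbol;
  void *tlink;
  unsigned mark:1, comp_mark:1, data_mark:1, dev_mark:1, gen_mark:1;
  unsigned reduc_mark:1, is_new:1;
  unsigned equiv_built:1;
  unsigned other_flags:8;
};

/* Size of the layout with bit-fields on both sides of the pointer.  */
size_t
split_size (void)
{
  return sizeof (struct split_flags);
}

/* Size of the layout with all bit-fields grouped together.  */
size_t
grouped_size (void)
{
  return sizeof (struct grouped_flags);
}
```

On a typical LP64 target one would expect the split variant to need an extra padded storage unit between the two pointers, but that is an assumption about the ABI rather than something the C standard guarantees.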
Re: [Patch][v5] libgomp/nvptx: Prepare for reverse-offload callback handling
Hi Tobias!

On 24.10.22 21:11, Thomas Schwinge wrote:
On 2022-10-24T21:05:46+0200, I wrote:
On 2022-10-24T16:07:25+0200, Jakub Jelinek via Gcc-patches wrote:
On Wed, Oct 12, 2022 at 10:55:26AM +0200, Tobias Burnus wrote:

libgomp/nvptx: Prepare for reverse-offload callback handling

Well.

+ struct rev_offload *rev_data;

... but as far as I can tell, this is never initialized in 'nvptx_open_device', which does 'ptx_dev = GOMP_PLUGIN_malloc ([...]);'. Would the following be the correct fix (currently testing)?

--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -546,6 +546,8 @@ nvptx_open_device (int n)
   ptx_dev->omp_stacks.size = 0;
   pthread_mutex_init (&ptx_dev->omp_stacks.lock, NULL);

+  ptx_dev->rev_data = NULL;
+
   return ptx_dev;
 }

LGTM and I think it is obvious, albeit I am not sure why it did not fail when testing it here.

Thanks,

Tobias
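The problem discussed in this exchange is the usual one with freshly allocated memory: a pointer member that is never assigned has an indeterminate value, so a later 'if (dev->rev_data)' check cannot be trusted. The following is a minimal sketch using plain malloc; the type and function names are hypothetical and are not the libgomp plugin API, and it assumes that the allocator used by the plugin likewise does not zero the memory it returns.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the reverse-offload bookkeeping data.  */
struct rev_offload_sketch
{
  int pending;
};

/* Hypothetical stand-in for the per-device structure.  */
struct device_sketch
{
  int ord;
  struct rev_offload_sketch *rev_data;
};

/* Allocate a device structure and initialize every member explicitly,
   because malloc does not clear the returned memory.  Returns NULL if
   the allocation fails.  */
struct device_sketch *
open_device_sketch (int n)
{
  struct device_sketch *dev = malloc (sizeof *dev);
  if (dev == NULL)
    return NULL;

  dev->ord = n;
  dev->rev_data = NULL;  /* Without this line the member is indeterminate.  */
  return dev;
}
```

That uninitialized memory often happens to contain zeros would be one possible explanation for a missing initialization going unnoticed in testing, although the thread does not confirm this.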
*ping* Re: [Patch] OpenMP: Fix reverse offload GOMP_TARGET_REV IFN corner cases [PR107236]
Ping this patch, and also "Re: [Patch][v5] libgomp/nvptx: Prepare for reverse-offload callback handling". For the latter cf. Alexander's code approval https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603908.html and his concerns regarding the generic feature in https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601959.html (I think 'target nowait' permits what he thinks is the better way for GPUs.)

Tobias

On 18.10.22 21:27, Tobias Burnus wrote:

Found when playing around with reverse offload once I used 'omp target parallel'. The other issue showed up when running the testsuite (which is done with -O2). In all cases, the ICE is in expand_GOMP_TARGET_REV of this IFN, which should be unreachable.

Note: ENABLE_OFFLOADING inside the compiler must evaluate to true for the ICE to show up; otherwise, the IFN is not even generated.

I did not see a good reason for DECL_CONTEXT = NULL, thus, I now set it to the same as was set for child_fn - for no good reason.

Tested on x86-64 with ENABLE_OFFLOADING albeit without true offloading.

OK for mainline?

Tobias
[OG12] omp-oacc-kernels-decompose.cc: fix -fcompare-debug with GIMPLE_DEBUG
Given that omp-oacc-kernels-decompose.cc only exists on OG12, the fix only applies to OG12. The failure shows up since "Kernels loops annotation: C and C++." as that adds GIMPLE_DEBUG, which is not handled in omp-oacc-kernels-decompose.cc at all. (Actually, it even fails with a sorry when compiling with -g2; however, -fcompare-debug is supported and was failing.) For details see the patch.

Tobias

commit 807b755357c4eb03260d229f4a851009fe058e51
Author: Tobias Burnus
Date: Thu Oct 20 19:20:36 2022 +0200

omp-oacc-kernels-decompose.cc: fix -fcompare-debug with GIMPLE_DEBUG

GIMPLE_DEBUG statements were put in a parallel region of their own, which is not only pointless but also breaks -fcompare-debug. With this commit, they are handled like simple assignments: they are placed into the same body as the loop, such that only one parallel region remains, as without debugging. This fixes the existing testcase libgomp.oacc-c-c++-common/kernels-loop-g.c.

Note: GIMPLE_DEBUG statements are only accepted with -fcompare-debug; if they appear otherwise, decompose_kernels_region_body rejects them with a sorry (unchanged).

gcc/
* omp-oacc-kernels-decompose.cc (top_level_omp_for_in_stmt, decompose_kernels_region_body): Handle GIMPLE_DEBUG like simple assignment.
--- gcc/omp-oacc-kernels-decompose.cc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc index 4e940c1ee0f..a7e3d764d52 100644 --- a/gcc/omp-oacc-kernels-decompose.cc +++ b/gcc/omp-oacc-kernels-decompose.cc @@ -120,7 +120,8 @@ top_level_omp_for_in_stmt (gimple *stmt) for (gsi = gsi_start (body); !gsi_end_p (gsi); gsi_next ()) { gimple *body_stmt = gsi_stmt (gsi); - if (gimple_code (body_stmt) == GIMPLE_ASSIGN) + if (gimple_code (body_stmt) == GIMPLE_ASSIGN + || gimple_code (body_stmt) == GIMPLE_DEBUG) continue; else if (gimple_code (body_stmt) == GIMPLE_OMP_FOR && gsi_one_before_end_p (gsi)) @@ -1398,7 +1399,7 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses) = (gimple_code (stmt) == GIMPLE_ASSIGN && TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL && DECL_ARTIFICIAL (gimple_assign_lhs (stmt))); - if (!is_simple_assignment) + if (!is_simple_assignment && gimple_code (stmt) != GIMPLE_DEBUG) only_simple_assignments = false; } }
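The idea behind the first hunk can be sketched outside of GCC with a toy statement list. Everything below is a simplified, hypothetical model (an enum instead of GIMPLE statements, a flag instead of the code change); it only shows why skipping debug statements in the same way as assignments makes the "body is a single loop" decision independent of whether debug statements are present.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy statement kinds; hypothetical stand-ins for GIMPLE codes.  */
enum stmt_kind
{
  STMT_ASSIGN,
  STMT_DEBUG,
  STMT_LOOP,
  STMT_OTHER
};

/* Return true if BODY consists of skippable statements followed by a
   loop as its very last statement.  Assignments are always skipped;
   debug statements are skipped only when SKIP_DEBUG is set, which
   models the behavior after the fix.  */
bool
body_is_single_loop (const enum stmt_kind *body, size_t n, bool skip_debug)
{
  for (size_t i = 0; i < n; i++)
    {
      if (body[i] == STMT_ASSIGN
          || (skip_debug && body[i] == STMT_DEBUG))
        continue;
      /* First non-skippable statement: it must be the trailing loop.  */
      return body[i] == STMT_LOOP && i + 1 == n;
    }
  return false;
}
```

With skip_debug unset, a body such as { assign, debug, loop } is classified differently from { assign, loop }, which is the kind of divergence between debug and non-debug compilation that -fcompare-debug detects.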
[OG12] libgomp.c-c++-common/requires-4.c: dg-xfail-run-if for USM with -foffload-memory=
Follow-up to the mainline commit (https://gcc.gnu.org/r13-3407 + backported to OG12): "libgomp: Add offload_device_gcn check, add requires-4a.c test"

This xfails requires-4.c on pseudo-USM systems. As mentioned in the email for that patch, OG12's unified-shared memory implementation is for pseudo-USM systems where only specially allocated memory (managed, pinned) is device accessible. Thus, requires-4.c failed as it used static memory. (requires-4a.c works as it uses heap-allocated memory.)

Tobias

PS: For USM in mainline, see patch submission at https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html

commit 0c47ae1c9283a812f832e80e451bfa82519c21e8
Author: Tobias Burnus
Date: Thu Oct 20 13:25:25 2022 +0200

libgomp.c-c++-common/requires-4.c: dg-xfail-run-if for USM with -foffload-memory=

The USM implementation uses -foffload-memory=... which allocates variables in a special memory. This does not support static variables. Hence, XFAIL this test on nvptx/gcn. The requires-4a.c testcase tests the same but uses heap memory instead.

libgomp/
* testsuite/libgomp.c-c++-common/requires-4.c: dg-xfail-run-if on nvptx and gcn.

---
 libgomp/testsuite/libgomp.c-c++-common/requires-4.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
index 5883eff0d93..c6b28d5442f 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
@@ -2,6 +2,8 @@
 /* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */
 /* { dg-additional-sources requires-4-aux.c } */
+/* { dg-xfail-run-if "USM via -foffload-memory=...
does not support static variables" { offload_device_nvptx || offload_device_gcn } } */ + /* Check no diagnostic by device-compiler's or host compiler's lto1. Other file uses: 'requires reverse_offload', but that's inactive as there are no declare target directives, device constructs nor device routines */
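As a rough illustration of the static-versus-heap distinction described above, the following sketch accesses heap-allocated data from a target region under 'requires unified_shared_memory'. The function name is made up for this example, and whether plain malloc memory is device accessible depends on the USM implementation; on the pseudo-USM setup described in the email, a file-scope static array used the same way would be the failing case. Compiled without -fopenmp, the pragmas are ignored and the loop simply runs on the host.

```c
#include <stdlib.h>

#pragma omp requires unified_shared_memory

/* Sum 0 + 1 + ... + (n-1) inside a target region, reading the values
   from heap-allocated memory without an explicit map clause for the
   data.  Returns -1 if the allocation fails.  (Hypothetical helper,
   not taken from the requires-4a.c testcase.)  */
long
sum_from_heap (int n)
{
  int *data = malloc ((size_t) n * sizeof *data);
  if (data == NULL)
    return -1;

  for (int i = 0; i < n; i++)
    data[i] = i;

  long sum = 0;
  #pragma omp target map(tofrom: sum)
  for (int i = 0; i < n; i++)
    sum += data[i];

  free (data);
  return sum;
}
```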
[OG12][committed] Fix omp-expand.cc's expand_omp_target for OpenACC
Fallout of my Fortran deep-mapping patch, which I somehow missed, probably because I was inundated by the OG11 OpenACC fails back then.

Tobias

commit 0d6fc5032c7ba8a95301d0ccbc418875e73955ac
Author: Tobias Burnus
Date: Wed Oct 19 17:31:14 2022 +0200

Fix omp-expand.cc's expand_omp_target for OpenACC

In OG12 commit a6c1eccffb161130351d891dc87f5afe54f8075c, "Fortran/OpenMP: Support mapping of DT with allocatable components", the size of the addr/sizes/kind arrays was passed as 4th argument. However, OpenACC uses >3 arguments for its own purpose, e.g. to handle noncontiguous arrays by passing an array descriptor there. This patch restores the previous behaviour for OpenACC, fixing testcases like libgomp.oacc-c-c++-common/noncontig_array-1.c.

gcc/
* omp-expand.cc (expand_omp_target): Fix OpenACC in case there are more than 3 arguments to the builtin function.

---
 gcc/ChangeLog.omp | 5 +
 gcc/omp-expand.cc | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 527a9850dba..32a8c7b485f 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,8 @@
+2022-10-19 Tobias Burnus
+
+ * omp-expand.cc (expand_omp_target): Fix OpenACC in case there
+ are more than 3 arguments to the builtin function.
+ 2022-10-17 Thomas Schwinge Backported from master: diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc index 92996685d41..6529f63362b 100644 --- a/gcc/omp-expand.cc +++ b/gcc/omp-expand.cc @@ -10456,7 +10456,7 @@ expand_omp_target (struct omp_region *region) t3 = t2; t4 = t2; } - else if (TREE_VEC_LENGTH (t) == 3) + else if (TREE_VEC_LENGTH (t) == 3 || is_gimple_omp_oacc (entry_stmt)) { t1 = TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (TREE_VEC_ELT (t, 1; t1 = size_binop (PLUS_EXPR, t1, size_int (1)); commit 92b14810a2743594df945dc6479413a3d9d943aa Author: Tobias Burnus Date: Wed Oct 19 17:26:34 2022 +0200 ChangeLog for "Fortran: Fix delinearization regression" Missed to update gcc/fortran/ChangeLog.omp and to include the following in previous commit, i.e. commit 76b773a4a2d1daf0b83e50cd999bc38f8dd047be. gcc/fortran/ChangeLog: * trans-array.cc (non_negative_strides_array_p): Fix handling of GFC_DECL_SAVED_DESCRIPTOR. (gfc_conv_array_ref): Use ARRAY_REF again when possible. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/affinity-clause-1.f90: Revert to upsteam version, update one scan-tree item. * gfortran.dg/gomp/depend-4.f90: Revert to upstream version. * gfortran.dg/gomp/depend-5.f90: Likewise. * gfortran.dg/gomp/depend-6.f90: Likewise. --- gcc/fortran/ChangeLog.omp | 6 ++ gcc/testsuite/ChangeLog.omp | 8 2 files changed, 14 insertions(+) diff --git a/gcc/fortran/ChangeLog.omp b/gcc/fortran/ChangeLog.omp index 685fe68667a..189431df4eb 100644 --- a/gcc/fortran/ChangeLog.omp +++ b/gcc/fortran/ChangeLog.omp @@ -1,3 +1,9 @@ +2022-10-19 Tobias Burnus + + * trans-array.cc (non_negative_strides_array_p): Fix handling + of GFC_DECL_SAVED_DESCRIPTOR. + (gfc_conv_array_ref): Use ARRAY_REF again when possible. 
+ 2022-10-17 Tobias Burnus Backport from mainline: diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp index b2b4381e3ce..6928d520c0f 100644 --- a/gcc/testsuite/ChangeLog.omp +++ b/gcc/testsuite/ChangeLog.omp @@ -1,3 +1,11 @@ +2022-10-19 Tobias Burnus + + * gfortran.dg/gomp/affinity-clause-1.f90: Revert to upsteam version, + update one scan-tree item. + * gfortran.dg/gomp/depend-4.f90: Revert to upstream version. + * gfortran.dg/gomp/depend-5.f90: Likewise. + * gfortran.dg/gomp/depend-6.f90: Likewise. + 2022-10-17 Tobias Burnus Backport from mainline:
[OG12][committed] Fortran: Fix delinearization regression
As mentioned in the patch submission for "Fortran: Fix non_negative_strides_array_p", there were some issues on OG12, which uses Sandra's delinearization patch (and was forward ported from OG11). This patch fixes one issue, caused by a GCC 12 change.

At some point, we could think of using the delinearization patch in mainline; on OG12 it is used together with some Graphite work to parallelize loops in OpenACC's kernels construct. But at least in principle, it could also offer better optimization options in general.

Tobias

PS: OG12 alias devel/omp/gcc-12 is a branch based on GCC 12 that contains OpenMP, OpenACC and offloading commits not yet on GCC 12. Several of those commits went first to mainline/GCC 13, but some are so far only on OG12, like this delinearization patch. The goal is to eventually have all features in mainline.

commit 76b773a4a2d1daf0b83e50cd999bc38f8dd047be
Author: Tobias Burnus
Date: Wed Oct 19 15:53:25 2022 +0200

Fortran: Fix delinearization regression

The delinearization patch "Fortran: delinearize multi-dimensional array accesses", OG12 commit 39a8c371fda6136cf77c74895a00b136409e0ba3, uses gfc_build_array_ref for the non-delinearization path. The generated code depends on whether there can be negative strides or not; r12-8230-g7964ab6c364 extended that function with a Boolean argument for this purpose. The follow-up OG12 commit "Fix Fortran array-access regressions", 9fb0076b11eb2774b620bcf2171d55c7d1fb899f, also added this argument to the call in gfc_conv_array_ref, but always passed false.
This commit changes it to a call to non_negative_strides_array_p. (Note: for 'se->expr', not 'base'; the former could be 'arraydesc' while the latter is then 'arraydesc.data', whose TREE_TYPE does not contain information about the array type.)

However, doing so revealed a bug in non_negative_strides_array_p, fixed in this commit but also submitted as "Fortran: Fix non_negative_strides_array_p" to mainline, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603883.html

As a side effect of this commit, several testcases now pass and the OG12-only changes to depend-{4,5,6}.f90 and affinity-clause-1.f90 could be undone, except that the latter now uses the delinearized array syntax in one case, which is an improvement (as reflected in the scan-tree-dump). Hence, this commit (partially) reverts the commits:
21c806f73fc gfortran.dg/gomp/{depend-5,scope-6}.f90: Update scan-tree-dump
014fc7cd451 Fix dg- pattern for gomp/{affinity-clause-1.f90,uses_allocators-3.f90}
2d8aa5cc5d3 gfortran.dg/gomp/depend-6.f90: minor fix + dump update
d77133b29fc gfortran.dg/gomp/depend-4.f90: minor fix + dump update

The main testcase for non_negative_strides_array_p is gfortran.dg/array_reference_3.f90, which now passes as well.

Additionally, this change prevents some unintended implicit mapping; because of that mapping, libgomp.fortran/map-alloc-comp-{4,6}.f90 failed before and now pass again.
--- gcc/fortran/trans-array.cc | 18 -- .../gfortran.dg/gomp/affinity-clause-1.f90 | 6 +- gcc/testsuite/gfortran.dg/gomp/depend-4.f90| 74 +++--- gcc/testsuite/gfortran.dg/gomp/depend-5.f90| 13 ++-- gcc/testsuite/gfortran.dg/gomp/depend-6.f90| 72 ++--- 5 files changed, 91 insertions(+), 92 deletions(-) diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc index bc2477e4aea..13d92c9fb1f 100644 --- a/gcc/fortran/trans-array.cc +++ b/gcc/fortran/trans-array.cc @@ -3703,11 +3703,16 @@ non_negative_strides_array_p (tree expr) /* If the array was originally a dummy with a descriptor, strides can be negative. */ - if (DECL_P (expr) - && DECL_LANG_SPECIFIC (expr) - && GFC_DECL_SAVED_DESCRIPTOR (expr) - && GFC_DECL_SAVED_DESCRIPTOR (expr) != expr) -return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (expr)); + tree decl = expr; + STRIP_NOPS (decl); + if (TREE_CODE (decl) == INDIRECT_REF) +decl = TREE_OPERAND (decl, 0); + + if (DECL_P (decl) + && DECL_LANG_SPECIFIC (decl) + && GFC_DECL_SAVED_DESCRIPTOR (decl) + && GFC_DECL_SAVED_DESCRIPTOR (decl) != expr) +return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (decl)); return true; } @@ -4200,12 +4205,13 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr, { /* Build a linearized array reference using the offset from all dimensions. */ + bo
[Patch] Fortran: Fix non_negative_strides_array_p
First, I am woefully aware that there are several patches pending. I hope to do a couple of reviews later today or in the next days.

Otherwise, I did run into another issue in existing code which was exposed by the delinearization patch on the OG12 branch, but could potentially lead to wrong code on mainline as well, depending on how the return value is used. However, I failed to create a testcase for it.

OK for mainline?

Tobias

Fortran: Fix non_negative_strides_array_p

The non_negative_strides_array_p function might wrongly return 'true', e.g. for assumed-shape arrays, if the argument is '*a.0 ...' instead of 'a.0 ...', as then the saved array descriptor for the PARAM_DECL 'a' is not found. This potentially leads to wrong code, but I could not find a testcase leading to wrong code on mainline. Asserts show that this happens with CLASS; however, for those no ARRAY_REF seems to get used.

The issue shows up when applying the delinearization patch as posted at https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562230.html, which has been applied to the OG12 alias devel/omp/gcc-12 vendor branch as commit 39a8c371fda6136cf77c74895a00b136409e0ba3. This patch calls gfc_build_array_ref inside gfc_conv_array_ref. With it, the issue mentioned above shows up in gfortran.dg/array_reference_3.f90, a testcase added together with non_negative_strides_array_p in commit r12-8230-g7964ab6c364 for PR 102043. Here, non_negative_strides_array_p returns true for assumed_shape_x, but assumed-shape arrays may have negative strides.

gcc/fortran/ChangeLog:
* trans-array.cc (non_negative_strides_array_p): Fix handling of GFC_DECL_SAVED_DESCRIPTOR.
gcc/fortran/trans-array.cc | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc index 795ce14af08..ca3503b7cae 100644 --- a/gcc/fortran/trans-array.cc +++ b/gcc/fortran/trans-array.cc @@ -3695,11 +3695,16 @@ non_negative_strides_array_p (tree expr) /* If the array was originally a dummy with a descriptor, strides can be negative. */ - if (DECL_P (expr) - && DECL_LANG_SPECIFIC (expr) - && GFC_DECL_SAVED_DESCRIPTOR (expr) - && GFC_DECL_SAVED_DESCRIPTOR (expr) != expr) -return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (expr)); + tree decl = expr; + STRIP_NOPS (decl); + if (TREE_CODE (decl) == INDIRECT_REF) +decl = TREE_OPERAND (decl, 0); + + if (DECL_P (decl) + && DECL_LANG_SPECIFIC (decl) + && GFC_DECL_SAVED_DESCRIPTOR (decl) + && GFC_DECL_SAVED_DESCRIPTOR (decl) != expr) +return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (decl)); return true; }
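To illustrate the idea behind the change to non_negative_strides_array_p, here is a minimal stand-alone sketch. It does not use GCC's tree API; the toy_* types and functions are hypothetical stand-ins for DECL_P, STRIP_NOPS, INDIRECT_REF and GFC_DECL_SAVED_DESCRIPTOR, and the model is much simpler than real GENERIC trees. It only shows why looking at the expression itself misses the saved descriptor for '*a.0', while peeling conversions and one indirection first finds it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a tree node; not GCC's data structures.  */
enum toy_code { TOY_DECL, TOY_NOP, TOY_INDIRECT_REF };

struct toy_node
{
  enum toy_code code;
  struct toy_node *operand;           /* Used by TOY_NOP and TOY_INDIRECT_REF.  */
  struct toy_node *saved_descriptor;  /* Used by TOY_DECL; may be NULL.  */
};

/* Old behaviour: only a bare declaration is inspected.  */
static int
toy_finds_descriptor_old (struct toy_node *expr)
{
  assert (expr != NULL);
  return (expr->code == TOY_DECL
          && expr->saved_descriptor != NULL
          && expr->saved_descriptor != expr);
}

/* New behaviour: strip no-op conversions and one level of indirection
   before looking for the saved descriptor.  */
static int
toy_finds_descriptor_new (struct toy_node *expr)
{
  assert (expr != NULL);
  struct toy_node *decl = expr;
  while (decl->code == TOY_NOP)
    decl = decl->operand;
  if (decl->code == TOY_INDIRECT_REF)
    decl = decl->operand;
  return (decl->code == TOY_DECL
          && decl->saved_descriptor != NULL
          && decl->saved_descriptor != expr);
}
```

In this model, a node of kind TOY_INDIRECT_REF wrapping a declaration with a saved descriptor is rejected by the old check and accepted by the new one.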
[Patch] OpenMP: Fix reverse offload GOMP_TARGET_REV IFN corner cases [PR107236]
Found when playing around with reverse offload once I used 'omp target parallel'. The other issue showed up when running the testsuite (which is done with -O2). In all cases, the ICE is in expand_GOMP_TARGET_REV of this IFN, which should be unreachable.

Note: ENABLE_OFFLOADING inside the compiler must evaluate to true for the ICE to show up; otherwise, the IFN is not even generated.

I did not see a good reason for DECL_CONTEXT = NULL; thus, I now set it to the same value as was set for child_fn, admittedly for no good reason either.

Tested on x86-64 with ENABLE_OFFLOADING albeit without true offloading.

OK for mainline?

Tobias

OpenMP: Fix reverse offload GOMP_TARGET_REV IFN corner cases [PR107236]

For 'target parallel' and similarly nested directives, cgraph_node's calls_declare_variant_alt was not set in the parent region node but in cfun->decl. Hence, pass_omp_device_lower did not handle the internal function GOMP_TARGET_REV. The solution is to set it in the DECL_CONTEXT, which is set in adjust_context_and_scope.

The cgraph_node::create_clone issue is exposed with -O2 for the existing libgomp.fortran/reverse-offload-1.f90.

omp-offload.cc

	PR middle-end/107236

gcc/ChangeLog:

	* omp-expand.cc (expand_omp_target): Set calls_declare_variant_alt in DECL_CONTEXT and not to cfun->decl.
	* cgraphclones.cc (cgraph_node::create_clone): Copy also the node's calls_declare_variant_alt value.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/target-device-ancestor-6.f90: New test.
gcc/cgraphclones.cc | 1 + gcc/omp-expand.cc | 13 ++--- .../gfortran.dg/gomp/target-device-ancestor-6.f90 | 17 + 3 files changed, 24 insertions(+), 7 deletions(-) diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc index eb0fa87b554..bb4b3c5407d 100644 --- a/gcc/cgraphclones.cc +++ b/gcc/cgraphclones.cc @@ -375,6 +375,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, if (!new_inlined_to) prof_count = count.combine_with_ipa_count (prof_count); new_node->count = prof_count; + new_node->calls_declare_variant_alt = this->calls_declare_variant_alt; /* Update IPA profile. Local profiles need no updating in original. */ if (update_original) diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc index 5dc0bf16e17..c636a174e36 100644 --- a/gcc/omp-expand.cc +++ b/gcc/omp-expand.cc @@ -10054,13 +10054,8 @@ expand_omp_target (struct omp_region *region) /* Handle the case that an inner ancestor:1 target is called by an outer target region. */ - if (!is_ancestor) - cgraph_node::get (child_fn)->calls_declare_variant_alt - |= cgraph_node::get (cfun->decl)->calls_declare_variant_alt; - else /* Duplicate function to create empty nonhost variant. */ + if (is_ancestor) { - /* Enable pass_omp_device_lower pass. */ - cgraph_node::get (cfun->decl)->calls_declare_variant_alt = 1; cgraph_node *fn2_node; child_fn2 = build_decl (DECL_SOURCE_LOCATION (child_fn), FUNCTION_DECL, @@ -10074,7 +10069,7 @@ expand_omp_target (struct omp_region *region) TREE_PUBLIC (child_fn2) = 0; DECL_UNINLINABLE (child_fn2) = 1; DECL_EXTERNAL (child_fn2) = 0; - DECL_CONTEXT (child_fn2) = NULL_TREE; + DECL_CONTEXT (child_fn2) = DECL_CONTEXT (child_fn); DECL_INITIAL (child_fn2) = make_node (BLOCK); BLOCK_SUPERCONTEXT (DECL_INITIAL (child_fn2)) = child_fn2; DECL_ATTRIBUTES (child_fn) @@ -10098,6 +10093,10 @@ expand_omp_target (struct omp_region *region) fn2_node->force_output = 1; node->offloadable = 0; + /* Enable pass_omp_device_lower pass. 
*/ + fn2_node = cgraph_node::get (DECL_CONTEXT (child_fn)); + fn2_node->calls_declare_variant_alt = 1; + t = build_decl (DECL_SOURCE_LOCATION (child_fn), RESULT_DECL, NULL_TREE, void_type_node); DECL_ARTIFICIAL (t) = 1; diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-6.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-6.f90 new file mode 100644 index 000..821e7852e85 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-6.f90 @@ -0,0 +1,17 @@ +! PR middle-end/107236 + +! Did ICE before because IFN .GOMP_TARGET_REV was not +! processed in omp-offload.cc. +! Note: Test required ENABLE_OFFLOADING being true inside GCC. + +implicit none +!$omp requires reverse_offload +!$omp target parallel num_threads(4) + !$omp target device(ancestor:1) +call foo() + !$omp end target +!$omp end target parallel +contains + subroutine foo + end +end
*ping* / Re: [Patch] libgomp: Add offload_device_gcn check, add requires-4a.c test
On 12.10.22 16:05, Tobias Burnus wrote:

This came up because the USM implementation with -foffload-memory={unified,pinned}, as posted at https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html, does not handle USM with static variables. On the OG12 (alias devel/omp/gcc-12) branch, this shows up as a FAIL for requires-4.c.

The attached patch prepares for skipping requires-4.c for the gcn/nvptx devices and adds an adjacent requires-4a.c testcase, using heap memory, that can still run on gcn/nvptx.

Additionally, I commented out a no-longer-used #define, following the precedent of GOMP_DEVICE_HOST_NONSHM.

Thus, this patch adds another testcase and one effective-target check and comments out an unused #define, and that's it. (Otherwise, it is just a prep patch.)

OK for mainline?

Tobias

PS: Currently, neither the preexisting offload_device_nvptx nor the new offload_device_gcn target selector is used, neither in old code nor by this patch.
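For readers who do not have the testcases at hand, the difference between static and heap memory described above could look roughly like the following. This is only a sketch and not the actual requires-4a.c; the function name sum_with_heap_buffer is made up for illustration. Built without -fopenmp, or without an offload device, the loop simply runs on the host.

```c
#include <assert.h>
#include <stdlib.h>

#pragma omp requires unified_shared_memory

/* Hypothetical helper: sum 1..n using a heap-allocated buffer that is
   accessed inside a target region without an explicit map clause for
   the buffer, relying on unified shared memory.  */
int
sum_with_heap_buffer (int n)
{
  assert (n > 0);
  int *data = malloc (n * sizeof (int));
  if (data == NULL)
    return -1;
  for (int i = 0; i < n; i++)
    data[i] = i + 1;

  int sum = 0;
  #pragma omp target map(tofrom: sum)
  for (int i = 0; i < n; i++)
    sum += data[i];

  free (data);
  return sum;
}
```

A variant with a file-scope array in place of the malloc'ed buffer would correspond to the static-variable case that the pseudo-USM implementation does not handle.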
*ping* / Re: [Patch][v5] libgomp/nvptx: Prepare for reverse-offload callback handling
On 12.10.22 10:55, Tobias Burnus wrote:

On 11.10.22 13:12, Alexander Monakov wrote:

My understanding is such trickery should not be necessary with the barrier-based approach, i.e. the sequence of PTX instructions

  st % plain store
  membar.sys
  st.volatile

should be enough to guarantee that the former store is visible on the host before the latter, and work all the way back to sm_20.

If I understand it correctly, you mean:

  GOMP_REV_OFFLOAD_VAR->dev_num = GOMP_ADDITIONAL_ICVS.device_num;
  __sync_synchronize (); /* membar.sys */
  asm volatile ("st.volatile.global.u64 [%0], %1;" : : "r"(addr_struct_fn), "r" (fn) : "memory");

And then directly followed by the busy wait:

  while (__atomic_load_n (&GOMP_REV_OFFLOAD_VAR->fn, __ATOMIC_ACQUIRE) != 0)
    ; /* spin */

which GCC expands to:

  /* ld.global.u64 %r64,[__gomp_rev_offload_var];
     ld.u64 %r36,[%r64];
     membar.sys; */

The accordingly updated patch is attached. (This change and the removal of the mkoffload.cc part are the only larger changes; otherwise, the patch only addresses the minor comments by Jakub. The now removed CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT was used until commit r10-304-g1f4c5b9bb2eb81880e2bc725435d596fcd2bdfef, i.e. it is a really old leftover!)

Otherwise, tested* to work with sm_30 (error by mkoffload, unchanged), sm_35 and sm_70.

Tobias

*With some added code; until GOMP_OFFLOAD_get_num_devices accepts GOMP_REQUIRES_UNIFIED_SHARED_MEMORY and GOMP_OFFLOAD_load_image gets passed a non-NULL for rev_fn_table, the current patch is a no-op. Planned next is the related GCN patch, and the actual change in libgomp/target.c (+ accepting USM in GOMP_OFFLOAD_get_num_devices).
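The store ordering discussed here can be modelled in portable C11 atomics. The following is only a sketch under assumptions: struct rev_request and the three functions are hypothetical names, a release store stands in for "membar.sys + st.volatile", and the busy wait is replaced by a non-blocking check so that the protocol can be stepped through in a single thread. It is not the libgomp implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical request slot shared between "device" and "host".  */
struct rev_request
{
  uint64_t mapnum;        /* Plain payload, written before publishing.  */
  _Atomic uint64_t fn;    /* 0 = idle, nonzero = request pending.  */
};

/* Device side: write the payload first, then publish 'fn' with release
   semantics so the payload is visible before 'fn' becomes nonzero.  */
static void
device_post (struct rev_request *req, uint64_t fn, uint64_t mapnum)
{
  assert (fn != 0);
  req->mapnum = mapnum;
  atomic_store_explicit (&req->fn, fn, memory_order_release);
}

/* Device side: one iteration of the spin loop; true once the host has
   reset 'fn' to 0.  */
static int
device_is_done (struct rev_request *req)
{
  return atomic_load_explicit (&req->fn, memory_order_acquire) == 0;
}

/* Host side: if a request is pending, read it and flag completion by
   setting 'fn' back to 0.  Returns 1 if a request was handled.  */
static int
host_poll (struct rev_request *req, uint64_t *fn_out, uint64_t *mapnum_out)
{
  uint64_t fn = atomic_load_explicit (&req->fn, memory_order_acquire);
  if (fn == 0)
    return 0;
  *fn_out = fn;
  *mapnum_out = req->mapnum;
  atomic_store_explicit (&req->fn, 0, memory_order_release);
  return 1;
}
```

On real hardware the two sides run concurrently and the device would call device_is_done in a loop; the acquire/release pairing is what guarantees that the host sees mapnum once it sees a nonzero fn.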
[Patch] Fortran: Fixes for kind=4 characters strings [PR107266]
Long introduction, but the patch is rather simple: Don't use kind=1 as type where kind=4 should be used.

Long introduction + background, feel free to skip.

This popped up for libgomp/testsuite/libgomp.fortran/struct-elem-map-1.f90, which uses kind=4 characters, if Sandra's "Fortran: delinearize multi-dimensional array accesses" patch is applied.
Patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562230.html
Used for OG11: https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584716.html
On the OG12 (alias devel/omp/gcc-12) vendor branch, it is used: https://gcc.gnu.org/g:39a8c371fda6136cf77c74895a00b136409e0ba3

* * *

For mainline, I did not observe a wrong-code issue at runtime, still:

  void frobc (character(kind=4)[1:*_a] * & restrict a, ...
  ...
  static void frobc (character(kind=1) * & restrict, ...

feels odd, i.e. having the definition as kind=4 and the declaration as kind=1. With the patch, it becomes:

  static void frobc (character(kind=4) * & restrict, character(kind=4) * &, ...

* * *

For the following, questionable code (→ PR107266), it is even worse:

  character(kind=4) function f(x) bind(C)
    character(kind=4), value :: x
  end

this gives the following, which has the wrong ABI:

  character(kind=1) f (character(kind=1) x)
  {
    (void) 0;
  }

With the patch, it becomes:

  character(kind=4) f (character(kind=4) x)

* * *

I think that all of this only exercises the trans-types.cc change; the trans-expr.cc code gets called, as an assert shows, but I fail to get a dump where this goes wrong.

However, for struct-elem-map-1.f90 with mainline or with OG12 and the patch:

  #pragma omp target map(tofrom:var.uni2[40 / 20] [len: 20])

while on OG12 without the attached patch:

  #pragma omp target map(tofrom:var.uni2[40 / 5] [len: 5])

where the problem is that TYPE_SIZE_UNIT is wrong. Whether this only affects OG12 due to the delinearizer patch or some code on mainline as well, I don't know. Still, I think it should be fixed ...

OK for mainline?
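The numbers in the two map clauses above follow from simple size arithmetic. The sketch below assumes, based only on what the dumps suggest, that an element of uni2 is a string of 5 characters; the helper names are made up for illustration. With kind=4, each character occupies 4 bytes, so an element is 20 bytes and byte offset 40 is element 2. Computing with kind=1 instead gives a 5-byte element and the bogus index 8.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one character string of LEN characters of kind KIND
   (assumption: a kind=K character occupies K bytes).  */
static size_t
char_elem_bytes (size_t len, size_t kind)
{
  assert (kind == 1 || kind == 4);
  return len * kind;
}

/* Element index that corresponds to a byte offset into an array of
   such strings.  */
static size_t
elem_index (size_t byte_offset, size_t len, size_t kind)
{
  return byte_offset / char_elem_bytes (len, kind);
}
```

Here, elem_index (40, 5, 4) reproduces the '40 / 20' of the correct dump, and elem_index (40, 5, 1) the '40 / 5' of the wrong one.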
Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 Fortran: Fixes for kind=4 characters strings [PR107266] PR fortran/107266 gcc/fortran/ * trans-expr.cc (gfc_conv_string_parameter): Use passed type to honor character kind. * trans-types.cc (gfc_sym_type): Honor character kind. * trans-decl.cc (gfc_conv_cfi_to_gfc): Fix handling kind=4 character strings. gcc/testsuite/ * gfortran.dg/char4_decl.f90: New test. * gfortran.dg/char4_decl-2.f90: New test. gcc/fortran/trans-decl.cc | 10 ++--- gcc/fortran/trans-expr.cc | 12 +++--- gcc/fortran/trans-types.cc | 2 +- gcc/testsuite/gfortran.dg/char4_decl-2.f90 | 59 ++ gcc/testsuite/gfortran.dg/char4_decl.f90 | 52 ++ 5 files changed, 123 insertions(+), 12 deletions(-) diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc index 5d16d640322..4b570c3551a 100644 --- a/gcc/fortran/trans-decl.cc +++ b/gcc/fortran/trans-decl.cc @@ -7378,13 +7378,13 @@ done: /* Set string length for len=:, only. 
*/ if (sym->ts.type == BT_CHARACTER && !sym->ts.u.cl->length) { - tmp = sym->ts.u.cl->backend_decl; + tmp2 = gfc_get_cfi_desc_elem_len (cfi); + tmp = fold_convert (TREE_TYPE (tmp2), sym->ts.u.cl->backend_decl); if (sym->ts.kind != 1) tmp = fold_build2_loc (input_location, MULT_EXPR, - gfc_array_index_type, - sym->ts.u.cl->backend_decl, tmp); - tmp2 = gfc_get_cfi_desc_elem_len (cfi); - gfc_add_modify (, tmp2, fold_convert (TREE_TYPE (tmp2), tmp)); + TREE_TYPE (tmp2), tmp, + build_int_cst (TREE_TYPE (tmp2), sym->ts.kind)); + gfc_add_modify (, tmp2, tmp); } if (!sym->attr.dimension) diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index 1551a2e4df4..e7b9211f17e 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -10374,15 +10374,15 @@ gfc_conv_string_parameter (gfc_se * se) || TREE_CODE (TREE_TYPE (se->expr)) == INTEGER_TYPE) && TYPE_STRING_FLAG (TREE_TYPE (se->expr))) { + type = TREE_TYPE (se->expr); if (TREE_CODE (se->expr) != INDIRECT_REF) - { - type = TREE_TYPE (se->expr); - se->expr = gfc_build_addr_expr (build_pointer_type (type), se->expr); - } + se->expr = gfc_build_addr_expr (build_pointer_type (type), se->expr); else { - type = gfc_get_character_type_len (gfc_default_character_kind, - se->string_length); + if (TREE_CODE (type) == ARRAY_TYPE) + type = TREE_TYPE (type); + type = gfc_get_character_type_len_for_eltype (type, + se->string_length); type = build_pointer_type (type); se->expr =
[committed] gfortran.dg/c-interop/deferred-character-2.f90: Fix dg-do
Just spotted this. The test was only compiled instead of also being run; it was the only occurrence I could find for 'dg-.*execute'.

Committed as https://gcc.gnu.org/r13-3306

Tobias

commit 3760dd553eed21ac5614cf0d0841ca984b4361e2
Author: Tobias Burnus
Date: Fri Oct 14 18:34:49 2022 +0200

gfortran.dg/c-interop/deferred-character-2.f90: Fix dg-do

gcc/testsuite/
	* gfortran.dg/c-interop/deferred-character-2.f90: Use 'dg-do run'.

diff --git a/gcc/testsuite/gfortran.dg/c-interop/deferred-character-2.f90 b/gcc/testsuite/gfortran.dg/c-interop/deferred-character-2.f90
index 356097af241..4dab32662c6 100644
--- a/gcc/testsuite/gfortran.dg/c-interop/deferred-character-2.f90
+++ b/gcc/testsuite/gfortran.dg/c-interop/deferred-character-2.f90
@@ -1,5 +1,5 @@
 ! PR 92482
-! { dg-do execute}
+! { dg-do run }
 !
 ! TS 29113
 ! 8.7 Interoperability of procedures and procedure interfaces
[Patch] libgomp: Add Fortran testcases for omp_in_explicit_task
Rather obvious patch as it is a straight conversion from C. OK for mainline? Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 libgomp: Add Fortran testcases for omp_in_explicit_task Fortranized testcases of commits r13-3257-ga58a965eb73 and r13-3258-g0ec4e93fb9f. libgomp/ChangeLog: * testsuite/libgomp.fortran/task-7.f90: New test. * testsuite/libgomp.fortran/task-8.f90: New test. * testsuite/libgomp.fortran/task-in-explicit-1.f90: New test. * testsuite/libgomp.fortran/task-in-explicit-2.f90: New test. * testsuite/libgomp.fortran/task-in-explicit-3.f90: New test. * testsuite/libgomp.fortran/task-reduction-17.f90: New test. * testsuite/libgomp.fortran/task-reduction-18.f90: New test. libgomp/testsuite/libgomp.fortran/task-7.f90 | 22 libgomp/testsuite/libgomp.fortran/task-8.f90 | 13 +++ .../libgomp.fortran/task-in-explicit-1.f90 | 113 + .../libgomp.fortran/task-in-explicit-2.f90 | 21 .../libgomp.fortran/task-in-explicit-3.f90 | 31 ++ .../libgomp.fortran/task-reduction-17.f90 | 32 ++ .../libgomp.fortran/task-reduction-18.f90 | 15 +++ 7 files changed, 247 insertions(+) diff --git a/libgomp/testsuite/libgomp.fortran/task-7.f90 b/libgomp/testsuite/libgomp.fortran/task-7.f90 new file mode 100644 index 000..e806bd79663 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/task-7.f90 @@ -0,0 +1,22 @@ +! { dg-do run } + +program main + use omp_lib + implicit none + + !$omp task final (.true.) +if (.not. omp_in_final ()) & + error stop +!$omp task + if (.not. omp_in_final ()) & +error stop + !$omp target nowait + if (omp_in_final ()) & +error stop + !$omp end target + if (.not. 
omp_in_final ()) & +error stop + !$omp taskwait +!$omp end task + !$omp end task +end diff --git a/libgomp/testsuite/libgomp.fortran/task-8.f90 b/libgomp/testsuite/libgomp.fortran/task-8.f90 new file mode 100644 index 000..037c63b8fa3 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/task-8.f90 @@ -0,0 +1,13 @@ +! { dg-do run } + +program main + implicit none + integer :: i + i = 0 + !$omp task +!$omp target nowait private (i) + i = 1 +!$omp end target +!$omp taskwait + !$omp end task +end diff --git a/libgomp/testsuite/libgomp.fortran/task-in-explicit-1.f90 b/libgomp/testsuite/libgomp.fortran/task-in-explicit-1.f90 new file mode 100644 index 000..b6fa21b2c22 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/task-in-explicit-1.f90 @@ -0,0 +1,113 @@ +! { dg-do run } + +program main + use omp_lib + implicit none + integer :: i + + if (omp_in_explicit_task ()) & +error stop + !$omp task + if (.not. omp_in_explicit_task ()) & +error stop + !$omp end task + + !$omp task final (.true.) +if (.not. omp_in_explicit_task ()) & + error stop +!$omp task +if (.not. omp_in_explicit_task ()) & + error stop +!$omp end task + !$omp end task + + !$omp parallel +if (omp_in_explicit_task ()) & + error stop +!$omp task if (.false.) +if (.not. omp_in_explicit_task ()) & + error stop +!$omp task if (.false.) + if (.not. omp_in_explicit_task ()) & +error stop +!$omp end task +!$omp end task +!$omp task final (.true.) + if (.not. omp_in_explicit_task ()) & +error stop +!$omp end task +!$omp barrier +if (omp_in_explicit_task ()) & + error stop +!$omp taskloop num_tasks (24) +do i = 1, 32 + if (.not. omp_in_explicit_task ()) & +error stop +end do +!$omp masked +!$omp task +if (.not. omp_in_explicit_task ()) & + error stop +!$omp end task +!$omp end masked +!$omp barrier +if (omp_in_explicit_task ()) & + error stop + !$omp end parallel + + !$omp target +if (omp_in_explicit_task ()) & + error stop +!$omp task if (.false.) +if (.not. 
omp_in_explicit_task ()) & + error stop +!$omp end task +!$omp task +if (.not. omp_in_explicit_task ()) & + error stop +!$omp end task + !$omp end target + + !$omp target teams +!$omp distribute +do i = 1, 4 + if (omp_in_explicit_task ()) then +error stop + else + !$omp parallel +if (omp_in_explicit_task ()) & + error stop +!$omp task +if (.not. omp_in_explicit_task ()) & + error stop +!$omp end task +!$omp barrier +if (omp_in_explicit_task ()) & + error stop + !$omp end parallel + end if +end do + !$omp end target teams + + !$omp teams +!$omp distribute +
Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling
On 12.10.22 19:09, Andrew Stubbs wrote:

On 12/10/2022 15:29, Tobias Burnus wrote:

Right, sorry, the buffer is circular, but the counter is linear. It simplified reservation that way, but it does mean that there's a limit to the number of times the buffer can cycle before the counter saturates. (You'd need to stream out gigabytes of data to hit the limit though.)

Or in other words, you can have 2^32 = 4,294,967,296 (write chunks + reverse offloads) per kernel launch.

...

PS: Currently, device stack variables are private and cannot be accessed from the host; this will change in a separate patch.

[...] So, the patch, as is, is known to be non-functional? How can you have tested it?

For the addrs_sizes_kind data to be accessible the asm("s8") has to be wrong. I have tested the non-addrs_sizes_kind part only, which permits running reverse-offload functions just fine, but only if they do not use firstprivate or map. And I actually also tested with the addrs_sizes_kind part, but that unsurprisingly fails hard when trying to copy the stack data.

I think the patch looks good, in principle. The use of the existing ring-buffer is the right way to do it, IMO. Can we get the manually allocated stacks patch in first and then follow up with these patches when they actually work?

I stash this patch as: "OK, but ams still wants to have a glance once __builtin_gcn_kernarg_ptr is in".

In terms of having fewer *.diff files around, I of course would prefer to just change one line in a follow-up commit instead of keeping a full patch around, but holding off until __builtin_gcn_kernarg_ptr is ready + the default has changed to non-private stack variables is also fine.

Tobias
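The "circular buffer, linear counter" scheme mentioned above can be sketched as follows. This is an illustration only, with hypothetical names (struct ring, ring_reserve, ring_slot, SLOTS); it is neither the plugin-gcn.c nor the Newlib code, and it does not model what happens once the 32-bit counter reaches its limit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SLOTS 1024u   /* Hypothetical number of buffer entries.  */

struct ring
{
  _Atomic uint32_t next;   /* Linear ticket counter, only ever incremented.  */
  int payload[SLOTS];      /* Circular storage, indexed by ticket % SLOTS.  */
};

/* Reserve COUNT consecutive tickets with a single atomic add and return
   the first one; passing 2 instead of 1 reserves two adjacent tickets.  */
static uint32_t
ring_reserve (struct ring *r, uint32_t count)
{
  assert (count > 0);
  return atomic_fetch_add (&r->next, count);
}

/* Map a linear ticket to its position in the circular buffer.  */
static uint32_t
ring_slot (uint32_t ticket)
{
  return ticket % SLOTS;
}
```

Because every reservation consumes tickets from the same 32-bit counter, the total number of reservations per counter lifetime is bounded by 2^32, which matches the limit quoted above.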
Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling
On 29.09.22 18:24, Andrew Stubbs wrote:

On 27/09/2022 14:16, Tobias Burnus wrote:

Andrew did suggest a while back to piggyback on the console_output handling, avoiding another atomic access. - If this is still wanted, I would like to have some guidance regarding how to actually implement it.

[...] The point is that you can use the "msg" and "text" fields for whatever data you want, as long as you invent a new value for "type". [] You can make "case 4" do whatever you want. There are enough bytes for 4 pointers, and you could use multiple packets (although it's not safe to assume they're contiguous or already arrived; maybe "case 4" for part 1, "case 5" for part 2). It's possible to change this structure, of course, but the target implementation is in newlib so versioning becomes a problem.

I think, also looking at the Newlib write.c implementation, that the data is contiguous: there is an atomic add, where instead of passing '1' for a single slot, I could also add '2' for two slots.

Attached is one variant; for the decl of GOMP_OFFLOAD_target_rev, it needs the generic parts of the sister nvptx patch.* 2*128 bytes were not enough, I need 3*128 bytes. (Or rather 5*64 + 32.) As target_ext is blocking, I decided to use a stack-local variable for the remaining arguments and pass it along. Alternatively, I could also use 2 slots and process them together. This would avoid one device->host memory copy but would make console_output less clear.

OK for mainline?

Tobias

* https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603354.html

PS: Currently, device stack variables are private and cannot be accessed from the host; this will change in a separate patch. It not only affects the "rest" part as used in this patch but also the actual arrays behind addr, kinds, and sizes, and quite likely a lot of the map/firstprivate variables passed to addr. As num_devices() will return 0 or -1, this is for now a non-issue.
- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 libgomp/gcn: Prepare for reverse-offload callback handling libgomp/ChangeLog: * config/gcn/libgomp-gcn.h: New file; contains struct output, declared previously in plugin-gcn.c. * config/gcn/target.c: Include it. (GOMP_ADDITIONAL_ICVS): Declare as extern var. (GOMP_target_ext): Handle reverse offload. * plugin/plugin-gcn.c: Include libgomp-gcn.h. (struct kernargs): Replace struct def by the one from libgomp-gcn.h for output_data. (process_reverse_offload): New. (console_output): Call it. libgomp/config/gcn/libgomp-gcn.h | 61 libgomp/config/gcn/target.c | 44 - libgomp/plugin/plugin-gcn.c | 34 -- 3 files changed, 117 insertions(+), 22 deletions(-) diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h new file mode 100644 index 000..91560be787f --- /dev/null +++ b/libgomp/config/gcn/libgomp-gcn.h @@ -0,0 +1,61 @@ +/* Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by Tobias Burnus . + + This file is part of the GNU Offloading and Multi Processing Library + (libgomp). + + Libgomp is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. 
+ + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* This file contains defines and type definitions shared between the + nvptx target's libgomp.a and the plugin-nvptx.c, but that is only + needef for this target. */ + +#ifndef LIBGOMP_GCN_H +#define LIBGOMP_GCN_H 1 + +/* This struct is also used in Newlib's libc/sys/amdgcn/write.c. */ +struct output +{ + int return_value; + unsigned int next_output; + struct printf_data { +int written; +union { + char msg[128]; + uint64_t msg_u64[2]; +}; +int type; +union { + int64_t ivalue; + double dvalue; + char text[128]; + uint64_t value_u64[2]; +
[Patch] libgomp: Add offload_device_gcn check, add requires-4a.c test
This came up because the USM implementation with -foffload-memory={unified,pinned}, as posted at https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html, does not handle USM with static variables. On the OG12 (alias devel/omp/gcc-12) branch, this shows up as a FAIL for requires-4.c.

The attached patch prepares for skipping requires-4.c for the gcn/nvptx devices and adds an adjacent requires-4a.c testcase, using heap memory, that can still run on gcn/nvptx.

Additionally, I commented out a no-longer-used #define, following the precedent of GOMP_DEVICE_HOST_NONSHM.

Thus, this patch adds another testcase and one effective-target check and comments out an unused #define, and that's it. (Otherwise, it is just a prep patch.)

OK for mainline?

Tobias

PS: Currently, neither the preexisting offload_device_nvptx nor the new offload_device_gcn target selector is used, neither in old code nor by this patch.

libgomp: Add offload_device_gcn check, add requires-4a.c test

Duplicate libgomp.c-c++-common/requires-4.c (as ...-4a.c) but using heap-allocated instead of static memory for a variable.

This change and the added offload_device_gcn check prepare for pseudo-USM, where the device hardware cannot access all host memory but only managed and pinned memory; for those, requires-4.c will fail and the new check permits adding target { ! { offload_device_nvptx || offload_device_gcn } } to requires-4.c; however, it has not been added yet as pseudo-USM support is not yet on mainline. (Review is pending for the USM patches.)

include/ChangeLog:

	* gomp-constants.h (GOMP_DEVICE_HSA): Comment (unused).

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp (check_effective_target_offload_device_gcn): New.
* testsuite/libgomp.c-c++-common/on_device_arch.h (device_arch_gcn, on_device_arch_gcn): New. * testsuite/libgomp.c-c++-common/requires-4a.c: New test; copied from requires-4.c but using heap-allocated memory. include/gomp-constants.h | 2 +- libgomp/testsuite/lib/libgomp.exp | 12 +++ .../libgomp.c-c++-common/on_device_arch.h | 13 .../testsuite/libgomp.c-c++-common/requires-4a.c | 39 ++ 4 files changed, 65 insertions(+), 1 deletion(-) diff --git a/include/gomp-constants.h b/include/gomp-constants.h index 84316f953d0..fac7316b858 100644 --- a/include/gomp-constants.h +++ b/include/gomp-constants.h @@ -229,9 +229,9 @@ enum gomp_map_kind /* #define GOMP_DEVICE_HOST_NONSHM 3 removed. */ #define GOMP_DEVICE_NOT_HOST 4 #define GOMP_DEVICE_NVIDIA_PTX 5 #define GOMP_DEVICE_INTEL_MIC 6 -#define GOMP_DEVICE_HSA 7 +/* #define GOMP_DEVICE_HSA 7 removed. */ #define GOMP_DEVICE_GCN 8 /* We have a compatibility issue. OpenMP 5.2 introduced omp_initial_device with value of -1 which clashes with our diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp index 107a3c2ac9d..4b8c64de8a5 100644 --- a/libgomp/testsuite/lib/libgomp.exp +++ b/libgomp/testsuite/lib/libgomp.exp @@ -414,8 +414,20 @@ proc check_effective_target_offload_device_nvptx { } { } } ] } +# Return 1 if using a GCN offload device. +proc check_effective_target_offload_device_gcn { } { +return [check_runtime_nocache offload_device_gcn { + #include + #include "testsuite/libgomp.c-c++-common/on_device_arch.h" + int main () + { + return !on_device_arch_gcn (); + } +} ] +} + # Return 1 if at least one Nvidia GPU is accessible. 
proc check_effective_target_openacc_nvidia_accel_present { } { return [check_runtime openacc_nvidia_accel_present { diff --git a/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h b/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h index f92743b04d7..6f66dbd784c 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h +++ b/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h @@ -6,15 +6,22 @@ device_arch_nvptx (void) { return GOMP_DEVICE_NVIDIA_PTX; } +/* static */ int +device_arch_gcn (void) +{ + return GOMP_DEVICE_GCN; +} + /* static */ int device_arch_intel_mic (void) { return GOMP_DEVICE_INTEL_MIC; } #pragma omp declare variant (device_arch_nvptx) match(construct={target},device={arch(nvptx)}) +#pragma omp declare variant (device_arch_gcn) match(construct={target},device={arch(gcn)}) #pragma omp declare variant (device_arch_intel_mic) match(construct={target},device={arch(intel_mic)}) /* static */ int device_arch (void) { @@ -36,8 +43,14 @@ on_device_arch_nvptx () { return on_device_arch (GOMP_DEVICE_NVIDIA_PTX); } +int +on_device_arch_gcn () +{ + return on_device_arch (GOMP_DEVICE_GCN); +} + int on_device_arch_intel_mic () { return
Re: [Patch][v5] libgomp/nvptx: Prepare for reverse-offload callback handling
enum { CU_STREAM_DEFAULT = 0, @@ -169,6 +171,7 @@ CUresult cuMemGetInfo (size_t *, size_t *); CUresult cuMemAlloc (CUdeviceptr *, size_t); #define cuMemAllocHost cuMemAllocHost_v2 CUresult cuMemAllocHost (void **, size_t); +CUresult cuMemHostAlloc (void **, size_t, unsigned int); CUresult cuMemcpy (CUdeviceptr, CUdeviceptr, size_t); #define cuMemcpyDtoDAsync cuMemcpyDtoDAsync_v2 CUresult cuMemcpyDtoDAsync (CUdeviceptr, CUdeviceptr, size_t, CUstream); diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c index 6f869be..eef151c 100644 --- a/libgomp/config/nvptx/icv-device.c +++ b/libgomp/config/nvptx/icv-device.c @@ -30,7 +30,7 @@ /* This is set to the ICV values of current GPU during device initialization, when the offload image containing this libgomp portion is loaded. */ -static volatile struct gomp_offload_icvs GOMP_ADDITIONAL_ICVS; +volatile struct gomp_offload_icvs GOMP_ADDITIONAL_ICVS; void omp_set_default_device (int device_num __attribute__((unused))) diff --git a/libgomp/config/nvptx/libgomp-nvptx.h b/libgomp/config/nvptx/libgomp-nvptx.h new file mode 100644 index 000..5da9aae --- /dev/null +++ b/libgomp/config/nvptx/libgomp-nvptx.h @@ -0,0 +1,51 @@ +/* Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by Tobias Burnus . + + This file is part of the GNU Offloading and Multi Processing Library + (libgomp). + + Libgomp is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. 
+ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* This file contains defines and type definitions shared between the + nvptx target's libgomp.a and the plugin-nvptx.c, but that is only + needed for this target. */ + +#ifndef LIBGOMP_NVPTX_H +#define LIBGOMP_NVPTX_H 1 + +#define GOMP_REV_OFFLOAD_VAR __gomp_rev_offload_var + +struct rev_offload { + uint64_t fn; + uint64_t mapnum; + uint64_t addrs; + uint64_t sizes; + uint64_t kinds; + int32_t dev_num; +}; + +#if (__SIZEOF_SHORT__ != 2 \ + || __SIZEOF_SIZE_T__ != 8 \ + || __SIZEOF_POINTER__ != 8) +#error "Data-type conversion required for rev_offload" +#endif + +#endif /* LIBGOMP_NVPTX_H */ + diff --git a/libgomp/config/nvptx/target.c b/libgomp/config/nvptx/target.c index 11108d2..0e79388 100644 --- a/libgomp/config/nvptx/target.c +++ b/libgomp/config/nvptx/target.c @@ -24,9 +24,12 @@ <http://www.gnu.org/licenses/>. */ #include "libgomp.h" +#include "libgomp-nvptx.h" /* For struct rev_offload + GOMP_REV_OFFLOAD_VAR. 
*/ #include extern int __gomp_team_num __attribute__((shared)); +extern volatile struct gomp_offload_icvs GOMP_ADDITIONAL_ICVS; +volatile struct rev_offload *GOMP_REV_OFFLOAD_VAR; bool GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper, @@ -88,16 +91,53 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum, void **hostaddrs, size_t *sizes, unsigned short *kinds, unsigned int flags, void **depend, void **args) { - (void) device; - (void) fn; - (void) mapnum; - (void) hostaddrs; - (void) sizes; - (void) kinds; + static int lock = 0; /* == gomp_mutex_t lock; gomp_mutex_init (&lock); */ (void) flags; (void) depend; (void) args; - __builtin_unreachable (); + + if (device != GOMP_DEVICE_HOST_FALLBACK + || fn == NULL + || GOMP_REV_OFFLOAD_VAR == NULL) +return; + + gomp_mutex_lock (&lock); + + GOMP_REV_OFFLOAD_VAR->mapnum = mapnum; + GOMP_REV_OFFLOAD_VAR->addrs = (uint64_t) hostaddrs; + GOMP_REV_OFFLOAD_VAR->sizes = (uint64_t) sizes; + GOMP_REV_OFFLOAD_VAR->kinds = (uint64_t) kinds; + GOMP_REV_OFFLOAD_VAR->dev_num = GOMP_ADDITIONAL_ICVS.device_num; + + /* Set 'fn' to trigger processing on the host; wait for completion, + which is flagged by setting 'fn' back to 0 on the host. */ + uint64_t addr_struct_fn = (uint64_t) &GOMP_REV_OFFLOAD_VAR->fn; +#if __PTX_SM__ >= 700 + asm volatile ("st.global.release.sys.u64 [%0], %1;" + : : "r"(addr_struct_fn), "r" (fn) : "memory"); +#else + __sync_synchronize (); /* membar.sys */ + asm volatile ("st.volatile.global.u64 [%0], %1;" +