On Sun, Sep 18, 2022 at 10:24:43AM +0200, Marcel Vollweiler wrote:
> gcc/ChangeLog:
>
> * gimplify.cc (optimize_target_teams): Set initial num_teams_upper
> to "-2" instead of "1" for non-existing num_teams clause in order to
> disambiguate from the case of an existing num_teams clause with value 1.
>
> libgomp/ChangeLog:
>
> * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
> allow processing of device-specific values.
> (omp_set_teams_thread_limit): Likewise.
> (ialias): Likewise.
> * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
> (omp_set_teams_thread_limit): Likewise.
> (ialias): Likewise.
> * icv-device.c (omp_get_teams_thread_limit): Likewise.
> (ialias): Likewise.
> (omp_set_teams_thread_limit): Likewise.
> * icv.c (omp_set_teams_thread_limit): Removed.
> (omp_get_teams_thread_limit): Likewise.
> (ialias): Likewise.
> * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
> handling.
> (gomp_load_image_to_device): Added a size check for the ICVs struct
> variable.
> (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
> copy back the ICV values from device to host.
> (GOMP_target_ext): Update the number of teams and threads in the kernel
> args also considering device-specific values.
> * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.
Better to say in words what exactly you changed.
> * testsuite/libgomp.c-c++-common/icv-5.c: Extended.
> * testsuite/libgomp.c-c++-common/icv-6.c: Extended.
> * testsuite/libgomp.c-c++-common/icv-7.c: Extended.
> * testsuite/libgomp.c-c++-common/icv-9.c: New test.
> * testsuite/libgomp.fortran/icv-5.f90: New test.
> * testsuite/libgomp.fortran/icv-6.f90: New test.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/gomp/target-teams-1.c: Adapt expected values for
> num_teams from "1" to "-2" in cases without num_teams clause.
> * g++.dg/gomp/target-teams-1.C: Likewise.
> * gfortran.dg/gomp/defaultmap-4.f90: Likewise.
> * gfortran.dg/gomp/defaultmap-5.f90: Likewise.
> * gfortran.dg/gomp/defaultmap-6.f90: Likewise.
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
> struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>
> if (teams == NULL_TREE)
> - num_teams_upper = integer_one_node;
> + num_teams_upper = build_int_cst (integer_type_node, -2);
> else
> for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
> {
The function comment above optimize_target_teams contains a detailed
description of what the values mean and why, so it definitely should
also document what -2 means and when it is used.
I know you have documentation in libgomp for it, but it should be in both
places.
> + intptr_t new_teams = orig_teams, new_threads = orig_threads;
> + /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.
Two spaces after the period.
> + ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but the
> + value could not be specified. No Change.
Likewise.
Also, lowercase "change"?
> + ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
> + Set device-specific value.
> + ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
> + No change. */
> + if (orig_teams == -2)
> + new_teams = 1;
> + else if (orig_teams == 0)
> + {
> + struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
> + if (item != NULL)
> + new_teams = item->icvs.nteams;
> + }
> + /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS
> + region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by
> + e.g. a THREAD_LIMIT clause. */
> + if (orig_teams >= -2 && orig_threads == 0)
The comment talks about ORIG_TEAMS > -2, but the condition is >= -2.
So which one is it?
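To make the mismatch concrete, here is a minimal standalone sketch of the
quoted logic. It is not libgomp code: the helper names are made up for
illustration, and DEVICE_NTEAMS stands in for item->icvs.nteams (the
item == NULL case is left out).

```c
#include <stdint.h>

/* Hypothetical helper mirroring the quoted ORIG_TEAMS handling.
   DEVICE_NTEAMS is an assumed stand-in for item->icvs.nteams.  */
static intptr_t
resolve_teams (intptr_t orig_teams, intptr_t device_nteams)
{
  if (orig_teams == -2)		/* No teams construct at all.  */
    return 1;
  if (orig_teams == 0)		/* Teams construct without num_teams.  */
    return device_nteams;
  return orig_teams;		/* -1 or > 0: no change.  */
}

/* The condition as written in the patch.  */
static int
limit_applies_as_coded (intptr_t orig_teams, intptr_t orig_threads)
{
  return orig_teams >= -2 && orig_threads == 0;
}

/* The condition as described by the comment.  */
static int
limit_applies_as_documented (intptr_t orig_teams, intptr_t orig_threads)
{
  return orig_teams > -2 && orig_threads == 0;
}
```

The two predicates differ only for orig_teams == -2 with orig_threads == 0,
i.e. exactly the "no teams construct" case that the comment says should be
excluded.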
> + /* This tests a large number of teams and threads. If it is larger than
> + 2^15+1 then the according argument in the kernels arguments list
> + is encoded with two items instead of one. On NVIDIA there is an
> + adjustment for too large teams and threads. For AMD such adjustment
> + exists only for threads and will cause runtime errors with a two large
s/two/too/ ?
Shouldn't amdgcn also adjust the number of teams?
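For readers following along, the one-item versus two-item encoding that the
test comment refers to could look roughly like the sketch below. This is
only an illustration under assumptions: the threshold is taken from the
quoted comment, while the flag bit, the shift and the function name are
hypothetical and not the actual libgomp definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Threshold as stated in the quoted test comment (2^15+1).  */
#define INLINE_LIMIT (((uint64_t) 1 << 15) + 1)
/* Hypothetical layout, for illustration only.  */
#define SUBSEQUENT_FLAG ((uint64_t) 1 << 7)
#define VALUE_SHIFT 16

/* Encode VALUE for argument ID (assumed to be below SUBSEQUENT_FLAG)
   into OUT.  Returns the number of items written: 1 if the value fits
   inline, 2 if it has to be passed as a separate item.  */
static size_t
encode_arg (uint64_t id, uint64_t value, uint64_t out[2])
{
  if (value <= INLINE_LIMIT)
    {
      out[0] = id | (value << VALUE_SHIFT);
      return 1;
    }
  out[0] = id | SUBSEQUENT_FLAG;
  out[1] = value;
  return 2;
}
```

A test that wants to exercise the second path therefore needs a teams or
threads value above the threshold, which is where the missing adjustment on
amdgcn becomes visible.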
As for the testcases, have you tested this in a native setup where
dg-set-target-env-var actually works?
Jakub