On Sun, Sep 18, 2022 at 10:24:43AM +0200, Marcel Vollweiler wrote:
> gcc/ChangeLog:
>
>       * gimplify.cc (optimize_target_teams): Set initial num_teams_upper
>       to "-2" instead of "1" for non-existing num_teams clause in order to
>       disambiguate from the case of an existing num_teams clause with value 1.
>
> libgomp/ChangeLog:
>
>       * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
>       allow processing of device-specific values.
>       (omp_set_teams_thread_limit): Likewise.
>       (ialias): Likewise.
>       * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
>       (omp_set_teams_thread_limit): Likewise.
>       (ialias): Likewise.
>       * icv-device.c (omp_get_teams_thread_limit): Likewise.
>       (ialias): Likewise.
>       (omp_set_teams_thread_limit): Likewise.
>       * icv.c (omp_set_teams_thread_limit): Removed.
>       (omp_get_teams_thread_limit): Likewise.
>       (ialias): Likewise.
>       * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
>       handling.
>       (gomp_load_image_to_device): Added a size check for the ICVs struct
>       variable.
>       (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
>       copy back the ICV values from device to host.
>       (GOMP_target_ext): Update the number of teams and threads in the kernel
>       args also considering device-specific values.
>       * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.
Better say what exactly you changed in words.

>       * testsuite/libgomp.c-c++-common/icv-5.c: Extended.
>       * testsuite/libgomp.c-c++-common/icv-6.c: Extended.
>       * testsuite/libgomp.c-c++-common/icv-7.c: Extended.
>       * testsuite/libgomp.c-c++-common/icv-9.c: New test.
>       * testsuite/libgomp.fortran/icv-5.f90: New test.
>       * testsuite/libgomp.fortran/icv-6.f90: New test.
>
> gcc/testsuite/ChangeLog:
>
>       * c-c++-common/gomp/target-teams-1.c: Adapt expected values for
>       num_teams from "1" to "-2" in cases without num_teams clause.
>       * g++.dg/gomp/target-teams-1.C: Likewise.
>       * gfortran.dg/gomp/defaultmap-4.f90: Likewise.
>       * gfortran.dg/gomp/defaultmap-5.f90: Likewise.
>       * gfortran.dg/gomp/defaultmap-6.f90: Likewise.
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
>    struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>
>    if (teams == NULL_TREE)
> -    num_teams_upper = integer_one_node;
> +    num_teams_upper = build_int_cst (integer_type_node, -2);
>    else
>      for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
>        {

The function comment above optimize_target_teams contains a detailed
description of what the values mean and why, so it definitely should
document what -2 means and when it is used.
I know you have documentation in libgomp for it, but it should be in both
places.

> +  intptr_t new_teams = orig_teams, new_threads = orig_threads;
> +  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.

Two spaces after .

> +     ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but the
> +                       value could not be specified. No Change.

Likewise.
lowercase change ?

> +     ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
> +                      Set device-specific value.
> +     ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
> +                     No change. */
> +  if (orig_teams == -2)
> +    new_teams = 1;
> +  else if (orig_teams == 0)
> +    {
> +      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
> +      if (item != NULL)
> +        new_teams = item->icvs.nteams;
> +    }
> +  /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS
> +     region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by
> +     e.g. a THREAD_LIMIT clause. */
> +  if (orig_teams >= -2 && orig_threads == 0)

The comment talks about ORIG_TEAMS > -2, but the condition is >= -2.
So which one is it?

> +  /* This tests a large number of teams and threads. If it is larger than
> +     2^15+1 then the according argument in the kernels arguments list
> +     is encoded with two items instead of one. On NVIDIA there is an
> +     adjustment for too large teams and threads. For AMD such adjustment
> +     exists only for threads and will cause runtime errors with a two large

s/two/too/ ?
Shouldn't amdgcn also adjust the number of teams?

As for testcases, have you tested this in a native setup where
dg-set-target-env-var actually works?

        Jakub
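
To make the ORIG_TEAMS encoding and the > -2 vs. >= -2 question above
concrete, here is a minimal standalone sketch.  It is not libgomp code:
resolve_teams and wants_device_thread_limit are hypothetical helpers,
device_nteams stands in for item->icvs.nteams, and have_item models
whether gomp_get_offload_icv_item returned non-NULL.

```c
#include <stdint.h>

/* Hypothetical helper (not in libgomp): map the encoded ORIG_TEAMS value
   to the number of teams, following the quoted comment.
     -2: no explicit teams construct, use 1.
     -1: NUM_TEAMS present but value not known, leave unchanged.
      0: teams construct without NUM_TEAMS, use the device-specific value.
     >0: value already set, leave unchanged.  */
static intptr_t
resolve_teams (intptr_t orig_teams, int have_item, intptr_t device_nteams)
{
  if (orig_teams == -2)
    return 1;
  if (orig_teams == 0 && have_item)
    return device_nteams;
  return orig_teams;
}

/* Hypothetical helper: decide whether the device-specific
   teams-thread-limit would be applied.  STRICT selects the reading from
   the quoted comment (ORIG_TEAMS > -2); otherwise the reading from the
   quoted condition (ORIG_TEAMS >= -2) is used.  */
static int
wants_device_thread_limit (intptr_t orig_teams, intptr_t orig_threads,
                           int strict)
{
  int teams_ok = strict ? orig_teams > -2 : orig_teams >= -2;
  return teams_ok && orig_threads == 0;
}
```

The two readings differ only for ORIG_TEAMS == -2, i.e. a target region
without an explicit teams construct: the comment's reading skips the
device-specific limit there, while the condition as written applies it.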