On Sun, Sep 18, 2022 at 10:24:43AM +0200, Marcel Vollweiler wrote:
> gcc/ChangeLog:
> 
>       * gimplify.cc (optimize_target_teams): Set initial num_teams_upper
>       to "-2" instead of "1" for non-existing num_teams clause in order to
>       disambiguate from the case of an existing num_teams clause with value 1.
> 
> libgomp/ChangeLog:
> 
>       * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
>       allow processing of device-specific values.
>       (omp_set_teams_thread_limit): Likewise.
>       (ialias): Likewise.
>       * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
>       (omp_set_teams_thread_limit): Likewise.
>       (ialias): Likewise.
>       * icv-device.c (omp_get_teams_thread_limit): Likewise.
>       (ialias): Likewise.
>       (omp_set_teams_thread_limit): Likewise.
>       * icv.c (omp_set_teams_thread_limit): Removed.
>       (omp_get_teams_thread_limit): Likewise.
>       (ialias): Likewise.
>       * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
>       handling.
>       (gomp_load_image_to_device): Added a size check for the ICVs struct
>       variable.
>       (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
>       copy back the ICV values from device to host.
>       (GOMP_target_ext): Update the number of teams and threads in the kernel
>       args also considering device-specific values.
>       * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.

It would be better to say in words what exactly you changed.

>       * testsuite/libgomp.c-c++-common/icv-5.c: Extended.
>       * testsuite/libgomp.c-c++-common/icv-6.c: Extended.
>       * testsuite/libgomp.c-c++-common/icv-7.c: Extended.
>       * testsuite/libgomp.c-c++-common/icv-9.c: New test.
>       * testsuite/libgomp.fortran/icv-5.f90: New test.
>       * testsuite/libgomp.fortran/icv-6.f90: New test.
> 
> gcc/testsuite/ChangeLog:
> 
>       * c-c++-common/gomp/target-teams-1.c: Adapt expected values for
>       num_teams from "1" to "-2" in cases without num_teams clause.
>       * g++.dg/gomp/target-teams-1.C: Likewise.
>       * gfortran.dg/gomp/defaultmap-4.f90: Likewise.
>       * gfortran.dg/gomp/defaultmap-5.f90: Likewise.
>       * gfortran.dg/gomp/defaultmap-6.f90: Likewise.

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
>    struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>  
>    if (teams == NULL_TREE)
> -    num_teams_upper = integer_one_node;
> +    num_teams_upper = build_int_cst (integer_type_node, -2);
>    else
>      for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
>        {

The function comment above optimize_target_teams contains a detailed
description of what the values mean and why, so it should definitely
document what -2 means and when it is used.
I know you have documentation for it in libgomp, but it should be in both
places.
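For illustration only, a minimal C sketch of how the values reaching the runtime decode once -2 is introduced. The enum and function names are hypothetical, not from the patch; the meanings follow the comment the patch adds in GOMP_target_ext:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; meanings mirror the GOMP_target_ext comment
   in the patch.  */
enum orig_teams_kind
{
  TEAMS_NO_CONSTRUCT,   /* -2: no teams construct at all; run one team.  */
  TEAMS_UNKNOWN_VALUE,  /* -1: num_teams clause present, but its value is
                           not known before entering the target region.  */
  TEAMS_NO_CLAUSE,      /*  0: teams construct without num_teams clause;
                           use the (device-specific) default.  */
  TEAMS_EXPLICIT_VALUE  /* >0: value already fixed, e.g. by num_teams.  */
};

static enum orig_teams_kind
classify_orig_teams (intptr_t orig_teams)
{
  /* Values below -2 are not expected from the compiler.  */
  assert (orig_teams >= -2);
  switch (orig_teams)
    {
    case -2:
      return TEAMS_NO_CONSTRUCT;
    case -1:
      return TEAMS_UNKNOWN_VALUE;
    case 0:
      return TEAMS_NO_CLAUSE;
    default:
      return TEAMS_EXPLICIT_VALUE;
    }
}
```

Before the change, "no teams construct" and num_teams(1) both arrived as 1; the ChangeLog entry above says the point of -2 is to disambiguate exactly those two cases, which is why the meaning belongs in the gimplify.cc comment too.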

> +  intptr_t new_teams = orig_teams, new_threads = orig_threads;
> +  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.

Two spaces after the full stop.

> +     ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but the
> +                    value could not be specified. No Change.

Likewise.
Also, lowercase "change"?

> +     ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
> +                   Set device-specific value.
> +     ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
> +                  No change.  */
> +  if (orig_teams == -2)
> +    new_teams = 1;
> +  else if (orig_teams == 0)
> +    {
> +      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
> +      if (item != NULL)
> +     new_teams = item->icvs.nteams;
> +    }
> +  /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS
> +     region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by
> +     e.g. a THREAD_LIMIT clause.  */
> +  if (orig_teams >= -2 && orig_threads == 0)

The comment talks about ORIG_TEAMS > -2, but the condition is >= -2.
So which one is it?
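To make the question concrete, a minimal C sketch of the adjustment as the comment describes it, i.e. with a strict orig_teams > -2. struct device_icvs and its fields are hypothetical stand-ins for the gomp_get_offload_icv_item lookup (NULL meaning no device-specific values), and since the quoted hunk stops at the condition, reading the device teams thread limit inside that branch is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device-specific ICVs.  */
struct device_icvs
{
  int nteams;
  int teams_thread_limit;
};

static void
adjust_teams_threads (intptr_t orig_teams, intptr_t orig_threads,
                      const struct device_icvs *icvs,
                      intptr_t *new_teams, intptr_t *new_threads)
{
  assert (orig_teams >= -2);
  *new_teams = orig_teams;
  *new_threads = orig_threads;

  if (orig_teams == -2)
    *new_teams = 1;                 /* No teams construct: one team.  */
  else if (orig_teams == 0 && icvs != NULL)
    *new_teams = icvs->nteams;      /* No num_teams clause.  */

  /* Strict '>' as the comment says: the teams thread limit only
     applies when an explicit teams construct exists and no
     thread_limit value was already set.  */
  if (orig_teams > -2 && orig_threads == 0 && icvs != NULL)
    *new_threads = icvs->teams_thread_limit;
}
```

With >= -2 the first case (no teams construct) would also pick up the teams thread limit, which contradicts part (a) of the comment.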

> +      /* This tests a large number of teams and threads. If it is larger than
> +      2^15+1 then the according argument in the kernels arguments list
> +      is encoded with two items instead of one. On NVIDIA there is an
> +      adjustment for too large teams and threads. For AMD such adjustment
> +      exists only for threads and will cause runtime errors with a two large

s/two/too/ ?
Shouldn't amdgcn also adjust the number of teams?
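As a reference point for that test comment, a small C sketch of the one-versus-two-item split it describes. The exact bounds are an assumption (values strictly between -2^15 and 2^15 fit in one item, which would put the switch at 2^15 rather than 2^15+1) and should be checked against the target-argument encoding in omp-expand.cc; kernel_arg_item_count is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a num_teams/thread_limit value is emitted as a single
   kernel-argument item only if it lies strictly between -2^15 and
   2^15; otherwise an identifier item is followed by a separate item
   holding the value.  */
#define SKETCH_SINGLE_ITEM_LIMIT ((int64_t) 1 << 15)

static int
kernel_arg_item_count (int64_t value)
{
  if (value > -SKETCH_SINGLE_ITEM_LIMIT && value < SKETCH_SINGLE_ITEM_LIMIT)
    return 1;
  return 2;
}
```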

As for the testcases, have you tested this in a native setup where
dg-set-target-env-var actually works?

        Jakub
