Hi Frederik!

On 2019-11-07T10:52:13+0100, Frederik Harwath <frede...@codesourcery.com> wrote:
> this patch implements the OpenACC 2.6 "serial" construct.

Thanks for taking on that one.

> It has been tested by running the testsuite with nvptx-none
> offloading on x86_64-pc-linux-gnu.

This is OK for trunk with the attached "incremental, into Add OpenACC 2.6
`serial' construct support" merged in.  (No need to re-test; I've just
done that.)  In the incremental patch, I'm streamlining some code,
format/handle 'serial' the same as existing 'parallel', etc., plus a few
more things, see my comments in the patch review below.

To record the review effort, please include "Reviewed-by: Thomas Schwinge
<tho...@codesourcery.com>" in the commit log, see
<https://gcc.gnu.org/wiki/Reviewed-by>.


I'm working on an additional patch to handle 'serial' in more cases where
it's wrong to diverge from 'parallel' (this tells us: a lot of testsuite
coverage is missing...), etc.

Thus I'm adding a lot of testsuite coverage.  I'm not asking you to work
on that, as that's not a feasible task for someone who's still new to all
this, to figure out the appropriate tests that should be
augmented/duplicated for 'serial'.  And, coming up with a list for you to
work though, I suppose would be more time consuming for me instead of
just doing it myself.  ;-)

However, you're of course always encouraged to learn from reading such
patches, and ask questions for any things unclear, of course.


>     The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
>     is equivalent to a `parallel' construct with clauses `num_gangs(1)
>      num_workers(1) vector_length(1)' implied.

..., and that's how it -- basically -- is implemented, and thus every
usage of 'serial' gets an annoying 'warning: using vector_length (32),
ignoring 1' for nvptx offloading compilation.  I wonder if we should
sinply disable that nvptx back end warning when an 'oacc serial'
attribute is present?  Or, if we should not, to highlight the issue that
I recently filed <https://github.com/OpenACC/openacc-spec/issues/238>
"OpenACC 'serial' construct might not actually be serial", discovered
during this review process?  (Summary: by GCC have a default of
'vector_length (32)', we do get vector parallelism with 'loop vector', or
'routine vector' inside 'serial' regions -- not clear if that's
intentional, and/or correct.)

>     These clauses are therefore not supported with the `serial'
>     construct. All the remaining clauses accepted with `parallel' are also
>     accepted with `serial'.
>
>     The `serial' construct is implemented like `parallel', except for
>     hardcoding dimensions rather than taking them from the relevant
>     clauses, in `expand_omp_target'.

>     Separate codes are used to denote the `serial' construct throughout the
>     middle end, even though the mapping of `serial' to an equivalent
>     `parallel' construct could have been done in the individual language
>     frontends.

Yeah, I'd pointed this out early on, and I still wonder if early
translating 'serial' into 'parallel num_gangs (1) num_workers (1)
vector_length (1)' (if that's really just what it is) would be better?
Would save quite some effort (duplicate all 'parallel' handling for
'serial').  On the other hand, we'd then need a different mechanism for:

>     In particular, this allows to distinguish between `parallel'
>     and `serial' in warnings, error messages, dumps etc.

... that (or just say "compute construct" instead of 'parallel',
'kernels', 'serial').  But we'll eventually want such a more general
mechnisma anyway; <https://gcc.gnu.org/PR65095> "Adapt OpenMP diagnostic
messages for OpenACC".

So -- we've now got that implementation, and we can still clean it up
later on.


>       * omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Add parameter.

Not anymore.


>  create mode 100644 gcc/testsuite/gfortran.dg/goacc/serial-dims.f90

>  create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c
>  create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90

Thanks to you (and/or Tobias, I suppose) for adding some Fortran
testsuite coversage, because:

> --- a/gcc/fortran/parse.c
> +++ b/gcc/fortran/parse.c
> @@ -683,6 +683,9 @@ decode_oacc_directive (void)
>        matcha ("end parallel loop", gfc_match_omp_eos_error,
>             ST_OACC_END_PARALLEL_LOOP);
>        matcha ("end parallel", gfc_match_omp_eos_error, ST_OACC_END_PARALLEL);
> +      matcha ("end serial loop", gfc_match_omp_eos_error,
> +           ST_OACC_END_SERIAL_LOOP);
> +      matcha ("end serial", gfc_match_omp_eos_error, ST_OACC_END_SERIAL);
>        matcha ("enter data", gfc_match_oacc_enter_data, ST_OACC_ENTER_DATA);
>        matcha ("exit data", gfc_match_oacc_exit_data, ST_OACC_EXIT_DATA);
>        break;

Wow, wow.  I see this has not been present in the og8 and og9 commits of
the OpenACC 'serial' changes.  This tells us: the OpenACC 'serial'
construct has *not at all* been tested with Fortran; any compilation
attempt would've stopped early in the front end:

       25 |   !$acc end serial loop
          |         1
    Error: Unclassifiable OpenACC directive at (1)

       28 |   !$acc end serial
          |         1
    Error: Unclassifiable OpenACC directive at (1)

Thanks for fixing that.


> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h

> @@ -182,6 +182,7 @@ enum gf_mask {
>      GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 9,
>      GF_OMP_TARGET_KIND_OACC_DECLARE = 10,
>      GF_OMP_TARGET_KIND_OACC_HOST_DATA = 11,
> +    GF_OMP_TARGET_KIND_OACC_SERIAL = 12,

That's not wrong, but I've still moved 'GF_OMP_TARGET_KIND_OACC_SERIAL'
next to/after the existing 'GF_OMP_TARGET_KIND_OACC_PARALLEL',
'GF_OMP_TARGET_KIND_OACC_KERNELS' (it's OK to renumber 'enum gf_mask'
items), so that there's (at least some) consistency in the the
'parallel', 'kernels', 'serial' ordering (which is the order they appear
in the current specification), that we shall use unless alphabetical
ordering is used.


> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> @@ -7518,7 +7526,7 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree 
> clauses,
>  
>    /* In a parallel region, loops are implicitly INDEPENDENT.  */
>    omp_context *tgt = enclosing_target_ctx (ctx);
> -  if (!tgt || is_oacc_parallel (tgt))
> +  if (!tgt || is_oacc_parallel_or_serial (tgt))
>      tag |= OLF_INDEPENDENT;

I would agree, but from a (very) quick look, I don't think the OpenACC
specification actually says anything on that topic.  Something I'll get
that clarified.


> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/serial-dims.c
> @@ -0,0 +1,12 @@
> +/* Invalid use of OpenACC parallelism dimensions clauses: num_gangs,
> +   num_workers, vector_length with the serial construct.  */
> +
> +void f(void)
> +{
> +#pragma acc serial num_gangs (1) /* { dg-error "'num_gangs' is not valid for 
> '#pragma acc serial'" } */
> +  ;
> +#pragma acc serial num_workers (1) /* { dg-error "'num_workers' is not valid 
> for '#pragma acc serial'" } */
> +  ;
> +#pragma acc serial vector_length (1) /* { dg-error "'vector_length' is not 
> valid for '#pragma acc serial'" } */
> +  ;
> +}

I've merged that into the existing 'c-c++-common/goacc/parallel-dims-2.c'.

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/goacc/serial-dims.f90
> @@ -0,0 +1,40 @@
> +! Invalid use of OpenACC parallelism dimensions clauses: num_gangs,
> +! num_workers, vector_length with the serial construct.
> +
> +subroutine s()
> +  integer :: i
> +  !$acc parallel
> +  !$acc end parallel
> +
> +  !$acc parallel loop
> +  do i = 1, 5
> +  end do
> +
> +  !$acc parallel loop
> +  do i = 1, 5
> +  end do
> +  !$acc end parallel loop
> +
> +  !$acc serial loop
> +  do i = 1, 5
> +  end do
> +
> +  !$acc serial loop
> +  do i = 1, 5
> +  end do
> +  !$acc end serial loop
> +
> +  !$acc serial
> +  !$acc end serial
> +end subroutine s
> +
> +subroutine f()
> +!$acc serial num_gangs (1)  ! { dg-error "Failed to match clause at" }
> +!$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
> +
> +!$acc serial num_workers (1)  ! { dg-error "Failed to match clause at" }
> +!$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
> +
> +!$acc serial vector_length (1)  ! { dg-error "Failed to match clause at" }
> +!$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
> +end subroutine f

Similarly, for symmetry, moved into (new)
'gfortran.dg/goacc/parallel-dims-2.f90'.


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
> @@ -0,0 +1,92 @@
> +/* OpenACC dimensions with the serial construct.  */

This I've merged into the existing
'libgomp.oacc-c-c++-common/parallel-dims.c', instead of duplicating
infrastructure here, and doing some things slightly differently (possibly
due to incorrect divergence between 'serial' and 'parallel' handling, as
I mentioned above, which I shall soon fix).

> +  /* Serial OpenACC constructs must get launched as 1 x 1 x 1.  */
> +  {
> +    int gangs_min, gangs_max;
> +    int workers_min, workers_max;
> +    int vectors_min, vectors_max;
> +    int gangs_actual, workers_actual, vectors_actual;
> +    int i, j, k;
> +
> +    gangs_min = workers_min = vectors_min = INT_MAX;
> +    gangs_max = workers_max = vectors_max = INT_MIN;
> +    gangs_actual = workers_actual = vectors_actual = 1;
> +#pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring 
> 1" "" { target openacc_nvidia_accel_selected } } */
> +    {
> +      if (acc_on_device (acc_device_nvidia))
> +     {
> +       /* The GCC nvptx back end enforces vector_length (32).  */
> +       vectors_actual = 32;

So, that's actually a good question, whether that is permissible --
that's <https://github.com/OpenACC/openacc-spec/issues/238> "OpenACC
'serial' construct might not actually be serial", as mentioned above.

> +     }
> +      else if (!acc_on_device (acc_device_host))
> +     __builtin_abort ();
> +#pragma acc loop gang \
> +  reduction (min: gangs_min, workers_min, vectors_min) \
> +  reduction (max: gangs_max, workers_max, vectors_max)
> +      for (i = 100 * gangs_actual; i > -100 * gangs_actual; i--)
> +#pragma acc loop worker \
> +  reduction (min: gangs_min, workers_min, vectors_min) \
> +  reduction (max: gangs_max, workers_max, vectors_max)
> +     for (j = 100 * workers_actual; j > -100 * workers_actual; j--)
> +#pragma acc loop vector \
> +  reduction (min: gangs_min, workers_min, vectors_min) \
> +  reduction (max: gangs_max, workers_max, vectors_max)
> +       for (k = 100 * vectors_actual; k > -100 * vectors_actual; k--)
> +         {
> +           gangs_min = gangs_max = acc_gang ();
> +           workers_min = workers_max = acc_worker ();
> +           vectors_min = vectors_max = acc_vector ();
> +         }
> +      if (gangs_min != 0 || gangs_max != gangs_actual - 1
> +       || workers_min != 0 || workers_max != workers_actual - 1
> +       || vectors_min != 0 || vectors_max != vectors_actual - 1)
> +     __builtin_abort ();
> +    }
> +  }

Per the OpenACC 'loop' directives specified here, that's testing
gang-partitioned, worker-partitioned, vector-partitioned execution mode.

We should also test gang-redundant, worker-single, vector-single
execution mode, which I've added.


The Fortran counter part (thanks for creating that!), I suppose, had not
yet been tested?

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c
> @@ -0,0 +1,41 @@
> +/* OpenACC dimensions with the serial construct.  */
> +/* Used by serial-dims.f90.  */

I indicate where this has been copied from.

(Generally, getting rid of these wrapper functions is for another day.)

> +#include <limits.h>
> +#include <openacc.h>
> +#include <gomp-constants.h>
> +
> +/* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapper
> +   not behaving as expected for -O0.  */
> +#pragma acc routine seq
> +static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
> +{
> +  if (acc_on_device ((int) acc_device_host))
> +    return 0;
> +  else if (acc_on_device ((int) acc_device_nvidia))
> +    return __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
> +  else
> +    __builtin_abort ();
> +}
> +
> +#pragma acc routine seq
> +static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
> +{
> +  if (acc_on_device ((int) acc_device_host))
> +    return 0;
> +  else if (acc_on_device ((int) acc_device_nvidia))
> +    return __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
> +  else
> +    __builtin_abort ();
> +}
> +
> +#pragma acc routine seq
> +static unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
> +{
> +  if (acc_on_device ((int) acc_device_host))
> +    return 0;
> +  else if (acc_on_device ((int) acc_device_nvidia))
> +    return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
> +  else
> +    __builtin_abort ();
> +}

Compilation of 'libgomp.oacc-fortran/serial-dims.f90' fails:

    serial-dims.f90:(.text+0x124): undefined reference to `acc_gang'
    serial-dims.f90:(.text+0x130): undefined reference to `acc_gang'
    serial-dims.f90:(.text+0x13c): undefined reference to `acc_worker'
    serial-dims.f90:(.text+0x148): undefined reference to `acc_worker'
    serial-dims.f90:(.text+0x154): undefined reference to `acc_vector'
    serial-dims.f90:(.text+0x160): undefined reference to `acc_vector'

Have to remove 'static' from 'acc_gang', 'acc_worker', 'acc_vector'.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90
> @@ -0,0 +1,89 @@
> +! OpenACC dimensions with the serial construct.

This needs '{ dg-do run }' for torture testing.

> +
> +! { dg-additional-sources serial-dims-aux.c }
> +! { dg-warning "command line option '-fintrinsic-modules-path=.*' is valid 
> for Fortran but not for C" }

We get:

    FAIL: libgomp.oacc-fortran/serial-dims.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O   (test for warnings, line 4)
    FAIL: libgomp.oacc-fortran/serial-dims.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O  (test for excess errors)

..., with:

    Excess errors:
    cc1: warning: command-line option '-fintrinsic-modules-path=[...]' is valid 
for Fortran but not for C

That's because that diagnostic doesn't appear on the line where the
'dg-warning' directive is present (line 4).  I changed that to
'dg-prune-output', but I wonder if there's a better way, so that we can
specify to expect/match a diagnostic without line number information -- I
can't remember whether such a thing exists.

However, that still fails: "command[-]line option" typo.  ;-)

> +module acc_routines
> +  implicit none (type, external)
> +
> +  interface
> +    integer function acc_gang() bind(C)
> +      !$acc routine seq
> +    end function acc_gang
> +
> +    integer function acc_worker() bind(C)
> +      !$acc routine seq
> +    end function acc_worker
> +
> +    integer function acc_vector() bind(C)
> +      !$acc routine seq
> +    end function acc_vector
> +  end interface
> +end module acc_routines

With '-Wall', we're told:

       14 |     integer function acc_gang() bind(C)
          |                             1
    Warning: Variable 'acc_gang' at (1) may not be a C interoperable kind but 
it is BIND(C) [-Wc-binding-type]

       22 |     integer function acc_vector() bind(C)
          |                               1
    Warning: Variable 'acc_vector' at (1) may not be a C interoperable kind but 
it is BIND(C) [-Wc-binding-type]

       18 |     integer function acc_worker() bind(C)
          |                               1
    Warning: Variable 'acc_worker' at (1) may not be a C interoperable kind but 
it is BIND(C) [-Wc-binding-type]

I have not yet looked into that.

> +program main
> +  use iso_c_binding
> +  use openacc
> +  use acc_routines
> +  implicit none (type, external)
> +
> +  integer :: gangs_min, gangs_max
> +  integer :: workers_min, workers_max
> +  integer :: vectors_min, vectors_max
> +  integer :: gangs_actual, workers_actual, vectors_actual
> +  integer :: i, j, k
> +
> +  call acc_init (acc_device_default)
> +
> +  ! Serial OpenACC constructs must get launched as 1 x 1 x 1.
> +  gangs_min = huge(gangs_min)
> +  workers_min = huge(workers_min)
> +  vectors_min = huge(vectors_min)
> +  gangs_max = -huge(gangs_max) - 1  ! INT_MIN
> +  workers_max = -huge(gangs_max) - 1
> +  vectors_max = -huge(gangs_max) - 1

Indeed the C/C++ initializes '*_min' variables with 'INT_MAX', and
'*_max' variables with 'INT_MIN'.  Is the above the generic Fortran
counter part for that?

> +  gangs_actual = 1
> +  workers_actual = 1
> +  vectors_actual = 1
> +
> +  !$acc serial ! { dg-warning "using vector_length \\(32\\), ignoring 1" "" 
> { target openacc_nvidia_accel_selected } }
> +    if (acc_on_device (acc_device_nvidia)) then
> +      ! The GCC nvptx back end enforces vector_length (32).
> +      vectors_actual = 32
> +    elseif (acc_on_device (acc_device_gcn)) then
> +      ! AMD GCN relies on the autovectorizer for the vector dimension:
> +      ! the loop below isn't likely to be vectorized, so vectors_actual
> +      ! is effectively 1.
> +      vectors_actual = 1

We're told:

    [...]/libgomp.oacc-fortran/serial-dims.f90:53:41: Error: Symbol 
'acc_device_gcn' at (1) has no IMPLICIT type; did you mean 'acc_device_kind'?

AMD GCN offloading support doesn't exist on trunk yet, so removed that
here.

> +    elseif (.not. acc_on_device (acc_device_host)) then
> +      stop 1
> +    end if
> +
> +!$acc loop gang &
> +!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
> +!$acc & reduction (max: gangs_max, workers_max, vectors_max)
> +    do i = 100 * gangs_actual, -99 * gangs_actual, -1
> +!$acc loop worker &
> +!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
> +!$acc & reduction (max: gangs_max, workers_max, vectors_max)
> +      do j = 100 * workers_actual, -99 * workers_actual, -1
> +!$acc loop vector &
> +!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
> +!$acc & reduction (max: gangs_max, workers_max, vectors_max)
> +        do k = 100 * vectors_actual, -99 * vectors_actual, -1
> +          gangs_min = acc_gang ();
> +          gangs_max = acc_gang ();
> +          workers_min = acc_worker ();
> +          workers_max = acc_worker ();
> +          vectors_min = acc_vector ();
> +          vectors_max = acc_vector ();
> +       end do
> +     end do
> +   end do
> +  if (gangs_min /= 0 .or. gangs_max /= gangs_actual - 1 &
> +      .or. workers_min /= 0 .or. workers_max /= workers_actual - 1 &
> +      .or. vectors_min /= 0 .or. vectors_max /= vectors_actual - 1) &
> +    stop 2
> +!$acc end serial
> +
> +end program main


Grüße
 Thomas


From 788b2ec11009e4c36b28834914cb251134c3b761 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tho...@codesourcery.com>
Date: Sun, 10 Nov 2019 22:33:43 +0100
Subject: [PATCH] incremental, into Add OpenACC 2.6 `serial' construct support

---
 gcc/fortran/match.h                           |   2 +-
 gcc/fortran/openmp.c                          |  16 +--
 gcc/fortran/parse.c                           |   2 +-
 gcc/fortran/trans-openmp.c                    |   8 +-
 gcc/gimple.def                                |   2 +-
 gcc/gimple.h                                  |  12 +-
 gcc/gimplify.c                                |   4 +-
 gcc/omp-expand.c                              |   8 +-
 gcc/omp-low.c                                 |   7 +-
 .../c-c++-common/goacc/parallel-dims-2.c      |  16 ++-
 .../c-c++-common/goacc/serial-dims.c          |  12 --
 .../{serial-dims.f90 => parallel-dims-2.f90}  |  34 ++---
 gcc/tree.h                                    |   3 +-
 .../libgomp.oacc-c-c++-common/parallel-dims.c |  73 +++++++++++
 .../libgomp.oacc-c-c++-common/serial-dims.c   |  92 --------------
 ...{serial-dims-aux.c => parallel-dims-aux.c} |  14 +-
 .../libgomp.oacc-fortran/parallel-dims.f90    | 120 ++++++++++++++++++
 .../libgomp.oacc-fortran/serial-dims.f90      |  89 -------------
 18 files changed, 257 insertions(+), 257 deletions(-)
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/serial-dims.c
 rename gcc/testsuite/gfortran.dg/goacc/{serial-dims.f90 => parallel-dims-2.f90} (50%)
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
 rename libgomp/testsuite/libgomp.oacc-fortran/{serial-dims-aux.c => parallel-dims-aux.c} (67%)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90
 delete mode 100644 libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90

diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 954af72f0e07..7f3d356cbe49 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -146,9 +146,9 @@ match gfc_match_oacc_kernels (void);
 match gfc_match_oacc_kernels_loop (void);
 match gfc_match_oacc_parallel (void);
 match gfc_match_oacc_parallel_loop (void);
-match gfc_match_oacc_enter_data (void);
 match gfc_match_oacc_serial (void);
 match gfc_match_oacc_serial_loop (void);
+match gfc_match_oacc_enter_data (void);
 match gfc_match_oacc_exit_data (void);
 match gfc_match_oacc_routine (void);
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 198facce636d..dc0521b40f0b 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1965,14 +1965,11 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
    | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_DEFAULT	      \
    | OMP_CLAUSE_WAIT)
 #define OACC_SERIAL_CLAUSES \
-  (omp_mask (OMP_CLAUSE_ASYNC) | OMP_CLAUSE_WAIT			      \
-   | OMP_CLAUSE_IF							      \
-   | OMP_CLAUSE_REDUCTION						      \
+  (omp_mask (OMP_CLAUSE_IF) | OMP_CLAUSE_ASYNC | OMP_CLAUSE_REDUCTION	      \
    | OMP_CLAUSE_COPY | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT		      \
-   | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT				      \
-   | OMP_CLAUSE_DEVICEPTR						      \
-   | OMP_CLAUSE_PRIVATE | OMP_CLAUSE_FIRSTPRIVATE			      \
-   | OMP_CLAUSE_DEFAULT)
+   | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_DEVICEPTR	      \
+   | OMP_CLAUSE_PRIVATE | OMP_CLAUSE_FIRSTPRIVATE | OMP_CLAUSE_DEFAULT	      \
+   | OMP_CLAUSE_WAIT)
 #define OACC_DATA_CLAUSES \
   (omp_mask (OMP_CLAUSE_IF) | OMP_CLAUSE_DEVICEPTR  | OMP_CLAUSE_COPY	      \
    | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT | OMP_CLAUSE_CREATE		      \
@@ -1986,6 +1983,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
   (OACC_LOOP_CLAUSES | OACC_PARALLEL_CLAUSES)
 #define OACC_KERNELS_LOOP_CLAUSES \
   (OACC_LOOP_CLAUSES | OACC_KERNELS_CLAUSES)
+#define OACC_SERIAL_LOOP_CLAUSES \
+  (OACC_LOOP_CLAUSES | OACC_SERIAL_CLAUSES)
 #define OACC_HOST_DATA_CLAUSES omp_mask (OMP_CLAUSE_USE_DEVICE)
 #define OACC_DECLARE_CLAUSES \
   (omp_mask (OMP_CLAUSE_COPY) | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT	      \
@@ -2050,8 +2049,7 @@ gfc_match_oacc_kernels (void)
 match
 gfc_match_oacc_serial_loop (void)
 {
-  return match_acc (EXEC_OACC_SERIAL_LOOP,
-		    OACC_SERIAL_CLAUSES | OACC_LOOP_CLAUSES);
+  return match_acc (EXEC_OACC_SERIAL_LOOP, OACC_SERIAL_LOOP_CLAUSES);
 }
 
 
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 1a38606682ca..e44cc6971983 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -5119,7 +5119,7 @@ parse_oacc_structured_block (gfc_statement acc_st)
   pop_state ();
 }
 
-/* Parse the statements of OpenACC loop/parallel loop/kernels loop.  */
+/* Parse the statements of OpenACC 'loop', or combined compute 'loop'.  */
 
 static gfc_statement
 parse_oacc_loop (gfc_statement acc_st)
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 573b55b066f3..d9dfcabc65ef 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3193,8 +3193,9 @@ gfc_trans_omp_code (gfc_code *code, bool force_empty)
   return stmt;
 }
 
-/* Trans OpenACC directives. */
-/* parallel, serial, kernels, data and host_data. */
+/* Translate OpenACC 'parallel', 'kernels', 'serial', 'data', 'host_data'
+   construct. */
+
 static tree
 gfc_trans_oacc_construct (gfc_code *code)
 {
@@ -4020,7 +4021,8 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
   return gfc_finish_block (&block);
 }
 
-/* Combined OpenACC parallel loop, kernels loop and serial loop. */
+/* Translate combined OpenACC 'parallel loop', 'kernels loop', 'serial loop'
+   construct. */
 
 static tree
 gfc_trans_oacc_combined_directive (gfc_code *code)
diff --git a/gcc/gimple.def b/gcc/gimple.def
index dd64419e8eb6..38c11f41156d 100644
--- a/gcc/gimple.def
+++ b/gcc/gimple.def
@@ -359,7 +359,7 @@ DEFGSCODE(GIMPLE_OMP_SECTIONS_SWITCH, "gimple_omp_sections_switch", GSS_BASE)
 DEFGSCODE(GIMPLE_OMP_SINGLE, "gimple_omp_single", GSS_OMP_SINGLE_LAYOUT)
 
 /* GIMPLE_OMP_TARGET <BODY, CLAUSES, CHILD_FN> represents
-   #pragma acc {kernels,parallel,data,enter data,exit data,update}
+   #pragma acc {kernels,parallel,serial,data,enter data,exit data,update}
    #pragma omp target {,data,update}
    BODY is the sequence of statements inside the construct
    (NULL for some variants).
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 83a449be3643..5a190b1714dc 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -177,12 +177,12 @@ enum gf_mask {
     GF_OMP_TARGET_KIND_EXIT_DATA = 4,
     GF_OMP_TARGET_KIND_OACC_PARALLEL = 5,
     GF_OMP_TARGET_KIND_OACC_KERNELS = 6,
-    GF_OMP_TARGET_KIND_OACC_DATA = 7,
-    GF_OMP_TARGET_KIND_OACC_UPDATE = 8,
-    GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 9,
-    GF_OMP_TARGET_KIND_OACC_DECLARE = 10,
-    GF_OMP_TARGET_KIND_OACC_HOST_DATA = 11,
-    GF_OMP_TARGET_KIND_OACC_SERIAL = 12,
+    GF_OMP_TARGET_KIND_OACC_SERIAL = 7,
+    GF_OMP_TARGET_KIND_OACC_DATA = 8,
+    GF_OMP_TARGET_KIND_OACC_UPDATE = 9,
+    GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 10,
+    GF_OMP_TARGET_KIND_OACC_DECLARE = 11,
+    GF_OMP_TARGET_KIND_OACC_HOST_DATA = 12,
     GF_OMP_TEAMS_GRID_PHONY	= 1 << 0,
     GF_OMP_TEAMS_HOST		= 1 << 1,
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 31429d5ac3ba..87a640545141 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -161,7 +161,7 @@ enum omp_region_type
   ORT_ACC_DATA	= ORT_ACC | ORT_TARGET_DATA, /* Data construct.  */
   ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,  /* Parallel construct */
   ORT_ACC_KERNELS  = ORT_ACC | ORT_TARGET | 2,  /* Kernels construct.  */
-  ORT_ACC_SERIAL = ORT_ACC | ORT_TARGET | 4,  /* Serial construct.  */
+  ORT_ACC_SERIAL   = ORT_ACC | ORT_TARGET | 4,  /* Serial construct.  */
   ORT_ACC_HOST_DATA = ORT_ACC | ORT_TARGET_DATA | 2,  /* Host data.  */
 
   /* Dummy OpenMP region, used to disable expansion of
@@ -10101,7 +10101,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	      break;
 	    }
 	  decl = OMP_CLAUSE_DECL (c);
-	  /* Data clauses associated with acc parallel reductions must be
+	  /* Data clauses associated with reductions must be
 	     compatible with present_or_copy.  Warn and adjust the clause
 	     if that is not the case.  */
 	  if (ctx->region_type == ORT_ACC_PARALLEL
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index d242f4e1ae99..6f945011cf5a 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -7914,8 +7914,8 @@ expand_omp_target (struct omp_region *region)
     case GF_OMP_TARGET_KIND_UPDATE:
     case GF_OMP_TARGET_KIND_ENTER_DATA:
     case GF_OMP_TARGET_KIND_EXIT_DATA:
-    case GF_OMP_TARGET_KIND_OACC_KERNELS:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+    case GF_OMP_TARGET_KIND_OACC_KERNELS:
     case GF_OMP_TARGET_KIND_OACC_SERIAL:
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
@@ -8171,8 +8171,8 @@ expand_omp_target (struct omp_region *region)
       start_ix = BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA;
       flags_i |= GOMP_TARGET_FLAG_EXIT_DATA;
       break;
-    case GF_OMP_TARGET_KIND_OACC_KERNELS:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+    case GF_OMP_TARGET_KIND_OACC_KERNELS:
     case GF_OMP_TARGET_KIND_OACC_SERIAL:
       start_ix = BUILT_IN_GOACC_PARALLEL;
       break;
@@ -8938,8 +8938,8 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent,
 		{
 		case GF_OMP_TARGET_KIND_REGION:
 		case GF_OMP_TARGET_KIND_DATA:
-		case GF_OMP_TARGET_KIND_OACC_KERNELS:
 		case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+		case GF_OMP_TARGET_KIND_OACC_KERNELS:
 		case GF_OMP_TARGET_KIND_OACC_SERIAL:
 		case GF_OMP_TARGET_KIND_OACC_DATA:
 		case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
@@ -9193,8 +9193,8 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region,
 	{
 	case GF_OMP_TARGET_KIND_REGION:
 	case GF_OMP_TARGET_KIND_DATA:
-	case GF_OMP_TARGET_KIND_OACC_KERNELS:
 	case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+	case GF_OMP_TARGET_KIND_OACC_KERNELS:
 	case GF_OMP_TARGET_KIND_OACC_SERIAL:
 	case GF_OMP_TARGET_KIND_OACC_DATA:
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index abf63334ca05..781e7cbf27a2 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -185,7 +185,8 @@ static tree scan_omp_1_op (tree *, int *, void *);
       *handled_ops_p = false; \
       break;
 
-/* Return true if CTX corresponds to an oacc parallel or serial region.  */
+/* Return true if CTX corresponds to an OpenACC 'parallel' or 'serial'
+   region.  */
 
 static bool
 is_oacc_parallel_or_serial (omp_context *ctx)
@@ -2419,7 +2420,7 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
 	    if (check && OMP_CLAUSE_OPERAND (c, 0))
 	      error_at (gimple_location (stmt),
 			"argument not permitted on %qs clause in"
-			" OpenACC %<parallel%>", check);
+			" OpenACC %<parallel%> or %<serial%>", check);
 	  }
 
       if (tgt && is_oacc_kernels (tgt))
@@ -11498,7 +11499,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	if (!maybe_lookup_field (var, ctx))
 	  continue;
 
-	/* Don't remap oacc parallel reduction variables, because the
+	/* Don't remap compute constructs' reduction variables, because the
 	   intermediate result must be local to each gang.  */
 	if (offloaded && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 			   && OMP_CLAUSE_MAP_IN_REDUCTION (c)))
diff --git a/gcc/testsuite/c-c++-common/goacc/parallel-dims-2.c b/gcc/testsuite/c-c++-common/goacc/parallel-dims-2.c
index acfbe7ff031a..31c4ee349f2c 100644
--- a/gcc/testsuite/c-c++-common/goacc/parallel-dims-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/parallel-dims-2.c
@@ -1,5 +1,7 @@
-/* Invalid use of OpenACC parallelism dimensions clauses: num_gangs,
-   num_workers, vector_length.  */
+/* Invalid use of OpenACC parallelism dimensions clauses: 'num_gangs',
+   'num_workers', 'vector_length'.  */
+
+/* See also '../../gfortran.dg/goacc/parallel-dims-2.f90'.  */
 
 void f(int i, float f)
 {
@@ -255,4 +257,14 @@ void f(int i, float f)
   vector_length(&f) /* { dg-error "'vector_length' expression must be integral" } */ \
   num_gangs( /* { dg-error "expected (primary-|)expression before end of line" "TODO" { xfail c } } */
   ;
+
+
+  /* The 'serial' construct doesn't allow these at all.  */
+
+#pragma acc serial num_gangs (1) /* { dg-error "'num_gangs' is not valid for '#pragma acc serial'" } */
+  ;
+#pragma acc serial num_workers (1) /* { dg-error "'num_workers' is not valid for '#pragma acc serial'" } */
+  ;
+#pragma acc serial vector_length (1) /* { dg-error "'vector_length' is not valid for '#pragma acc serial'" } */
+  ;
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/serial-dims.c b/gcc/testsuite/c-c++-common/goacc/serial-dims.c
deleted file mode 100644
index 41698d279c98..000000000000
--- a/gcc/testsuite/c-c++-common/goacc/serial-dims.c
+++ /dev/null
@@ -1,12 +0,0 @@
-/* Invalid use of OpenACC parallelism dimensions clauses: num_gangs,
-   num_workers, vector_length with the serial construct.  */
-
-void f(void)
-{
-#pragma acc serial num_gangs (1) /* { dg-error "'num_gangs' is not valid for '#pragma acc serial'" } */
-  ;
-#pragma acc serial num_workers (1) /* { dg-error "'num_workers' is not valid for '#pragma acc serial'" } */
-  ;
-#pragma acc serial vector_length (1) /* { dg-error "'vector_length' is not valid for '#pragma acc serial'" } */
-  ;
-}
diff --git a/gcc/testsuite/gfortran.dg/goacc/serial-dims.f90 b/gcc/testsuite/gfortran.dg/goacc/parallel-dims-2.f90
similarity index 50%
rename from gcc/testsuite/gfortran.dg/goacc/serial-dims.f90
rename to gcc/testsuite/gfortran.dg/goacc/parallel-dims-2.f90
index 72b4a8361776..91a5c300a94c 100644
--- a/gcc/testsuite/gfortran.dg/goacc/serial-dims.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/parallel-dims-2.f90
@@ -1,34 +1,15 @@
-! Invalid use of OpenACC parallelism dimensions clauses: num_gangs,
-! num_workers, vector_length with the serial construct.
+! Invalid use of OpenACC parallelism dimensions clauses: 'num_gangs',
+! 'num_workers', 'vector_length'.
 
-subroutine s()
-  integer :: i
-  !$acc parallel
-  !$acc end parallel
+! See also '../../c-c++-common/goacc/parallel-dims-2.c'.
 
-  !$acc parallel loop
-  do i = 1, 5
-  end do
-
-  !$acc parallel loop
-  do i = 1, 5
-  end do
-  !$acc end parallel loop
-
-  !$acc serial loop
-  do i = 1, 5
-  end do
+subroutine f()
+  !TODO 'kernels', 'parallel' testing per '../../c-c++-common/goacc/parallel-dims-2.c'.
+  !TODO This should incorporate some of the testing done in 'sie.f95'.
 
-  !$acc serial loop
-  do i = 1, 5
-  end do
-  !$acc end serial loop
 
-  !$acc serial
-  !$acc end serial
-end subroutine s
+  ! The 'serial' construct doesn't allow these at all.
 
-subroutine f()
 !$acc serial num_gangs (1)  ! { dg-error "Failed to match clause at" }
 !$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
 
@@ -37,4 +18,5 @@ subroutine f()
 
 !$acc serial vector_length (1)  ! { dg-error "Failed to match clause at" }
 !$acc end serial  ! { dg-error "Unexpected !.ACC END SERIAL statement" }
+
 end subroutine f
diff --git a/gcc/tree.h b/gcc/tree.h
index a7d39c3a74df..4bec90d9a729 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1622,7 +1622,8 @@ class auto_suppress_location_wrappers
    treatment if OMP_CLAUSE_SIZE is zero.  */
 #define OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION(NODE) \
   TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
-/* Nonzero if this map clause is for an ACC parallel reduction variable.  */
+/* Nonzero if this map clause is for an OpenACC compute construct's reduction
+   variable.  */
 #define OMP_CLAUSE_MAP_IN_REDUCTION(NODE) \
   TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index 7e699f476b21..a5edfc6ca164 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -1,6 +1,8 @@
 /* OpenACC parallelism dimensions clauses: num_gangs, num_workers,
    vector_length.  */
 
+/* See also '../libgomp.oacc-fortran/parallel-dims.f90'.  */
+
 #include <limits.h>
 #include <openacc.h>
 #include <gomp-constants.h>
@@ -45,6 +47,8 @@ int main ()
 {
   acc_init (acc_device_default);
 
+  /* OpenACC parallel construct.  */
+
   /* Non-positive value.  */
 
   /* GR, WS, VS.  */
@@ -478,6 +482,8 @@ int main ()
   }
 
 
+  /* OpenACC kernels construct.  */
+
   /* We can't test parallelized OpenACC kernels constructs in this way: use of
      the acc_gang, acc_worker, acc_vector functions will make the construct
      unparallelizable.  */
@@ -544,5 +550,72 @@ int main ()
   }
 
 
+  /* OpenACC serial construct.  */
+
+  /* GR, WS, VS.  */
+  {
+    int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
+    gangs_min = workers_min = vectors_min = INT_MAX;
+    gangs_max = workers_max = vectors_max = INT_MIN;
+#pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */ \
+  reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
+    {
+      for (int i = 100; i > -100; i--)
+	{
+	  gangs_min = gangs_max = acc_gang ();
+	  workers_min = workers_max = acc_worker ();
+	  vectors_min = vectors_max = acc_vector ();
+	}
+    }
+    if (gangs_min != 0 || gangs_max != 1 - 1
+	|| workers_min != 0 || workers_max != 1 - 1
+	|| vectors_min != 0 || vectors_max != 1 - 1)
+      __builtin_abort ();
+  }
+
+  /* Composition of GP, WP, VP.  */
+  {
+    int vectors_actual = 1;  /* Implicit 'vector_length (1)' clause.  */
+    int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
+    gangs_min = workers_min = vectors_min = INT_MAX;
+    gangs_max = workers_max = vectors_max = INT_MIN;
+#pragma acc serial copy (vectors_actual) /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */ \
+  copy (gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max)
+    {
+      if (acc_on_device (acc_device_nvidia))
+	{
+	  /* The GCC nvptx back end enforces vector_length (32).  */
+	  /* It's unclear if that's actually permissible here;
+	     <https://github.com/OpenACC/openacc-spec/issues/238> "OpenACC
+	     'serial' construct might not actually be serial".  */
+	  vectors_actual = 32;
+	}
+#pragma acc loop gang reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
+      for (int i = 100; i > -100; i--)
+#pragma acc loop worker reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
+	for (int j = 100; j > -100; j--)
+#pragma acc loop vector reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
+	  for (int k = 100 * vectors_actual; k > -100 * vectors_actual; k--)
+	    {
+	      gangs_min = gangs_max = acc_gang ();
+	      workers_min = workers_max = acc_worker ();
+	      vectors_min = vectors_max = acc_vector ();
+	    }
+    }
+    if (acc_get_device_type () == acc_device_nvidia)
+      {
+	if (vectors_actual != 32)
+	  __builtin_abort ();
+      }
+    else
+      if (vectors_actual != 1)
+	__builtin_abort ();
+    if (gangs_min != 0 || gangs_max != 1 - 1
+	|| workers_min != 0 || workers_max != 1 - 1
+	|| vectors_min != 0 || vectors_max != vectors_actual - 1)
+      __builtin_abort ();
+  }
+
+
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
deleted file mode 100644
index bb91c9221f89..000000000000
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
+++ /dev/null
@@ -1,92 +0,0 @@
-/* OpenACC dimensions with the serial construct.  */
-
-#include <limits.h>
-#include <openacc.h>
-#include <gomp-constants.h>
-
-/* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapper
-   not behaving as expected for -O0.  */
-#pragma acc routine seq
-static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
-{
-  if (acc_on_device ((int) acc_device_host))
-    return 0;
-  else if (acc_on_device ((int) acc_device_nvidia))
-    return __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
-  else
-    __builtin_abort ();
-}
-
-#pragma acc routine seq
-static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
-{
-  if (acc_on_device ((int) acc_device_host))
-    return 0;
-  else if (acc_on_device ((int) acc_device_nvidia))
-    return __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
-  else
-    __builtin_abort ();
-}
-
-#pragma acc routine seq
-static unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
-{
-  if (acc_on_device ((int) acc_device_host))
-    return 0;
-  else if (acc_on_device ((int) acc_device_nvidia))
-    return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
-  else
-    __builtin_abort ();
-}
-
-
-int main ()
-{
-  acc_init (acc_device_default);
-
-  /* Serial OpenACC constructs must get launched as 1 x 1 x 1.  */
-  {
-    int gangs_min, gangs_max;
-    int workers_min, workers_max;
-    int vectors_min, vectors_max;
-    int gangs_actual, workers_actual, vectors_actual;
-    int i, j, k;
-
-    gangs_min = workers_min = vectors_min = INT_MAX;
-    gangs_max = workers_max = vectors_max = INT_MIN;
-    gangs_actual = workers_actual = vectors_actual = 1;
-#pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */
-    {
-      if (acc_on_device (acc_device_nvidia))
-	{
-	  /* The GCC nvptx back end enforces vector_length (32).  */
-	  vectors_actual = 32;
-	}
-      else if (!acc_on_device (acc_device_host))
-	__builtin_abort ();
-#pragma acc loop gang \
-  reduction (min: gangs_min, workers_min, vectors_min) \
-  reduction (max: gangs_max, workers_max, vectors_max)
-      for (i = 100 * gangs_actual; i > -100 * gangs_actual; i--)
-#pragma acc loop worker \
-  reduction (min: gangs_min, workers_min, vectors_min) \
-  reduction (max: gangs_max, workers_max, vectors_max)
-	for (j = 100 * workers_actual; j > -100 * workers_actual; j--)
-#pragma acc loop vector \
-  reduction (min: gangs_min, workers_min, vectors_min) \
-  reduction (max: gangs_max, workers_max, vectors_max)
-	  for (k = 100 * vectors_actual; k > -100 * vectors_actual; k--)
-	    {
-	      gangs_min = gangs_max = acc_gang ();
-	      workers_min = workers_max = acc_worker ();
-	      vectors_min = vectors_max = acc_vector ();
-	    }
-      if (gangs_min != 0 || gangs_max != gangs_actual - 1
-	  || workers_min != 0 || workers_max != workers_actual - 1
-	  || vectors_min != 0 || vectors_max != vectors_actual - 1)
-	__builtin_abort ();
-    }
-  }
-
-  return 0;
-}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c b/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims-aux.c
similarity index 67%
rename from libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c
rename to libgomp/testsuite/libgomp.oacc-fortran/parallel-dims-aux.c
index 45c260510c29..b5986f4afef7 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims-aux.c
@@ -1,5 +1,9 @@
-/* OpenACC dimensions with the serial construct.  */
-/* Used by serial-dims.f90.  */
+/* OpenACC parallelism dimensions clauses: num_gangs, num_workers,
+   vector_length.  */
+
+/* Copied from '../libgomp.oacc-c-c++-common/parallel-dims.c'.  */
+
+/* Used by 'parallel-dims.f90'.  */
 
 #include <limits.h>
 #include <openacc.h>
@@ -8,7 +12,7 @@
 /* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapper
    not behaving as expected for -O0.  */
 #pragma acc routine seq
-static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
+/* static */ unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
 {
   if (acc_on_device ((int) acc_device_host))
     return 0;
@@ -19,7 +23,7 @@ static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
 }
 
 #pragma acc routine seq
-static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
+/* static */ unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
 {
   if (acc_on_device ((int) acc_device_host))
     return 0;
@@ -30,7 +34,7 @@ static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
 }
 
 #pragma acc routine seq
-static unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
+/* static */ unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
 {
   if (acc_on_device ((int) acc_device_host))
     return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90
new file mode 100644
index 000000000000..1bfcd6ce0998
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90
@@ -0,0 +1,120 @@
+! OpenACC parallelism dimensions clauses: num_gangs, num_workers,
+! vector_length.
+
+! { dg-additional-sources parallel-dims-aux.c }
+! { dg-do run }
+! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
+
+! See also '../libgomp.oacc-c-c++-common/parallel-dims.c'.
+
+module acc_routines
+  implicit none (type, external)
+
+  interface
+    integer function acc_gang() bind(C)
+      !$acc routine seq
+    end function acc_gang
+
+    integer function acc_worker() bind(C)
+      !$acc routine seq
+    end function acc_worker
+
+    integer function acc_vector() bind(C)
+      !$acc routine seq
+    end function acc_vector
+  end interface
+end module acc_routines
+
+program main
+  use iso_c_binding
+  use openacc
+  use acc_routines
+  implicit none (type, external)
+
+  integer :: gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max
+  integer :: vectors_actual
+  integer :: i, j, k
+
+  call acc_init (acc_device_default)
+
+  ! OpenACC parallel construct.
+
+  !TODO
+
+
+  ! OpenACC kernels construct.
+
+  !TODO
+
+
+  ! OpenACC serial construct.
+
+  ! GR, WS, VS.
+
+  gangs_min = huge(gangs_min) ! INT_MAX
+  workers_min = huge(workers_min) ! INT_MAX
+  vectors_min = huge(vectors_min) ! INT_MAX
+  gangs_max = -huge(gangs_max) - 1  ! INT_MIN
+  workers_max = -huge(gangs_max) - 1 ! INT_MIN
+  vectors_max = -huge(gangs_max) - 1 ! INT_MIN
+  !$acc serial &
+  !$acc   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max) ! { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } }
+  do i = 100, -99, -1
+     gangs_min = acc_gang ();
+     gangs_max = acc_gang ();
+     workers_min = acc_worker ();
+     workers_max = acc_worker ();
+     vectors_min = acc_vector ();
+     vectors_max = acc_vector ();
+  end do
+  !$acc end serial
+  if (gangs_min /= 0 .or. gangs_max /= 1 - 1 &
+      .or. workers_min /= 0 .or. workers_max /= 1 - 1 &
+      .or. vectors_min /= 0 .or. vectors_max /= 1 - 1) &
+    stop 1
+
+  ! Composition of GP, WP, VP.
+
+  vectors_actual = 1 ! Implicit 'vector_length (1)' clause.
+  gangs_min = huge(gangs_min) ! INT_MAX
+  workers_min = huge(workers_min) ! INT_MAX
+  vectors_min = huge(vectors_min) ! INT_MAX
+  gangs_max = -huge(gangs_max) - 1  ! INT_MIN
+  workers_max = -huge(gangs_max) - 1 ! INT_MIN
+  vectors_max = -huge(gangs_max) - 1 ! INT_MIN
+  !$acc serial copy (vectors_actual) &
+  !$acc   copy (gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max) ! { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } }
+  if (acc_on_device (acc_device_nvidia)) then
+     ! The GCC nvptx back end enforces vector_length (32).
+     ! It's unclear if that's actually permissible here;
+     ! <https://github.com/OpenACC/openacc-spec/issues/238> "OpenACC 'serial'
+     ! construct might not actually be serial".
+   vectors_actual = 32
+  end if
+  !$acc loop gang reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
+  do i = 100, -99, -1
+     !$acc loop worker reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
+     do j = 100, -99, -1
+        !$acc loop vector reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
+        do k = 100 * vectors_actual, -99 * vectors_actual, -1
+           gangs_min = acc_gang ();
+           gangs_max = acc_gang ();
+           workers_min = acc_worker ();
+           workers_max = acc_worker ();
+           vectors_min = acc_vector ();
+           vectors_max = acc_vector ();
+        end do
+     end do
+  end do
+  !$acc end serial
+  if (acc_get_device_type () .eq. acc_device_nvidia) then
+     if (vectors_actual /= 32) stop 2
+  else
+     if (vectors_actual /= 1) stop 3
+  end if
+  if (gangs_min /= 0 .or. gangs_max /= 1 - 1 &
+      .or. workers_min /= 0 .or. workers_max /= 1 - 1 &
+      .or. vectors_min /= 0 .or. vectors_max /= vectors_actual - 1) &
+    stop 4
+
+end program main
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90 b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90
deleted file mode 100644
index 25c933629045..000000000000
--- a/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90
+++ /dev/null
@@ -1,89 +0,0 @@
-! OpenACC dimensions with the serial construct.
-
-! { dg-additional-sources serial-dims-aux.c }
-! { dg-warning "command line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
-
-module acc_routines
-  implicit none (type, external)
-
-  interface
-    integer function acc_gang() bind(C)
-      !$acc routine seq
-    end function acc_gang
-
-    integer function acc_worker() bind(C)
-      !$acc routine seq
-    end function acc_worker
-
-    integer function acc_vector() bind(C)
-      !$acc routine seq
-    end function acc_vector
-  end interface
-end module acc_routines
-
-program main
-  use iso_c_binding
-  use openacc
-  use acc_routines
-  implicit none (type, external)
-
-  integer :: gangs_min, gangs_max
-  integer :: workers_min, workers_max
-  integer :: vectors_min, vectors_max
-  integer :: gangs_actual, workers_actual, vectors_actual
-  integer :: i, j, k
-
-  call acc_init (acc_device_default)
-
-  ! Serial OpenACC constructs must get launched as 1 x 1 x 1.
-  gangs_min = huge(gangs_min)
-  workers_min = huge(workers_min)
-  vectors_min = huge(vectors_min)
-  gangs_max = -huge(gangs_max) - 1  ! INT_MIN
-  workers_max = -huge(gangs_max) - 1
-  vectors_max = -huge(gangs_max) - 1
-  gangs_actual = 1
-  workers_actual = 1
-  vectors_actual = 1
-
-  !$acc serial ! { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } }
-    if (acc_on_device (acc_device_nvidia)) then
-      ! The GCC nvptx back end enforces vector_length (32).
-      vectors_actual = 32
-    elseif (acc_on_device (acc_device_gcn)) then
-      ! AMD GCN relies on the autovectorizer for the vector dimension:
-      ! the loop below isn't likely to be vectorized, so vectors_actual
-      ! is effectively 1.
-      vectors_actual = 1
-    elseif (.not. acc_on_device (acc_device_host)) then
-      stop 1
-    end if
-
-!$acc loop gang &
-!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
-!$acc & reduction (max: gangs_max, workers_max, vectors_max)
-    do i = 100 * gangs_actual, -99 * gangs_actual, -1
-!$acc loop worker &
-!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
-!$acc & reduction (max: gangs_max, workers_max, vectors_max)
-      do j = 100 * workers_actual, -99 * workers_actual, -1
-!$acc loop vector &
-!$acc & reduction (min: gangs_min, workers_min, vectors_min) &
-!$acc & reduction (max: gangs_max, workers_max, vectors_max)
-        do k = 100 * vectors_actual, -99 * vectors_actual, -1
-          gangs_min = acc_gang ();
-          gangs_max = acc_gang ();
-          workers_min = acc_worker ();
-          workers_max = acc_worker ();
-          vectors_min = acc_vector ();
-          vectors_max = acc_vector ();
-       end do
-     end do
-   end do
-  if (gangs_min /= 0 .or. gangs_max /= gangs_actual - 1 &
-      .or. workers_min /= 0 .or. workers_max /= workers_actual - 1 &
-      .or. vectors_min /= 0 .or. vectors_max /= vectors_actual - 1) &
-    stop 2
-!$acc end serial
-
-end program main
-- 
2.17.1

Attachment: signature.asc
Description: PGP signature

Reply via email to