Re: [PATCH, OpenACC 2.7] struct/array reductions for Fortran

2024-03-18 Thread Thomas Schwinge
Hi Chung-Lin!

Thanks for your work here, which I'm beginning to look into (prerequisite
"[PATCH, OpenACC 2.7] Implement reductions for arrays and structs",
first, of course); it'll take me some time.


In non-offloading testing, I noticed for x86_64-pc-linux-gnu '-m32':

+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O0  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O1  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O2  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -Os  execution test

With optimizations enabled, it runs into 'STOP 4'.

Per '-Wextra':

[...]/libgomp.oacc-fortran/reduction-13.f90:40:6: Warning: Inequality 
comparison for REAL(4) at (1) [-Wcompare-reals]
[...]/libgomp.oacc-fortran/reduction-13.f90:63:6: Warning: Inequality 
comparison for REAL(4) at (1) [-Wcompare-reals]
[...]/libgomp.oacc-fortran/reduction-13.f90:64:6: Warning: Inequality 
comparison for REAL(8) at (1) [-Wcompare-reals]

Do we need to allow for some epsilon (generally in such test cases), or
is there another problem?

For reference:

On 2024-02-08T22:47:13+0800, Chung-Lin Tang  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90
> @@ -0,0 +1,66 @@
> +! { dg-do run }
> +
> +! record type reductions
> +
> +program reduction_13
> +  implicit none
> +
> +  type t1
> + integer :: i
> + real :: r
> +  end type t1
> +
> +  type t2
> + real :: r
> + integer :: i
> + double precision :: d
> +  end type t2
> +
> +  integer, parameter :: n = 10, ng = 8, nw = 4, vl = 32
> +  integer :: i
> +  type(t1) :: v1, a1
> +  type (t2) :: v2, a2
> +
> +  v1%i = 0
> +  v1%r = 0
> +  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v1)
> +  !$acc loop reduction (+:v1)
> +  do i = 1, n
> + v1%i = v1%i + 1
> + v1%r = v1%r + 2
> +  end do
> +  !$acc end parallel
> +  a1%i = 0
> +  a1%r = 0
> +  do i = 1, n
> + a1%i = a1%i + 1
> + a1%r = a1%r + 2
> +  end do
> +  if (v1%i .ne. a1%i) STOP 1
> +  if (v1%r .ne. a1%r) STOP 2
> +
> +  v2%i = 1
> +  v2%r = 1
> +  v2%d = 1
> +  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v2)
> +  !$acc loop reduction (*:v2)
> +  do i = 1, n
> + v2%i = v2%i * 2
> + v2%r = v2%r * 1.1
> + v2%d = v2%d * 1.3
> +  end do
> +  !$acc end parallel
> +  a2%i = 1
> +  a2%r = 1
> +  a2%d = 1
> +  do i = 1, n
> + a2%i = a2%i * 2
> + a2%r = a2%r * 1.1
> + a2%d = a2%d * 1.3
> +  end do
> +
> +  if (v2%i .ne. a2%i) STOP 3
> +  if (v2%r .ne. a2%r) STOP 4
> +  if (v2%d .ne. a2%d) STOP 5
> +
> +end program reduction_13


Grüße
 Thomas


Re: [PATCH, OpenACC 2.7] struct/array reductions for Fortran

2024-03-13 Thread Tobias Burnus

Hi Chung-Lin, hi Thomas, hello world,

some thoughts glancing at the patch.

Chung-Lin Tang wrote:

There is still some shortcomings in the current state, mainly that only explicit-shaped 
arrays can be used (like its C counterpart). Anything else is currently a bit more 
complicated in the middle-end, since the existing reduction code creates an 
"init-op" (literal of initial values) which can't be done when say 
TYPE_MAX_VALUE (TYPE_DOMAIN (array_type)) is not a tree constant. I think we'll be on the 
hook to solve this later, but I think the current state is okay to submit.


I think having some initial support is fine, but it needs an 
understandable and somewhat complete error diagnostic and testcases. 
More to this below.



+  if (!TREE_CONSTANT (min_tree) || !TREE_CONSTANT (max_tree))
+   {
+ error_at (loc, "array in reduction must be of constant size");
+ return error_mark_node;
+   }

Shouldn't this use a sorry_at instead?


+ /* OpenACC current only supports array reductions on explicit-shape
+arrays.  */
+ if ((n->sym->as && n->sym->as->type != AS_EXPLICIT)
+ || n->sym->attr.codimension)
gfc_error ("Array %qs is not permitted in reduction at %L",
   n->sym->name, >where);
[Coarray excursion. I am in favor of allowing it for the reasons above, 
but it could be also rejected but I would prefer to have a proper error 
message in that case.]


While coarrays are unspecified, I do not see a reason why a corray 
shouldn't be permitted here – as long as it is not coindexed. At the 
end, it is just a normal array with some additional properties, which 
make it possible to remotely access it.


Note: For coarray scalars, we have 'sym->as', thus the check should be 
'(n->sym->as && n->sym->as->rank)' to permit scalar coarrays.


* * *

Coarray excursion: A coarray variables exists in multiple processes 
("images", e.g. MPI processes). If 'caf' and 'caf2' are coarrays, then 
'caf = 5' and 'i = caf2' refer to the local variable.


On the other hand, 'caf[n] = 5' or 'i = caf[3,m]' refers to the 'caf' 
variable on image 'n' or [3,m]', respectively, which implies in general 
some function call to read or set the remote data, unless the memory is 
directly accessible (→ e.g. some offset calculation) and the compiler 
already knows how to handle this.


While a coarrary might be allocated in some special memory, as long as 
one uses the local version (i.e. not coindexed / without the image index 
in brackets).


Assume for the example above, e.g., integer :: caf[*], caf2[3:6, 7:*].

* * *

Thus, in terms of OpenACC or OpenMP, there is no reason to fret a 
coarray as long as it is not coindexed and as long as OpenMP/OpenACC 
does not interfere with the memory allocation – either directly ('!$omp 
allocators') or indirectly by placing it into special memory (pinned, 
pseudo-unified-shared memory → OG13's -foffload-memory=pinned/unified).


In the meanwhile, OpenMP actually explicitly allows coarrays with few 
exceptions while OpenACC talks about unspecified behavior.


* * *

Back to generic comments:

If I look at the existing code, I see at gfc_match_omp_clause_reduction:


 if (gfc_match_omp_variable_list (" :", >lists[list_idx], false, NULL,
  , openacc, allow_derived) != 
MATCH_YES)


If 'openacc' is true, array sections are permitted - but the code added 
(see quote above) does not handle n->expr at all and only n->sym.


I think there needs to be at least a "gfc_error ("Sorry, subarrays/array 
sections not yet handled" [subarray is the OpenACC wording, 'array 
section' is the Fortran one, which might be clearer.


But you could consider to handle at least array elements, i.e. 
n->expr->rank == 0.


Additionally, I think the current error message is completely unhelpful 
given that some arrays are supported but most are not.


I think there should be also some testcases for the not-yet-supported 
case. I think the following will trigger the omp-low.cc 'sorry_at' (or 
currently 'error' - but I think it should be a sorry):


subroutine foo(n)

integer :: n, A(n)

... reduction(+:A)

And most others will trigger in openmp.cc; for those, you should have an 
allocatable/pointer and assumed-shape arrays for the diagnostic testcase 
as well.


* * *

I have not really experimented with the code, but does it handle 
multi-dimensional constant arrays like 'integer :: a(3:6,10,-1:1)' ? — I 
bet it does, at least after handling my example [2] for the C patch [1].


Thanks,

Tobias

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647704.html



[PATCH, OpenACC 2.7] struct/array reductions for Fortran

2024-02-08 Thread Chung-Lin Tang
Hi Tobias, Thomas,
this patch adds support for Fortran to use arrays and struct(record) types in 
OpenACC reductions.

There is still some shortcomings in the current state, mainly that only 
explicit-shaped arrays can be used (like its C counterpart). Anything else is 
currently a bit more complicated in the middle-end, since the existing 
reduction code creates an "init-op" (literal of initial values) which can't be 
done when say TYPE_MAX_VALUE (TYPE_DOMAIN (array_type)) is not a tree constant. 
I think we'll be on the hook to solve this later, but I think the current state 
is okay to submit.

Tested without regressions on mainline (on top of first struct/array reduction 
patch[1])

Thanks,
Chung-Lin

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

2024-02-08  Chung-Lin Tang  

gcc/fortran/ChangeLog:
* openmp.cc (oacc_reduction_defined_type_p): New function.
(resolve_omp_clauses): Adjust OpenACC array reduction error case. Use
oacc_reduction_defined_type_p for OpenACC.
* trans-openmp.cc (gfc_trans_omp_array_reduction_or_udr):
Add 'bool openacc' parameter, adjust part of function to be !openacc
only.
(gfc_trans_omp_reduction_list): Add 'bool openacc' parameter, pass to
calls to gfc_trans_omp_array_reduction_or_udr.
(gfc_trans_omp_clauses): Add 'openacc' argument to calls to
gfc_trans_omp_reduction_list.
(gfc_trans_omp_do): Pass 'op == EXEC_OACC_LOOP' as 'bool openacc'
parameter in call to gfc_trans_omp_clauses.

gcc/ChangeLog:
* omp-low.cc (omp_reduction_init_op): Add checking if reduced array
has constant bounds.
(lower_oacc_reductions): Add handling of error_mark_node.

gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/array-reduction.f90: Adjust testcase.
* gfortran.dg/goacc/reduction.f95: Likewise.

libgomp/ChangeLog:
* libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90: New testcase.
* libgomp/testsuite/libgomp.oacc-fortran/reduction-10.f90: Likewise.
* libgomp/testsuite/libgomp.oacc-fortran/reduction-11.f90: Likewise.
* libgomp/testsuite/libgomp.oacc-fortran/reduction-12.f90: Likewise.
* libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90: Likewise.
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0af80d54fad..4bba9e666d6 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7047,6 +7047,72 @@ oacc_is_loop (gfc_code *code)
 || code->op == EXEC_OACC_LOOP;
 }
 
+static bool
+oacc_reduction_defined_type_p (enum gfc_omp_reduction_op rop, gfc_typespec *ts)
+{
+  if (rop == OMP_REDUCTION_USER || rop == OMP_REDUCTION_NONE)
+return false;
+
+  if (ts->type == BT_INTEGER)
+switch (rop)
+  {
+  case OMP_REDUCTION_AND:
+  case OMP_REDUCTION_OR:
+  case OMP_REDUCTION_EQV:
+  case OMP_REDUCTION_NEQV:
+   return false;
+  default:
+   return true;
+  }
+
+  if (ts->type == BT_LOGICAL)
+switch (rop)
+  {
+  case OMP_REDUCTION_AND:
+  case OMP_REDUCTION_OR:
+  case OMP_REDUCTION_EQV:
+  case OMP_REDUCTION_NEQV:
+   return true;
+  default:
+   return false;
+  }
+
+  if (ts->type == BT_REAL || ts->type == BT_COMPLEX)
+switch (rop)
+  {
+  case OMP_REDUCTION_PLUS:
+  case OMP_REDUCTION_TIMES:
+  case OMP_REDUCTION_MINUS:
+   return true;
+
+  case OMP_REDUCTION_AND:
+  case OMP_REDUCTION_OR:
+  case OMP_REDUCTION_EQV:
+  case OMP_REDUCTION_NEQV:
+   return false;
+
+  case OMP_REDUCTION_MAX:
+  case OMP_REDUCTION_MIN:
+   return ts->type != BT_COMPLEX;
+  case OMP_REDUCTION_IAND:
+  case OMP_REDUCTION_IOR:
+  case OMP_REDUCTION_IEOR:
+   return false;
+  default:
+   gcc_unreachable ();
+  }
+
+  if (ts->type == BT_DERIVED)
+{
+  for (gfc_component *p = ts->u.derived->components; p; p = p->next)
+   if (!oacc_reduction_defined_type_p (rop, >ts))
+ return false;
+  return true;
+}
+
+  return false;
+}
+
 static void
 resolve_scalar_int_expr (gfc_expr *expr, const char *clause)
 {
@@ -8137,13 +8203,15 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
  else
n->sym->mark = 1;
 
- /* OpenACC does not support reductions on arrays.  */
- if (n->sym->as)
+ /* OpenACC current only supports array reductions on explicit-shape
+arrays.  */
+ if ((n->sym->as && n->sym->as->type != AS_EXPLICIT)
+ || n->sym->attr.codimension)
gfc_error ("Array %qs is not permitted in reduction at %L",
   n->sym->name, >where);
}
 }
-  
+
   for (n = omp_clauses->lists[OMP_LIST_TO]; n; n = n->next)
 n->sym->mark = 0;
   for (n = omp_clauses->lists[OMP_LIST_FROM]; n; n = n->next)
@@ -8797,39 +8865,46 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,