Hi,
First, as you mentioned elsewhere, array descriptors are mishandled
(allocator/alignment applies to descriptor not the data).
I have now filed https://gcc.gnu.org/PR123574 for it.
[The issue is semi-independent of this patch as it also
affects data-sharing clauses.]
* * *
Kwok Cheung Yeung wrote:
block
integer :: i(s)
!$omp target firstprivate(i)
allocate(allocator(omp_low_lat_mem_alloc) : i)
...
This type of VLA is expanded to Gimple differently from C-style VLAs.
...
while in Fortran:
integer :: x(n)
x(1) = 1
becomes
integer(kind=4)[0:D.4715] * restrict x;
void * restrict D.4718;
_7 = <size calculated from n>;
D.4718 = __builtin_malloc (_7);
x = D.4718;
(*x)[1] = 1;
For local arrays (VLA or size is a constant), actually a lot can happen:
(A) If the size is a constant – and the procedure is 'non_recursive' (in
the past the default in Fortran (and gfortran) unless -fopenmp/-fopenacc),
then it might be created in static memory to avoid stack-size issue
→ -fmax-stack-var-size
(This also happens with (explicit/implicit) 'SAVE'd arrays, e.g. those
in the main program.)
(B) Stack arrays are used - but as Fortran programs tend to use a lot of
those, users tend to use 'ulimit -s unlimited'.
→ gfortran added -f(no-)stack-arrays and also has …
(C) Otherwise, if either exceeding -fmax-stack-var-size or
if a VLA (and -fno-stack-arrays), a temporary is allocated.
I have to admit that I kind of missed option (C) - even though I
created a program for it. (BTW: -Ofast implies -fstack-arrays.)
For (B), the compiler generates:
integer(kind=4)[0:D.4693] * restrict a;
integer(kind=4) A.2[0:D.4693] [value-expr: *A.6];
A.6 = __builtin_alloca_with_align (D.4699, 32);
a = A.6;
* * *
Glancing at the code that handles 'allocate' with data sharing clauses,
I see that the following C++ code gives:
#pragma omp parallel firstprivate(aref) allocate(align(128),
allocator(omp_low_lat_mem_alloc) : aref)
→ aref.2 = __builtin_GOMP_alloc (128, 40, 5);
#pragma omp target firstprivate(aref) allocate(align(128),
allocator(omp_low_lat_mem_alloc) : aref)
→ aref.7 = __builtin_GOMP_alloc (128, 8, 5);
The first one (for parallel, old code) uses the size of the array,
the second one (for target, this patch) uses the size of the pointer ...
I think the first one is the expected result.
[This is a C++ program using a reference type]
------------
#include <omp.h>
void f() {
int A[10];
auto &aref = A;
#pragma omp parallel firstprivate(aref) allocate(align(128),
allocator(omp_low_lat_mem_alloc) : aref)
aref[0] = 1;
#pragma omp target firstprivate(aref) allocate(align(128),
allocator(omp_low_lat_mem_alloc) : aref)
aref[0] = 1;
}
-------------
If I look at lower_private_allocate for the C++ program, it is called with:
is_ref=true
The call is in a code block protected by:
else if (omp_privatize_by_reference (var)
And I bet that you should also call that function instead
of using (+ defining) 'is_fortran_variable_sized'.
Example Fortran program:
--------------------
subroutine x
use omp_lib
implicit none
integer :: n = 5
block
integer :: A(n)
!$omp parallel firstprivate(A) allocate(align(128),
allocator(omp_low_lat_mem_alloc) : A)
A(1) = 1;
!$omp end parallel
!$omp target firstprivate(A) allocate(align(128),
allocator(omp_low_lat_mem_alloc) : A)
A(1) = 1;
!$omp end target
end block
end
--------------------
For C++, the hook (→ cxx_omp_privatize_by_reference)
is true for:
return (TYPE_REF_P (TREE_TYPE (decl))
|| is_invisiref_parm (decl));
where the latter is
return ((TREE_CODE (t) == PARM_DECL || TREE_CODE (t) == RESULT_DECL)
&& DECL_BY_REFERENCE (t));
And for Fortran, it is true in a bunch of cases. The hook also
handles, e.g., scalar allocatables or PARM_DECL (reference type or
also with 'optional' attribute).
→ gcc/fortran/trans-openmp.cc:
/* True if OpenMP should privatize what this DECL points to rather
than the DECL itself. */
bool
gfc_omp_privatize_by_reference (const_tree decl)
{
...
* * *
I have made no attempt to optimise for local or USM in this version of
the patch.
It is not completely clear what you mean by 'local'.
* * *
Tobias
PS: I still only glanced at the code – but didn't spot anything else. I
will come back to this patch a bit later, but first I should review
another patch …