Hi,

First, as you mentioned elsewhere, array descriptors are mishandled
(allocator/alignment applies to descriptor not the data).
I have now filed https://gcc.gnu.org/PR123574 for it.

[The issue is semi-independent of this patch as it also
affects data-sharing clauses.]

* * *


Kwok Cheung Yeung wrote:
block
integer :: i(s)
!$omp target firstprivate(i) allocate(allocator(omp_low_lat_mem_alloc) : i)
...
This type of VLA is expanded to Gimple differently from C-style VLAs.
...
while in Fortran:

integer :: x(n)
x(1) = 1

becomes

integer(kind=4)[0:D.4715] * restrict x;
void * restrict D.4718;

_7 = <size calculated from n>;
D.4718 = __builtin_malloc (_7);
x = D.4718;
(*x)[1] = 1;

For local arrays (VLA or size is a constant), actually a lot can happen:

(A) If the size is a constant – and the procedure is 'non_recursive' (in
    the past the default in Fortran (and gfortran) unless -fopenmp/-fopenacc),
    then it might be created in static memory to avoid stack-size issue
    → -fmax-stack-var-size
    (This also happens with (explicit/implicit) 'SAVE'd arrays, e.g. those
    in the main program.)

(B) Stack arrays are used - but as Fortran programs tend to use a lot of
    those, users tend to use 'ulimit -s unlimited'.
→ gfortran added -f(no-)stack-arrays and also has …

(C) Otherwise, if either exceeding -fmax-stack-var-size or
    if a VLA (and -fno-stack-arrays), a temporary is allocated.

I have to admit that I kind of missed option (C) - even though I
created a program for it. (BTW: -Ofast implies -fstack-arrays.)


For (B), the compiler generates:
    integer(kind=4)[0:D.4693] * restrict a;
    integer(kind=4) A.2[0:D.4693] [value-expr: *A.6];
    A.6 = __builtin_alloca_with_align (D.4699, 32);
    a = A.6;

* * *

Glancing at the code that handles 'allocate' with data sharing clauses,
I see that the following C++ code gives:

 #pragma omp parallel firstprivate(aref) allocate(align(128), 
allocator(omp_low_lat_mem_alloc) : aref)

→ aref.2 = __builtin_GOMP_alloc (128, 40, 5);

 #pragma omp target firstprivate(aref) allocate(align(128), 
allocator(omp_low_lat_mem_alloc) : aref)

→ aref.7 = __builtin_GOMP_alloc (128, 8, 5);

The first one (for parallel, old code) uses the size of the array,
the second one (for target, this patch) uses the size of the pointer ...

I think the first one is the expected result.

[This is a C++ program using a reference type]
------------
#include <omp.h>

void f() {
 int A[10];
 auto &aref = A;

 #pragma omp parallel firstprivate(aref) allocate(align(128), 
allocator(omp_low_lat_mem_alloc) : aref)
   aref[0] = 1;

 #pragma omp target firstprivate(aref) allocate(align(128), 
allocator(omp_low_lat_mem_alloc) : aref)
   aref[0] = 1;
}
-------------

If I look at lower_private_allocate for the C++ program, it is called with:
is_ref=true

The call is in a code block protected by:

          else if (omp_privatize_by_reference (var)

And I bet that you should also call that function instead
of using (+ defining) 'is_fortran_variable_sized'.

Example Fortran program:
--------------------
subroutine x
  use omp_lib
  implicit none
  integer :: n = 5
  block
    integer :: A(n)
    !$omp parallel firstprivate(A) allocate(align(128), 
allocator(omp_low_lat_mem_alloc) : A)
      A(1) = 1;
    !$omp end parallel

    !$omp target firstprivate(A) allocate(align(128), 
allocator(omp_low_lat_mem_alloc) : A)
      A(1) = 1;
    !$omp end target
  end block
end
--------------------

For C++, the hook (→ cxx_omp_privatize_by_reference)
is true for:
  return (TYPE_REF_P (TREE_TYPE (decl))
          || is_invisiref_parm (decl));

where the latter is
  return ((TREE_CODE (t) == PARM_DECL || TREE_CODE (t) == RESULT_DECL)
          && DECL_BY_REFERENCE (t));


And for Fortran, it is true in a bunch of cases. The hook also
handles, e.g., scalar allocatables or PARM_DECL (reference type or
also with 'optional' attribute).

→ gcc/fortran/trans-openmp.cc:

/* True if OpenMP should privatize what this DECL points to rather
   than the DECL itself.  */
bool
gfc_omp_privatize_by_reference (const_tree decl)
{
...


* * *

I have made no attempt to optimise for local or USM in this version of the patch.

It is not completely clear what you mean by 'local'.

* * *

Tobias

PS: I still only glanced at the code – but didn't spot anything else. I will come back to this patch a bit later, but first I should review another patch …

Reply via email to