Re: Help with fortran pointer ans OpenACC

2023-08-24 Thread Patrick Begou via Fortran

Hi Tobias,

many thanks for all these informations!
- I confirm, this clause solve the problem with GNU Fortran
- This workaround is fully compatible with Nvidia Fortran
- The problem with Cray Fortran remains (with this compiler, the 
test-case runs but the result is wrong)


I've tested the suggested options for debugging but I must admit that 
these intermediates files are a little bit out of my knowledge level to 
easily identify the problem.
In my work, portability is important and I have access to different 
compilers and GPU architectures and may be my best contribution to GNU 
software is limited to these tests and reports with small test-cases 
from a large scientific code that is expected to run on most GPU with 
various compilers.


Thanks again for your help and detailed explanation

Patrick

Le 23/08/2023 à 13:41, Tobias Burnus a écrit :

Hi,

On 23.08.23 12:19, Patrick Begou via Fortran wrote:

For several days I have some trouble with OpenACC offloading and
fortran pointers. I'm testing with a very small peace of code to
investigate but I do not progress for several days and I need your help.

Could someone give me advices or a small explanation on what I have
not understood there ?


First, for debugging, using -fdump-tree-original -fdump-tree-gimple
-fdump-tree-omplower and looking at the file .* might
help - grep for '#pragma'; this shows the internal representation
(incomplete) in a C like output. That's how I spotted the issue below.

* * *

For gfortran, the issue is:

    !$acc parallel loop collapse(2) default(present) if(runongpu)

It works if you add a "present(current)" → see also newly filed
https://gcc.gnu.org/PR16

gfortran seems to follow - but not completely - the spec:

Quoting from OpenACC 3.2: "2.6.2 Variables with Implicitly Determined
Data Attributes":

"A scalar variable will be treated as if it appears either:

* In a copy clause if the compute construct is a kernels construct.
* In a firstprivate clause otherwise.

Note: Any default(none) clause visible at the compute construct applies
to both aggregate and scalar variables. However, any default(present)
clause visible at the compute construct applies only to aggregate
variables."

However, the glossary defines:

"Aggregate variables – a variable of any non-scalar datatype, including
array or composite variables.
In Fortran, this includes any variable with allocatable or pointer
attribute and character variables."

And, as you have a (scalar) Fortran pointer, the following should have
applied instead:

"An aggregate variable will be treated as if it appears either:
• In a present clause if there is a default(present) clause visible at
the compute construct.
• In a copy clause otherwise."

* * *

Thus: It looks as if your program is valid, adding 'present(current)'
will fix it for gfortran (and Cray?), 'default(...)' and implicit
mapping is confusing, and there is now a gfortran PR to track this
issue: https://gcc.gnu.org/PR16

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 
201, 80634 München; Gesellschaft mit beschränkter Haftung; 
Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: 
München; Registergericht München, HRB 106955





Re: Help with fortran pointer ans OpenACC

2023-08-23 Thread Tobias Burnus

Hi,

On 23.08.23 12:19, Patrick Begou via Fortran wrote:

For several days I have some trouble with OpenACC offloading and
fortran pointers. I'm testing with a very small peace of code to
investigate but I do not progress for several days and I need your help.

Could someone give me advices or a small explanation on what I have
not understood there ?


First, for debugging, using -fdump-tree-original -fdump-tree-gimple
-fdump-tree-omplower and looking at the file .* might
help - grep for '#pragma'; this shows the internal representation
(incomplete) in a C like output. That's how I spotted the issue below.

* * *

For gfortran, the issue is:

!$acc parallel loop collapse(2) default(present) if(runongpu)

It works if you add a "present(current)" → see also newly filed
https://gcc.gnu.org/PR16

gfortran seems to follow - but not completely - the spec:

Quoting from OpenACC 3.2: "2.6.2 Variables with Implicitly Determined
Data Attributes":

"A scalar variable will be treated as if it appears either:

* In a copy clause if the compute construct is a kernels construct.
* In a firstprivate clause otherwise.

Note: Any default(none) clause visible at the compute construct applies
to both aggregate and scalar variables. However, any default(present)
clause visible at the compute construct applies only to aggregate
variables."

However, the glossary defines:

"Aggregate variables – a variable of any non-scalar datatype, including
array or composite variables.
In Fortran, this includes any variable with allocatable or pointer
attribute and character variables."

And, as you have a (scalar) Fortran pointer, the following should have
applied instead:

"An aggregate variable will be treated as if it appears either:
• In a present clause if there is a default(present) clause visible at
the compute construct.
• In a copy clause otherwise."

* * *

Thus: It looks as if your program is valid, adding 'present(current)'
will fix it for gfortran (and Cray?), 'default(...)' and implicit
mapping is confusing, and there is now a gfortran PR to track this
issue: https://gcc.gnu.org/PR16

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Help with fortran pointer ans OpenACC

2023-08-23 Thread Patrick Begou via Fortran

Hi everyone!

For several days I have some trouble with OpenACC offloading and fortran 
pointers. I'm testing with a very small peace of code to investigate but 
I do not progress for several days and I need your help.


The attached code goal is just to initialize some data on the GPU and is 
representative of my problem on a very large code.


   - It works fine with nvfortran (22.11)

   - it do not work with Gnu fortran (14.0.0 20230822 - experimental) 
   => invalid memory

   - it do not work with  Cray Fortran (15.1) => wrong results

so I think the problem is the code, not the compiler.

It is also difficult to find some openACC offloading examples using 
Fortran pointers and I'm stuck with this problem.


Could someone give me advices or a small explanation on what I have not 
understood there ?


Thanks for your help

Patrick


Code details:

- all my fortran modules are grouped in the same file for simplification 
of the provided test-case.


- compilation with GNU Firtran is: "gfortran -cpp -g -fopenacc grouped.f90"

- setting  "runongpu=.false." line 7 (no GPU) the result is:

 Default init OK
 Default value OK

- setting  "runongpu=.true." line 7 (no GPU) the result is:

Default init OK
libgomp: cuStreamSynchronize error: an illegal memory access was encountered

- with nvhpc/22.11 and "runongpu=.true.", built with  "nvfortran 
-acc=gpu,noautopar  -gpu=cc80 -Minfo=accel grouped.f90"


Default init OK
Default value OK
!=
! Just to say run on the device or not.
!=
module openacc_defs

  implicit none
  logical, save :: runongpu=.true.
end module openacc_defs



!=
! Data structure for r2_tab and r2_ptr to manage pointers.
! r2_ptr is used in an allocatable array for a dynamivc number of r2_tab variables
! but could be used later in chained lists
! Memory is allocated on GPU each time.
!=

module tab_m

  implicit none

 type r2_tab
 double precision, dimension(:,:), allocatable :: val
 integer :: dim1
 integer :: dim2
 end type r2_tab

 type r2_ptr
 type(r2_tab), pointer :: ptr
 type(r2_ptr), pointer :: next
 end type r2_ptr


contains

subroutine  new_r2_tab(tab,n,m)
implicit none
integer, intent(in) ::n,m
type(r2_tab), pointer, intent(inout) ::tab
!

   if (.not. associated(tab)) allocate(tab)
   if (allocated(tab%val)) deallocate(tab%val)

   allocate (tab%val(n,m))
   tab%dim1=n
   tab%dim2=m
   tab%val(:,:)=1.0D0 
   !$acc enter data create(tab)
   !$acc enter data create(tab%val)
end subroutine new_r2_tab

end module tab_m

!=
! This module implements data processing (just an initialization here)
! if runongpu is .true. initialization is run on the GPU and then host is updated.
!=
module manage_data
   use openacc_defs
   use tab_m
   implicit none

   contains

   subroutine set_default_val_gpu(liste, defval, nitems)
 implicit none
 integer, intent(in) :: nitems
 double precision, intent(in) :: defval
 type(r2_ptr), dimension(nitems) :: liste

 type(r2_tab), pointer :: current=>null()
 integer:: item,j,k

 do item=1, nitems
current=>liste(item)%ptr
!print*,current%dim1, current%dim2,size(current%val)

!$acc parallel loop collapse(2) default(present) if(runongpu)
do k=1, current%dim2
   do j=1, current%dim1
  current%val(j,k)=defval
   end do
end do
!$acc update if(runongpu) host(current%val)
 end do
   end subroutine set_default_val_gpu

end module manage_data


!=
! main program.
!
!=

program main
  use tab_m
  use manage_data
  implicit none

  integer, parameter:: N=5
  type(r2_ptr), dimension(N) :: liste
  integer:: i,j,k
  type(r2_tab), pointer :: current=>null()
  double precision :: total

 ! Initialize 
 do i=1,N
 nullify(liste(i)%ptr)
 nullify(liste(i)%next)
 end do

 ! Allocate (do not manage "next" pointer, all elements are set to 1.0)
 do i=1,N
call new_r2_tab(liste(i)%ptr,N,i*N)
 end do

 ! Check all is correct on host side
 do i=1,N
if (sum(liste(i)%ptr%val) .NE. N*i*N) then
   write(6,*)"Something goes wrong",sum(liste(i)%ptr%val)," != ",N*i*N
   STOP (1)
end if
 end do
 write(6,*) "Default init OK"

 ! Update on host (runongpu is false)
 call