[Bug libgomp/95150] Some offloaded programs crash with openmp

2021-07-28 Thread mehdi.chinoune at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WONTFIX
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #10 from Chinoune  ---
No one has the intention to fix it.

[Bug libgomp/95150] Some offloaded programs crash with openmp

2020-12-19 Thread mehdi.chinoune at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|WONTFIX                     |---

--- Comment #9 from Chinoune  ---
I get it with more examples.

[Bug libgomp/95150] Some offloaded programs crash with openmp

2020-12-12 Thread mehdi.chinoune at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #8 from Chinoune  ---
Adding "parallel do" to the OpenMP directive solves the problem.
The crash reappears with "collapse(2)" with both OpenMP and OpenACC.

program main
  implicit none
  integer, parameter :: sp = selected_real_kind(6,37)
  real(sp), allocatable :: a(:,:), b(:,:), c(:,:)
  character( len=5 ) :: val
  integer :: n, l, m
  integer :: i, j, k
  integer :: t1, t2
  real(sp) :: tic
  !
  call get_command_argument( 1, val )
  read( val, *) n
  l = n
  m = n
  !
  call system_clock( t1, tic)
  !
  allocate( a(l,m), b(m,n), c(l,n) )
  !
  call random_number(a)
  call random_number(b)
  c = 0._sp
  !
  !$acc data copyin(a,b) copy(c)
  !$acc parallel loop collapse(3)
  !$omp target teams distribute parallel do collapse(3) map( to:a,b ) map( tofrom:c )
  do j = 1, n
    do k = 1, m
      do i = 1, l
        c(i,j) = a(i,k)*b(k,j) + c(i,j)
      end do
    end do
  end do
  !$acc end data
  !
  call system_clock(t2)
  print*, n, (t2-t1)/tic, sum(c)
  !
end program main

$ gfortran -O3 -fopenmp -foffload=nvptx-none matmul.f90 -o test.x
$ for i in {1..5}; do ./test.x $((512*2**$i)); done
        1024  0.28788       268377424.
        2048  7.4010E-02    0.
        4096  0.17002       0.
        8192  0.57401       0.
       16384  2.1049        0.
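
A quick sanity check on the output above, as a sketch: a and b are filled by random_number (uniform in [0,1)) and l = m = n, so sum(c) should come out near n**3/4, which is about 2.68E+08 for n = 1024. A sum of 0. therefore suggests that the offloaded loop never updated c. The helper name check_sums and the 10% tolerance are assumptions made for illustration; they are not part of the report.

```shell
# check_sums: read "n  seconds  sum(c)" lines on stdin and flag sums that are
# far from the expected value n**3/4 (hypothetical helper, 10% tolerance).
check_sums() {
  awk 'NF >= 3 {
    n = $1
    sum = $3 + 0                      # "268377424." and "0." parse as numbers
    expected = n * n * n / 4
    ratio = sum / expected
    status = (ratio > 0.9 && ratio < 1.1) ? "OK" : "SUSPECT"
    printf "%d %s\n", n, status
  }'
}

# Two of the lines reported above:
check_sums <<'EOF'
1024  0.28788       268377424.
2048  7.4010E-02    0.
EOF
# prints:
# 1024 OK
# 2048 SUSPECT
```

The same function can be fed directly from the loop, for example `for i in 1 2 3 4 5; do ./test.x $((512*2**i)); done | check_sums`, assuming test.x was built as shown.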

[Bug libgomp/95150] Some offloaded programs crash with openmp

2020-12-12 Thread mehdi.chinoune at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|10.1.0                      |10.2.0
           Keywords|                            |openacc
            Version|10.1.0                      |10.2.0

--- Comment #7 from Chinoune  ---
With OpenACC, I got a similar message:

libgomp: cuStreamSynchronize error: the launch timed out and was terminated

[Bug libgomp/95150] Some offloaded programs crash with openmp

2020-12-12 Thread mehdi.chinoune at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|WONTFIX                     |---
             Status|RESOLVED                    |UNCONFIRMED

--- Comment #6 from Chinoune  ---
Reopening, as I have reproduced the same crash with another GPU.

[Bug libgomp/95150] Some offloaded programs crash with openmp

2020-10-30 Thread mehdi.chinoune at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #5 from Chinoune  ---
Won't fix.

[Bug libgomp/95150] Some offloaded programs crash with openmp

2020-05-21 Thread chinoune.mehdi at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

--- Comment #4 from Chinoune  ---
After some tests, it looks like it fails only with small sizes.
The program doesn't crash when the matrix size is increased, and it takes a
shorter time to execute!

[Bug libgomp/95150] Some offloaded programs crash with openmp

2020-05-15 Thread chinoune.mehdi at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

--- Comment #3 from Chinoune  ---
(In reply to Tobias Burnus from comment #1)
> * Your compilation uses "-O0" – I do not know whether that's intended.
I didn't set any optimization flag; maybe the compiler defaults to "-O0".

>  
> * I did not see any timeout message although it did take a while to run
>   with offloading. (See timing results below.)
>   I wonder what causes the problem you are seeing.
> 
>   You could try whether setting the environment variable
>     GOMP_DEBUG=1
>   shows some useful details for the launch.
> 
I have attached the output with GOMP_DEBUG=1

> * The OpenACC test case is wrong as "c" has to be "copy" not "copyout"
>   as the initial value is used (→ NaN)
Thanks, I noticed that after I reported the bug.

I am using a Kepler (sm_35) Graphics card, if this helps.

[Bug libgomp/95150] Some offloaded programs crash with openmp

2020-05-15 Thread chinoune.mehdi at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

--- Comment #2 from Chinoune  ---
Created attachment 48546
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48546&action=edit
debug output

[Bug libgomp/95150] Some offloaded programs crash with openmp

2020-05-15 Thread burnus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

--- Comment #1 from Tobias Burnus  ---
* Your compilation uses "-O0" – I do not know whether that's intended.

* I did not see any timeout message although it did take a while to run
  with offloading. (See timing results below.)
  I wonder what causes the problem you are seeing.

  You could try whether setting the environment variable
    GOMP_DEBUG=1
  shows some useful details for the launch.

* The OpenACC test case is wrong as "c" has to be "copy" not "copyout"
  as the initial value is used (→ NaN)
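
For reference, a minimal sketch of how the GOMP_DEBUG=1 suggestion above could be applied so that the debug output ends up in a file that can be attached to the bug. The function name run_with_gomp_debug, the log file name, and the ./test.x binary with a size argument of 1024 are assumptions for illustration. Both stdout and stderr are captured, so the sketch does not depend on which stream the debug output uses.

```shell
# run_with_gomp_debug LOG CMD [ARGS...]
# Runs CMD with GOMP_DEBUG=1 set only for that command and stores everything
# it prints (stdout and stderr) in LOG. Returns the exit status of CMD.
run_with_gomp_debug() {
  log=$1
  shift
  GOMP_DEBUG=1 "$@" > "$log" 2>&1
}

# Example use, only if the (assumed) test binary exists in this directory:
if [ -x ./test.x ]; then
  run_with_gomp_debug gomp_debug.log ./test.x 1024
  echo "exit status: $?, log written to gomp_debug.log"
fi
```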

On the technical side, at startup, one calls:
  cuLaunchKernel
and when that has succeeded, one calls
  cuCtxSynchronize
and if that fails, the error message is printed with
  cuda_error
which shows the time-out message:
  libgomp: cuCtxSynchronize error: the launch timed out and was terminated
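
When running many sizes or flag combinations, the timeout can be detected mechanically by searching the captured output for that message. The sketch below assumes the output of a run was saved to a file. It matches both the cuCtxSynchronize wording shown here and the cuStreamSynchronize wording reported for OpenACC in this bug. The function name is_launch_timeout is hypothetical.

```shell
# is_launch_timeout FILE
# Succeeds (exit status 0) if FILE contains the libgomp launch-timeout error,
# in either the cuCtxSynchronize or the cuStreamSynchronize variant.
is_launch_timeout() {
  grep -Eq 'libgomp: cu(Ctx|Stream)Synchronize error: the launch timed out' "$1"
}

# Example use with a previously captured log (file name is an assumption):
if [ -f run.log ] && is_launch_timeout run.log; then
  echo "run.log: kernel launch timed out"
fi
```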


I added a ", sum(c)" to the print output and did some tests:

On AMDGCN:
== -O0 ==                                 3.5688      268048112.
== -Ofast ==                              0.10999     268698816.
== -fopenmp -O0 ==                        193.227997  268186448.
== -fopenmp -Ofast ==                     43.1559982  268455872.
== -fopenacc -O0 ==                       186.399002  268531136.
== -fopenacc -Ofast ==                    43.4970016  268206464.
== -fopenmp -foffload=disable -O0 ==      7.27299976  268241776.
== -fopenmp -foffload=disable -Ofast ==   1.4901      268171680.


On NVidia:
== -O0 ==                                 8.00599957  268253520.
== -Ofast ==                              0.25495     268399056.
== -fopenmp -O0 ==                        64.2089996  268092608.
== -fopenmp -Ofast ==                     33.6360016  268359952.
== -fopenacc -O0 ==                       0.86189     NaN (see note)
== -fopenacc -Ofast ==                    0.30012     NaN (see note)
== -fopenmp -foffload=disable -O0 ==      15.2220001  268511968.
== -fopenmp -foffload=disable -Ofast ==   3.5294      268573568.
== -fopenacc -foffload=disable -O0 ==     14.5790005  268442496.
== -fopenacc -foffload=disable -Ofast ==  4.41099977  268511968.
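
The flag combinations in the NVidia table can be enumerated with a small loop, sketched below. It only prints the compile commands rather than running them, so it works without a compiler or GPU. The source and output names matmul.f90 and test.x are taken from comment #8 and may differ from the files actually used for these timings. The function name list_builds is hypothetical.

```shell
# list_builds: print one gfortran command line per flag combination
# (five programming-model settings times two optimization levels).
list_builds() {
  for model in "" "-fopenmp" "-fopenacc" \
               "-fopenmp -foffload=disable" "-fopenacc -foffload=disable"; do
    for opt in -O0 -Ofast; do
      # ${model:+$model } adds the model flags plus a space only when non-empty
      echo "gfortran ${model:+$model }$opt matmul.f90 -o test.x"
    done
  done
}

list_builds
```

Piping the output through `sh` would run the builds one after another, each overwriting test.x, so in practice a timing run would follow each compile.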