[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WONTFIX
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #10 from Chinoune ---
No one has the intention to fix it.
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|WONTFIX                     |---

--- Comment #9 from Chinoune ---
I get it with more examples.
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #8 from Chinoune ---
Adding "parallel do" to the OpenMP directive solves the problem.
The crash reappears with "collapse(2)" under both OpenMP and OpenACC.

program main
  implicit none
  integer, parameter :: sp = selected_real_kind(6,37)
  real(sp), allocatable :: a(:,:), b(:,:), c(:,:)
  character( len=5 ) :: val
  integer :: n, l, m
  integer :: i, j, k
  integer :: t1, t2
  real(sp) :: tic
  !
  call get_command_argument( 1, val )
  read( val, *) n
  l = n
  m = n
  !
  call system_clock( t1, tic)
  !
  allocate( a(l,m), b(m,n), c(l,n) )
  !
  call random_number(a)
  call random_number(b)
  c = 0._sp
  !
  !$acc data copyin(a,b) copy(c)
  !$acc parallel loop collapse(3)
  !$omp target teams distribute parallel do collapse(3) map( to:a,b ) map( tofrom:c )
  do j = 1, n
    do k = 1, m
      do i = 1, l
        c(i,j) = a(i,k)*b(k,j) + c(i,j)
      end do
    end do
  end do
  !$acc end data
  !
  call system_clock(t2)
  print*, n, (t2-t1)/tic, sum(c)
  !
end program main

$ gfortran -O3 -fopenmp -foffload=nvptx-none matmul.f90 -o test.x
$ for i in {1..5}; do ./test.x $((512*2**$i)); done
        1024  0.28788      268377424.
        2048  7.4010E-02   0.
        4096  0.17002      0.
        8192  0.57401      0.
       16384  2.1049       0.
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|10.1.0                      |10.2.0
           Keywords|                            |openacc
            Version|10.1.0                      |10.2.0

--- Comment #7 from Chinoune ---
With OpenACC, I got a similar message:
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|WONTFIX                     |---
             Status|RESOLVED                    |UNCONFIRMED

--- Comment #6 from Chinoune ---
Reopening, as I have reproduced the same crash with another GPU.
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

Chinoune changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #5 from Chinoune ---
Won't fix.
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

--- Comment #4 from Chinoune ---
After some tests, it looks like it fails only with small sizes.
The program doesn't crash when the matrix size is increased, and it takes a
shorter time to execute!
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

--- Comment #3 from Chinoune ---
(In reply to Tobias Burnus from comment #1)
> * Your compilation uses "-O0" – I do not know whether that's intended.

I didn't set any optimization flag; maybe the compiler defaults to "-O0".

> * I did not see any timeout message although it did take a while to run
>   with offloading. (See timing results below.)
>   I wonder what causes the problem you are seeing.
>
>   You could try whether setting the environment variable
>     GOMP_DEBUG=1
>   shows some useful details for the launch.

I have attached the output with GOMP_DEBUG=1.

> * The OpenACC test case is wrong as "c" has to be "copy" not "copyout"
>   as the initial value is used (→ NaN)

Thanks, I noticed that after I reported the bug.

I am using a Kepler (sm_35) graphics card, if this helps.
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

--- Comment #2 from Chinoune ---
Created attachment 48546
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48546&action=edit
debug output
[Bug libgomp/95150] Some offloaded programs crash with openmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

--- Comment #1 from Tobias Burnus ---
* Your compilation uses "-O0" – I do not know whether that's intended.

* I did not see any timeout message although it did take a while to run
  with offloading. (See timing results below.)
  I wonder what causes the problem you are seeing.

  You could try whether setting the environment variable
    GOMP_DEBUG=1
  shows some useful details for the launch.

* The OpenACC test case is wrong as "c" has to be "copy" not "copyout"
  as the initial value is used (→ NaN)

On the technical side, at startup, one calls:
  cuLaunchKernel
and when that has succeeded, one calls
  cuCtxSynchronize
and if that fails, the error message is printed with cuda_error,
which shows the time-out message:
  libgomp: cuCtxSynchronize error: the launch timed out and was terminated

I added a ", sum(c)" to the print output and did some tests:

On AMDGCN:
== -O0 ==                                       3.5688      268048112.
== -Ofast ==                                    0.10999     268698816.
== -fopenmp -O0 ==                              193.227997  268186448.
== -fopenmp -Ofast ==                           43.1559982  268455872.
== -fopenacc -O0 ==                             186.399002  268531136.
== -fopenacc -Ofast ==                          43.4970016  268206464.
== -fopenmp -foffload=disable -O0 ==            7.27299976  268241776.
== -fopenmp -foffload=disable -Ofast ==         1.4901      268171680.

On NVidia:
== -O0 ==                                       8.00599957  268253520.
== -Ofast ==                                    0.25495     268399056.
== -fopenmp -O0 ==                              64.2089996  268092608.
== -fopenmp -Ofast ==                           33.6360016  268359952.
== -fopenacc -O0 ==                             0.86189     NaN (see note)
== -fopenacc -Ofast ==                          0.30012     NaN (see note)
== -fopenmp -foffload=disable -O0 ==            15.2220001  268511968.
== -fopenmp -foffload=disable -Ofast ==         3.5294      268573568.
== -fopenacc -foffload=disable -O0 ==           14.5790005  268442496.
== -fopenacc -foffload=disable -Ofast ==        4.41099977  268511968.