https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118518
Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also|https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106445, |
|https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 |
Last reconfirmed| |2025-03-26
Status|UNCONFIRMED |NEW
Keywords| |ice-on-valid-code, openacc
Depends on| |89499, 106445, 117010
CC| |tschwinge at gcc dot gnu.org
Ever confirmed|0 |1
--- Comment #12 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Thanks for your submission; I'm working through this and your other ones.
(In reply to Benjamin Schulz from comment #11)
> if i write something like this:
> SET (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fopenacc -foffload=nvptx-none
> -foffload=-malias -fcf-protection=none -fno-stack-protector
> -U_FORTIFY_SOURCE -std=c++23 -no-pie")
>
> it still complains that alias definitions are not supported.
PTX '.alias' is available for PTX 6.3+, which GCC 14 doesn't default to, so
you'll need '-foffload-options=nvptx-none=-mptx=6.3' in addition to
'-foffload-options=nvptx-none=-malias'.
I'm working on (the upcoming) GCC 15. Current status for nvptx offloading for
your code per the 2025-02-04 attachments (with the MPI things disabled):
With '-std=c++23 -fopenacc -O0', we run into undeclared/missing symbols
during nvptx offloading compilation:
ptxas ./a.xnvptx-none.mkoffload.o, line 1543; error : Call to
'_ZSt3powImdEN9__gnu_cxx11__promote_2IDTplcvNS1_IT_XsrSt12__is_integerIS2_E7__valueEE6__typeELi0EcvNS1_IT0_XsrS3_IS7_E7__valueEE6__typeELi0EEXsrS3_ISB_E7__valueEE6__typeES2_S7_'
requires call prototype
ptxas ./a.xnvptx-none.mkoffload.o, line 2561; error : Call to
'_ZN10datastructIdED1Ev' requires call prototype
ptxas ./a.xnvptx-none.mkoffload.o, line 2568; error : Call to
'_ZN10datastructIdED1Ev' requires call prototype
ptxas ./a.xnvptx-none.mkoffload.o, line 2575; error : Call to
'_ZN10datastructIdED1Ev' requires call prototype
ptxas ./a.xnvptx-none.mkoffload.o, line 3335; error : Call to
'_ZN10datastructIdED1Ev' requires call prototype
ptxas ./a.xnvptx-none.mkoffload.o, line 1543; error : Unknown symbol
'_ZSt3powImdEN9__gnu_cxx11__promote_2IDTplcvNS1_IT_XsrSt12__is_integerIS2_E7__valueEE6__typeELi0EcvNS1_IT0_XsrS3_IS7_E7__valueEE6__typeELi0EEXsrS3_ISB_E7__valueEE6__typeES2_S7_'
ptxas ./a.xnvptx-none.mkoffload.o, line 2561; error : Unknown symbol
'_ZN10datastructIdED1Ev'
ptxas ./a.xnvptx-none.mkoffload.o, line 2568; error : Unknown symbol
'_ZN10datastructIdED1Ev'
ptxas ./a.xnvptx-none.mkoffload.o, line 2575; error : Unknown symbol
'_ZN10datastructIdED1Ev'
ptxas ./a.xnvptx-none.mkoffload.o, line 3335; error : Unknown symbol
'_ZN10datastructIdED1Ev'
[...]
ptxas fatal : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
The first 'error' is what I just filed as PR119485 "OpenACC offloading
compilation failure/ICE for C++ templated library functions".
The following 'error's, for C++ destructors, are very likely the issue already
reported/discussed in PR106445 "nvptx offloading: C++ constructor symbol alias
getting lost" and PR117010 "[nvptx] Incorrect ptx code-gen for C++ code with
templates", which I'm looking into.
With '-std=c++23 -fopenacc -O1', we run into PR89499 "ICE in expand_UNIQUE, at
internal-fn.c:2605", which I still need to resolve. As a workaround, add
'-fno-inline'.
However, with '-std=c++23 -fopenacc -O1 -fno-inline', GCC then again ICEs
during nvptx offloading compilation as discussed in PR119485 "OpenACC
offloading compilation failure/ICE for C++ templated library functions".
To work around that, replace 'pow([...])' with 'powf([...])'. With this,
compilation succeeds (within the bounds set above), and we get Nvidia GPU
execution as follows:
$ ./a.out
Ordinary matrix multiplication, on gpu
80 90 100 110
176 202 228 254
272 314 356 398
368 426 484 542
A Cholesky decomposition with the multiplication on gpu
4 12 -16
12 37 -43
-16 -43 98
2 0 0
6 1 0
-8 5 3
Now the cholesky decomposition is entirely done on gpu
2 0 0
6 1 0
-8 5 3
Now we do the same with the lu decomposition
1 -2 -2 -3
3 -9 0 -9
-1 2 4 7
-3 -6 26 2
Just the multiplication on gpu
1 0 0 0
3 1 0 0
-1 -0 1 0
-3 4 -2 1
1 -2 -2 -3
0 -3 6 0
0 0 2 4
0 0 0 1
Entirely on gpu
1 0 0 0
3 1 0 0
-1 -0 1 0
-3 4 -2 1
1 -2 -2 -3
0 -3 6 0
0 0 2 4
0 0 0 1
Now we do the same with the qr decomposition
12 -51 4
6 167 -68
-4 24 -41
Just the multiplication on gpu
0.857143 -0.394286 -0.331429
0.428571 0.902857 0.0342857
-0.285714 0.171429 -0.942857
14 21 -14
-8.88178e-16 175 -70
-2.63678e-15 -5.06262e-14 35
Entirely on gpu
libgomp: cuStreamSynchronize error: an illegal memory access was encountered
(I've not checked these numbers, and not looked into that device-side SIGSEGV.)
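For illustration, the 'pow([...])' to 'powf([...])' replacement mentioned above
might look roughly like the following sketch. The function names and the loop
are hypothetical and not taken from the attachments. The mangled name in the
ptxas output appears to correspond to 'std::pow<unsigned long, double>', that
is, a call with mixed integer/floating-point arguments that selects the
templated, type-promoting overload. Note that 'powf' computes in single
precision, so results may differ slightly from the 'pow' variant.

```cpp
#include <cmath>
#include <cstddef>
#include <math.h>   // makes sure ::powf is declared in the global namespace

// Hypothetical helper: mixed (size_t, double) arguments select the templated,
// type-promoting std::pow overload, which is the kind of call named in the
// ptxas error above.
inline double scale_with_pow(std::size_t base, double exponent)
{
    return std::pow(base, exponent);
}

// Workaround variant: call the plain C function powf with explicit float
// arguments, so no templated library function is involved.
inline float scale_with_powf(std::size_t base, double exponent)
{
    return powf(static_cast<float>(base), static_cast<float>(exponent));
}

// Hypothetical OpenACC loop using the workaround. Without -fopenacc the
// pragma is ignored and the loop simply runs on the host.
void fill_powers(float *out, std::size_t n, double exponent)
{
#pragma acc parallel loop copyout(out[0:n])
    for (std::size_t i = 0; i < n; ++i)
        out[i] = powf(static_cast<float>(i), static_cast<float>(exponent));
}
```

Calling 'fill_powers(out, n, 2.0)' fills 'out' with the squares of the indices;
whether this form offloads cleanly depends on the GCC version and flags
discussed above.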
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89499
[Bug 89499] [12/13/14/15 Regression] ICE in expand_UNIQUE, at
internal-fn.c:2605
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106445
[Bug 106445] nvptx offloading: C++ constructor symbol alias getting lost
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117010
[Bug 117010] [nvptx] Incorrect ptx code-gen for C++ code with templates