Hi!

Are we going to install such a work-around?

Regards
 Thomas


On 2022-12-19T13:04:43+0100, I wrote:
> Hi!
>
> On 2022-12-16T17:19:00+0100, Tobias Burnus <tob...@codesourcery.com> wrote:
>> Seems to be a CUDA JIT issue
>
> A Nvidia Driver JIT issue, more precisely.  ;-)
>
>> which is fixed by adding a dummy procedure.
>
> Gah...  :-|
>
>> Lightly tested with 4 systems at hand, where 2 failed before.
>
> I'm happy to confirm that indeed this does resolve the issue for all
> configurations that I reported in <https://gcc.gnu.org/PR108098>
> "OpenMP/nvptx reverse offload execution test FAILs".
>
>
> As I said on IRC, #gcc, 2022-12-16:
>
>> [...] we're unlikely to reverse-engineer the exact version/conditions
>> where this got fixed, so don't have a useful means for versioning the
>> workaround.  Fortunately, it doesn't "cost" anything really.  (In
>> contrast to some other GCC/nvptx back end workarounds, as I
>> understand.)
>
>
> Regards
>  Thomas
>
>
>> One had 10.2 and the other had some ancient CUDA where 'nvptx-smi'
>> did not print a CUDA version and requires -mptx=3.1.
>> (I did check that offloading indeed happened and no hostfallback was done.)
>>
>> OK for mainline?
>>
>> Tobias
>
>
>> nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]
>>
>> Seemingly, the ptx JIT of CUDA <= 10.2 replaces function pointers in global
>> variables by NULL if a translation does not contain any executable code. It
>> works with CUDA 11.1.  The code of this commit is about reverse offload;
>> having NULL values disables the side of reverse offload during image load.
>>
>> Solution is the same as found by Thomas for a related issue: Adding a dummy
>> procedure. Cf. the PR of this issue and Thomas' patch
>> "nvptx: Support global constructors/destructors via 'collect2'"
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html
>>
>> As that approach also works here:
>>
>> Co-authored-by: Thomas Schwinge <tho...@codesourcery.com>
>>
>> gcc/
>>      PR libgomp/108098
>>
>>      * config/nvptx/mkoffload.cc (process): Emit dummy procedure
>>      alongside reverse-offload function table to prevent NULL values
>>      of the function addresses.
>>
>> ---
>>  gcc/config/nvptx/mkoffload.cc | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
>> index 5d89ba8..8306aa0 100644
>> --- a/gcc/config/nvptx/mkoffload.cc
>> +++ b/gcc/config/nvptx/mkoffload.cc
>> @@ -357,6 +357,20 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
>>          fputc (sm_ver2[i], out);
>>        fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
>>
>> +      /* WORKAROUND - see PR 108098
>> +         It seems as if older CUDA JIT compiler optimizes the function pointers
>> +         in offload_func_table to NULL, which can be prevented by adding a
>> +         dummy procedure. With CUDA 11.1, it seems to work fine without
>> +         workaround while CUDA 10.2 as some ancient version have need the
>> +         workaround. Assuming CUDA 11.0 fixes it, emitting it could be
>> +         restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
>> +         PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
>> +         PTX ISA 7.1.  */
>> +      fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
>> +      fprintf (out, "\t\".func __dummy$func ( )\"\n");
>> +      fprintf (out, "\t\"{\"\n");
>> +      fprintf (out, "\t\"}\"\n");
>> +
>>        size_t fidx = 0;
>>        for (id = func_ids; id; id = id->next)
>>          {
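
For reference, the restriction floated in the patch comment could be sketched roughly as below. This is only an illustration and not part of the posted patch: 'need_dummy_proc' and 'emit_dummy_proc' are hypothetical names, the version strings are assumed to be NUL-terminated (which may not match how 'sm_ver2' and 'version2' are held in 'process'), and the whole gate rests on the unverified assumption that CUDA 11.0 is where the JIT got fixed. The numbers are parsed in full rather than compared by first character, so that a three-digit sm version would not be misread.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper, NOT in mkoffload.cc.  Returns true if the dummy
// procedure should be emitted, ASSUMING (unverified) that the JIT issue
// is fixed as of CUDA 11.0, which introduced sm_80 and PTX ISA 7.0.
//   sm:  digits following "sm_", e.g. "35" or "80"
//   ptx: PTX ISA version, e.g. "3.1" or "7.0"
bool
need_dummy_proc (const char *sm, const char *ptx)
{
  long sm_num = std::strtol (sm, nullptr, 10);      // "80"  -> 80
  long ptx_major = std::strtol (ptx, nullptr, 10);  // "7.0" -> 7
  // If either value requires CUDA >= 11.0, an older JIT cannot be loading
  // this code, so the workaround would be unnecessary.
  return sm_num < 80 && ptx_major < 7;
}

// Hypothetical wrapper around the four output lines added by the patch;
// the strings are the same as in the patch, written with fputs.
void
emit_dummy_proc (std::FILE *out)
{
  std::fputs ("\n\t\".func __dummy$func ( );\"\n", out);
  std::fputs ("\t\".func __dummy$func ( )\"\n", out);
  std::fputs ("\t\"{\"\n", out);
  std::fputs ("\t\"}\"\n", out);
}
```

A caller would then do 'if (need_dummy_proc (sm, ptx)) emit_dummy_proc (out);'. Given that the exact fixed version is unknown and the dummy procedure costs nothing, emitting it unconditionally, as the patch does, remains the simpler option.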