On 4/4/23 11:02, Thomas Schwinge wrote:
Hi!

Are we going to install such a work-around?


Hi,

LGTM.

Thanks,
- Tom


Regards
  Thomas


On 2022-12-19T13:04:43+0100, I wrote:
Hi!

On 2022-12-16T17:19:00+0100, Tobias Burnus <tob...@codesourcery.com> wrote:
Seems to be a CUDA JIT issue

A Nvidia Driver JIT issue, more precisely.  ;-)

which is fixed by adding a dummy procedure.

Gah...  :-|

Lightly tested with 4 systems at hand, where 2 failed before.

I'm happy to confirm that indeed this does resolve the issue for all
configurations that I reported in <https://gcc.gnu.org/PR108098>
"OpenMP/nvptx reverse offload execution test FAILs".


As I said on IRC, #gcc, 2022-12-16:

[...] we're unlikely to reverse-engineer the exact version/conditions
where this got fixed, so don't have a useful means for versioning the
workaround.  Fortunately, it doesn't "cost" anything really.  (In
contrast to some other GCC/nvptx back end workarounds, as I
understand.)


Regards
  Thomas


One had CUDA 10.2 and
the other had some ancient CUDA where 'nvidia-smi' did not print a CUDA version
and which requires -mptx=3.1.
(I did check that offloading indeed happened and that no host fallback was done.)

OK for mainline?

Tobias


nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

Seemingly, the PTX JIT of CUDA <= 10.2 replaces function pointers in global
variables by NULL if a translation unit does not contain any executable code. It
works with CUDA 11.1.  The code of this commit is about reverse offload;
having NULL values disables the side of reverse offload during image load.

Solution is the same as found by Thomas for a related issue: Adding a dummy
procedure. Cf. the PR of this issue and Thomas' patch
"nvptx: Support global constructors/destructors via 'collect2'"
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html

As that approach also works here:

Co-authored-by: Thomas Schwinge <tho...@codesourcery.com>

gcc/
      PR libgomp/108098

      * config/nvptx/mkoffload.cc (process): Emit dummy procedure
      alongside reverse-offload function table to prevent NULL values
      of the function addresses.

---
  gcc/config/nvptx/mkoffload.cc | 14 ++++++++++++++
  1 file changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 5d89ba8..8306aa0 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -357,6 +357,20 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
      fputc (sm_ver2[i], out);
        fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");

+      /* WORKAROUND - see PR 108098
+     It seems as if older CUDA JIT compiler optimizes the function pointers
+     in offload_func_table to NULL, which can be prevented by adding a
+     dummy procedure. With CUDA 11.1, it seems to work fine without
     workaround, while CUDA 10.2 as well as some ancient version need the
+     workaround. Assuming CUDA 11.0 fixes it, emitting it could be
+     restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
+     PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
+     PTX ISA 7.1.  */
+      fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
+      fprintf (out, "\t\".func __dummy$func ( )\"\n");
+      fprintf (out, "\t\"{\"\n");
+      fprintf (out, "\t\"}\"\n");
+
        size_t fidx = 0;
        for (id = func_ids; id; id = id->next)
      {
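
For illustration only, here is a minimal standalone sketch of what the hunk
above emits, together with the version guard that the code comment merely
speculates about.  The patch itself emits the dummy procedure
unconditionally.  The helper names (dummy_func_literals, may_need_dummy_func,
emit_dummy_func) are hypothetical and do not exist in mkoffload.cc.  The
sketch assumes that sm_ver2 and version2 point at decimal text such as "35"
and "3.1", and therefore parses numbers instead of comparing the first
character as the comment suggests.  The CUDA 11.0 cut-off is the comment's
assumption, not a verified fact.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Text that the patch writes into the generated host-side source: four C
// string literals that together hold a PTX forward declaration and an
// empty definition of __dummy$func.
std::string
dummy_func_literals ()
{
  return "\n\t\".func __dummy$func ( );\"\n"
         "\t\".func __dummy$func ( )\"\n"
         "\t\"{\"\n"
         "\t\"}\"\n";
}

// HYPOTHETICAL guard, not part of the patch.  It follows the idea in the
// code comment (sm_80 and PTX ISA 7.0 are new in CUDA 11.0) and ASSUMES
// that CUDA 11.0 already contains the JIT fix, which nobody verified.
// strtol stops at the first non-digit, so "3.1" yields 3.
bool
may_need_dummy_func (const char *sm_ver2, const char *version2)
{
  long sm = std::strtol (sm_ver2, nullptr, 10);
  long ptx_major = std::strtol (version2, nullptr, 10);
  return sm < 80 && ptx_major < 7;
}

// Emit the dummy procedure only where the guard says it may be needed.
void
emit_dummy_func (FILE *out, const char *sm_ver2, const char *version2)
{
  if (may_need_dummy_func (sm_ver2, version2))
    std::fputs (dummy_func_literals ().c_str (), out);
}
```

Calling emit_dummy_func (stdout, "35", "3.1") would print the four literal
lines, while emit_dummy_func (stdout, "80", "7.0") would print nothing.
Since the workaround costs next to nothing and the exact fixed version is
unknown, emitting unconditionally, as the patch does, is the safer choice.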
