[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-04-29 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 4fd1510c2013fd975ac2ad94b3d201bcd5a9d029 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 28 Mar 2024 16:18:19 -0500
Subject: [PATCH] [Libomptarget] Statically link all plugin runtimes

Summary:
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868
---
 clang/test/Driver/linker-wrapper-image.c  |   2 +-
 .../Frontend/Offloading/OffloadWrapper.cpp|   7 +-
 offload/include/PluginManager.h   |  61 ++
 offload/include/device.h  |   8 +-
 offload/plugins-nextgen/CMakeLists.txt|  19 +-
 offload/plugins-nextgen/amdgpu/CMakeLists.txt |   5 -
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  14 +-
 offload/plugins-nextgen/common/CMakeLists.txt |   4 +-
 .../common/include/PluginInterface.h  |  94 +---
 .../common/include/Utils/ELF.h|   2 -
 offload/plugins-nextgen/common/src/JIT.cpp|  40 ++--
 .../common/src/PluginInterface.cpp| 205 --
 offload/plugins-nextgen/cuda/CMakeLists.txt   |   5 -
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  14 +-
 offload/plugins-nextgen/host/CMakeLists.txt   |   8 -
 offload/plugins-nextgen/host/src/rtl.cpp  |  14 +-
 offload/src/CMakeLists.txt|   4 +
 offload/src/OffloadRTL.cpp|   1 +
 offload/src/OpenMP/InteropAPI.cpp |   4 +-
 offload/src/PluginManager.cpp | 129 ---
 offload/src/device.cpp|   3 +-
 offload/src/interface.cpp |   2 +-
 .../kernelreplay/llvm-omp-kernel-replay.cpp   |   2 -
 .../unittests/Plugins/NextgenPluginsTest.cpp  |   1 -
 24 files changed, 127 insertions(+), 521 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04e..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c670..8b6f9ea1f4cca3 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module &M, GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.
diff --git a/offload/include/PluginManager.h b/offlo

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert edited 
https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Johannes Doerfert via cfe-commits


@@ -34,38 +33,10 @@
 #include 
 #include 
 
-struct PluginManager;
-
-/// Plugin adaptors should be created via `PluginAdaptorTy::create` which will
-/// invoke the constructor and call `PluginAdaptorTy::init`. Eventual errors 
are
-/// reported back to the caller, otherwise a valid and initialized adaptor is
-/// returned.
-struct PluginAdaptorTy {
-  /// Try to create a plugin adaptor from a filename.
-  static llvm::Expected>
-  create(const std::string &Name);
-
-  /// Name of the shared object file representing the plugin.
-  std::string Name;
-
-  /// Access to the shared object file representing the plugin.
-  std::unique_ptr LibraryHandler;
-
-#define PLUGIN_API_HANDLE(NAME)
\
-  using NAME##_ty = decltype(__tgt_rtl_##NAME);
\
-  NAME##_ty *NAME = nullptr;
+#include "PluginInterface.h"

jdoerfert wrote:

Order the includes.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Johannes Doerfert via cfe-commits


@@ -3476,3 +3472,9 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, 
TargetAllocTy Kind) {
 } // namespace target
 } // namespace omp
 } // namespace llvm
+
+extern "C" {
+llvm::omp::target::plugin::GenericPluginTy *createPlugin_amdgpu() {

jdoerfert wrote:

These should use the macro to create the name, one place to change.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert approved this pull request.

I left minor comments.

This LGTM, assuming it passes our tests.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Johannes Doerfert via cfe-commits


@@ -92,10 +65,10 @@ struct PluginManager {
 std::make_unique(TgtBinDesc, TgtDeviceImage));
   }
 
-  /// Initialize as many devices as possible for this plugin adaptor. Devices
-  /// that fail to initialize are ignored. Returns the offset the devices were
-  /// registered at.
-  void initDevices(PluginAdaptorTy &RTL);
+  /// Initialize as many devices as possible for this plugin. Devices  that 
fail
+  /// to initialize are ignored. Returns the offset the devices were registered
+  /// at.

jdoerfert wrote:

No return value.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Johannes Doerfert via cfe-commits


@@ -92,10 +65,10 @@ struct PluginManager {
 std::make_unique(TgtBinDesc, TgtDeviceImage));
   }
 
-  /// Initialize as many devices as possible for this plugin adaptor. Devices
-  /// that fail to initialize are ignored. Returns the offset the devices were
-  /// registered at.
-  void initDevices(PluginAdaptorTy &RTL);
+  /// Initialize as many devices as possible for this plugin. Devices  that 
fail

jdoerfert wrote:

2 spaces

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Johannes Doerfert via cfe-commits


@@ -34,38 +33,10 @@
 #include 
 #include 
 
-struct PluginManager;
-
-/// Plugin adaptors should be created via `PluginAdaptorTy::create` which will
-/// invoke the constructor and call `PluginAdaptorTy::init`. Eventual errors 
are
-/// reported back to the caller, otherwise a valid and initialized adaptor is
-/// returned.
-struct PluginAdaptorTy {
-  /// Try to create a plugin adaptor from a filename.
-  static llvm::Expected>
-  create(const std::string &Name);
-
-  /// Name of the shared object file representing the plugin.
-  std::string Name;
-
-  /// Access to the shared object file representing the plugin.
-  std::unique_ptr LibraryHandler;
-
-#define PLUGIN_API_HANDLE(NAME)
\
-  using NAME##_ty = decltype(__tgt_rtl_##NAME);
\
-  NAME##_ty *NAME = nullptr;
+#include "PluginInterface.h"
+using GenericPluginTy = llvm::omp::target::plugin::GenericPluginTy;
 
-#include "Shared/PluginAPI.inc"
-#undef PLUGIN_API_HANDLE
-
-  /// Create a plugin adaptor for filename \p Name with a dynamic library \p 
DL.
-  PluginAdaptorTy(const std::string &Name,
-  std::unique_ptr DL);
-
-  /// Initialize the plugin adaptor, this can fail in which case the adaptor is
-  /// useless.
-  llvm::Error init();
-};
+struct PluginManager;

jdoerfert wrote:

Is't it defined just 2 lines later?

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Joseph Huber via cfe-commits


@@ -3476,3 +3472,9 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, 
TargetAllocTy Kind) {
 } // namespace target
 } // namespace omp
 } // namespace llvm
+
+extern "C" {
+llvm::omp::target::plugin::GenericPluginTy *createPlugin_amdgpu() {

jhuber6 wrote:

I tried that, but couldn't figure out how to set a name from a macro within a 
macro.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 473a4b9bad09bd9af8186932984be7696711692e Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 28 Mar 2024 16:18:19 -0500
Subject: [PATCH] [Libomptarget] Statically link all plugin runtimes

Summary:
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868
---
 clang/test/Driver/linker-wrapper-image.c  |   2 +-
 .../Frontend/Offloading/OffloadWrapper.cpp|   7 +-
 offload/include/PluginManager.h   |  61 ++
 offload/include/device.h  |   8 +-
 offload/plugins-nextgen/CMakeLists.txt|  19 +-
 offload/plugins-nextgen/amdgpu/CMakeLists.txt |   5 -
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  14 +-
 offload/plugins-nextgen/common/CMakeLists.txt |   4 +-
 .../common/include/PluginInterface.h  |  94 +---
 .../common/include/Utils/ELF.h|   2 -
 offload/plugins-nextgen/common/src/JIT.cpp|  40 ++--
 .../common/src/PluginInterface.cpp| 205 --
 offload/plugins-nextgen/cuda/CMakeLists.txt   |   5 -
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  14 +-
 offload/plugins-nextgen/host/CMakeLists.txt   |   8 -
 offload/plugins-nextgen/host/src/rtl.cpp  |  14 +-
 offload/src/CMakeLists.txt|   4 +
 offload/src/OffloadRTL.cpp|   1 +
 offload/src/OpenMP/InteropAPI.cpp |   4 +-
 offload/src/PluginManager.cpp | 129 ---
 offload/src/device.cpp|   3 +-
 offload/src/interface.cpp |   2 +-
 .../kernelreplay/llvm-omp-kernel-replay.cpp   |   2 -
 .../unittests/Plugins/NextgenPluginsTest.cpp  |   1 -
 24 files changed, 127 insertions(+), 521 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04e..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c670..8b6f9ea1f4cca3 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module &M, GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.
diff --git a/offload/include/PluginManager.h b/offlo

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-01 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 8c4b7ffb49c8f91768af3bec00669bac5433ec0f Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 28 Mar 2024 16:18:19 -0500
Subject: [PATCH] [Libomptarget] Statically link all plugin runtimes

Summary:
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868
---
 clang/test/Driver/linker-wrapper-image.c  |   2 +-
 .../Frontend/Offloading/OffloadWrapper.cpp|   7 +-
 offload/include/PluginManager.h   |  61 ++
 offload/include/device.h  |   8 +-
 offload/plugins-nextgen/CMakeLists.txt|  19 +-
 offload/plugins-nextgen/amdgpu/CMakeLists.txt |   5 -
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  14 +-
 offload/plugins-nextgen/common/CMakeLists.txt |   4 +-
 .../common/include/PluginInterface.h  |  94 +---
 .../common/include/Utils/ELF.h|   2 -
 offload/plugins-nextgen/common/src/JIT.cpp|  40 ++--
 .../common/src/PluginInterface.cpp| 205 --
 offload/plugins-nextgen/cuda/CMakeLists.txt   |   5 -
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  14 +-
 offload/plugins-nextgen/host/CMakeLists.txt   |   8 -
 offload/plugins-nextgen/host/src/rtl.cpp  |  14 +-
 offload/src/CMakeLists.txt|   4 +
 offload/src/OffloadRTL.cpp|   1 +
 offload/src/OpenMP/InteropAPI.cpp |   4 +-
 offload/src/PluginManager.cpp | 129 ---
 offload/src/device.cpp|   3 +-
 offload/src/interface.cpp |   2 +-
 .../kernelreplay/llvm-omp-kernel-replay.cpp   |   2 -
 .../unittests/Plugins/NextgenPluginsTest.cpp  |   1 -
 24 files changed, 126 insertions(+), 522 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04e..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c670..8b6f9ea1f4cca3 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module &M, GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.
diff --git a/offload/include/PluginManager.h b/offlo

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-04-22 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 50099312ab7c60b8cfce5473a4c154f8d917dc41 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 27 Mar 2024 15:27:16 -0500
Subject: [PATCH 1/3] [Libomptarget] Rename `libomptarget.rtl.x86_64` to
 `libomptarget.rtl.host`

Summary:
All of these are functionally the same code, just compiled for separate
architectures. We currently do not expose a way to execute these on
separate architectures as the host plugin works using `dlopen` into the
same process, and therefore cannot possibly be an incompatible
architecture. (This could work with a remote plugin, but this is not
supported yet).

This patch simply renames all of these to the same thing so we no longer
need to check around for its varying definitions.
---
 offload/plugins-nextgen/host/CMakeLists.txt | 32 ++---
 offload/src/CMakeLists.txt  |  5 +---
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/offload/plugins-nextgen/host/CMakeLists.txt 
b/offload/plugins-nextgen/host/CMakeLists.txt
index 7da18ee278d498..250ff2225c16bf 100644
--- a/offload/plugins-nextgen/host/CMakeLists.txt
+++ b/offload/plugins-nextgen/host/CMakeLists.txt
@@ -14,36 +14,36 @@ if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64le$")
 endif()
 
 # Create the library and add the default arguments.
-add_target_library(omptarget.rtl.${machine} ${machine})
+add_target_library(omptarget.rtl.host ${machine})
 
-target_sources(omptarget.rtl.${machine} PRIVATE src/rtl.cpp)
+target_sources(omptarget.rtl.host PRIVATE src/rtl.cpp)
 
 if(LIBOMPTARGET_DEP_LIBFFI_FOUND)
   libomptarget_say("Building ${machine} plugin linked with libffi")
   if(FFI_STATIC_LIBRARIES)
-target_link_libraries(omptarget.rtl.${machine} PRIVATE FFI::ffi_static)
+target_link_libraries(omptarget.rtl.host PRIVATE FFI::ffi_static)
   else()
-target_link_libraries(omptarget.rtl.${machine} PRIVATE FFI::ffi)
+target_link_libraries(omptarget.rtl.host PRIVATE FFI::ffi)
   endif()
 else()
   libomptarget_say("Building ${machine} plugin for dlopened libffi")
-  target_sources(omptarget.rtl.${machine} PRIVATE dynamic_ffi/ffi.cpp)
-  target_include_directories(omptarget.rtl.${machine} PRIVATE dynamic_ffi)
+  target_sources(omptarget.rtl.host PRIVATE dynamic_ffi/ffi.cpp)
+  target_include_directories(omptarget.rtl.host PRIVATE dynamic_ffi)
 endif()
 
 # Install plugin under the lib destination folder.
-install(TARGETS omptarget.rtl.${machine}
+install(TARGETS omptarget.rtl.host
 LIBRARY DESTINATION "${OFFLOAD_INSTALL_LIBDIR}")
-set_target_properties(omptarget.rtl.${machine} PROPERTIES
+set_target_properties(omptarget.rtl.host PROPERTIES
   INSTALL_RPATH "$ORIGIN" BUILD_RPATH "$ORIGIN:${CMAKE_CURRENT_BINARY_DIR}/.."
   POSITION_INDEPENDENT_CODE ON
   CXX_VISIBILITY_PRESET protected)
 
-target_include_directories(omptarget.rtl.${machine} PRIVATE
+target_include_directories(omptarget.rtl.host PRIVATE
${LIBOMPTARGET_INCLUDE_DIR})
 
 if(LIBOMPTARGET_DEP_LIBFFI_FOUND)
-  list(APPEND LIBOMPTARGET_TESTED_PLUGINS omptarget.rtl.${machine})
+  list(APPEND LIBOMPTARGET_TESTED_PLUGINS omptarget.rtl.host)
   set(LIBOMPTARGET_TESTED_PLUGINS
   "${LIBOMPTARGET_TESTED_PLUGINS}" PARENT_SCOPE)
 else()
@@ -66,22 +66,22 @@ elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64$")
"powerpc64-ibm-linux-gnu" "powerpc64-ibm-linux-gnu-LTO")
   set(LIBOMPTARGET_SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}" 
PARENT_SCOPE)
 elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_X86_64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE 
TARGET_ELF_ID=EM_X86_64)
+  target_compile_definitions(omptarget.rtl.host PRIVATE
   LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE="x86_64-pc-linux-gnu")
   list(APPEND LIBOMPTARGET_SYSTEM_TARGETS 
"x86_64-pc-linux-gnu" "x86_64-pc-linux-gnu-LTO")
   set(LIBOMPTARGET_SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}" 
PARENT_SCOPE)
 elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_AARCH64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE 
TARGET_ELF_ID=EM_AARCH64)
+  target_compile_definitions(omptarget.rtl.host PRIVATE
   LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE="aarch64-unknown-linux-gnu")
   list(APPEND LIBOMPTARGET_SYSTEM_TARGETS 
"aarch64-unknown-linux-gnu" "aarch64-unknown-linux-gnu-LTO")
   set(LIBOMPTARGET_SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}" 
PARENT_SCOPE)
 elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "s390x$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_S390)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE TARGET_ELF_I

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-03 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Going to land this soon.

@jplehr @estewart08 @ronlieb Applied this on the AMD fork, here the diff.
 https://gist.github.com/jhuber6/e856fbe9c73acea13b6d30b20605c73e

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-03 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 3ea2ae0f5c438b38d0480cfb38a72d2f7a60142c Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 28 Mar 2024 16:18:19 -0500
Subject: [PATCH] [Libomptarget] Statically link all plugin runtimes

Summary:
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868
---
 clang/test/Driver/linker-wrapper-image.c  |   2 +-
 .../Frontend/Offloading/OffloadWrapper.cpp|   7 +-
 offload/include/PluginManager.h   |  61 ++
 offload/include/device.h  |   8 +-
 offload/plugins-nextgen/CMakeLists.txt|  19 +-
 offload/plugins-nextgen/amdgpu/CMakeLists.txt |   5 -
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  14 +-
 offload/plugins-nextgen/common/CMakeLists.txt |   4 +-
 .../common/include/PluginInterface.h  |  94 +---
 .../common/include/Utils/ELF.h|   2 -
 offload/plugins-nextgen/common/src/JIT.cpp|  40 ++--
 .../common/src/PluginInterface.cpp| 205 --
 offload/plugins-nextgen/cuda/CMakeLists.txt   |   5 -
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  14 +-
 offload/plugins-nextgen/host/CMakeLists.txt   |   8 -
 offload/plugins-nextgen/host/src/rtl.cpp  |  14 +-
 offload/src/CMakeLists.txt|   4 +
 offload/src/OffloadRTL.cpp|   1 +
 offload/src/OpenMP/InteropAPI.cpp |   4 +-
 offload/src/PluginManager.cpp | 129 ---
 offload/src/device.cpp|   3 +-
 offload/src/interface.cpp |   2 -
 .../kernelreplay/llvm-omp-kernel-replay.cpp   |   2 -
 .../unittests/Plugins/NextgenPluginsTest.cpp  |   1 -
 24 files changed, 125 insertions(+), 523 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04e..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c670..8b6f9ea1f4cca3 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module &M, GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.
diff --git a/offload/include/PluginManager.h b/offloa

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-06 Thread via cfe-commits

dhruvachak wrote:

There are duplicate definitions of the following
``
bool llvm::omp::target::ompt::Initialized = false;

ompt_get_callback_t llvm::omp::target::ompt::lookupCallbackByCode = nullptr;
ompt_function_lookup_t llvm::omp::target::ompt::lookupCallbackByName = nullptr;
``
in src/OpenMP/OMPT/Callback.cpp and plugins-nextgen/common/OMPT/OmptCallback.cpp

Can you remove the ones in the plugin? Otherwise, it's not clear which 
definition is being used.


https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-06 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> There are duplicate definitions of the following
> 
> ```
> bool llvm::omp::target::ompt::Initialized = false;
> 
> ompt_get_callback_t llvm::omp::target::ompt::lookupCallbackByCode = nullptr;
> ompt_function_lookup_t llvm::omp::target::ompt::lookupCallbackByName = 
> nullptr;
> ```
> 
> in src/OpenMP/OMPT/Callback.cpp and 
> plugins-nextgen/common/OMPT/OmptCallback.cpp
> 
> Can you remove the ones in the plugin? Otherwise, it's not clear which 
> definition is being used.

Sure, do you get that upstream? If so, wonder why I didn't get it.

I guess we would just have it to where the `libomptarget` instance handles all 
the OMPT.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-06 Thread via cfe-commits

dhruvachak wrote:

> > There are duplicate definitions of the following
> > ```
> > bool llvm::omp::target::ompt::Initialized = false;
> > 
> > ompt_get_callback_t llvm::omp::target::ompt::lookupCallbackByCode = nullptr;
> > ompt_function_lookup_t llvm::omp::target::ompt::lookupCallbackByName = 
> > nullptr;
> > ```
> > 
> > 
> > 
> >   
> > 
> > 
> >   
> > 
> > 
> > 
> >   
> > in src/OpenMP/OMPT/Callback.cpp and 
> > plugins-nextgen/common/OMPT/OmptCallback.cpp
> > Can you remove the ones in the plugin? Otherwise, it's not clear which 
> > definition is being used.
> 
> Sure, do you get that upstream? If so, wonder why I didn't get it.

I did not build upstream but looking at downstream, I think for some reason 
they don't show up as duplicate symbols. But looking at the code, they should 
be removed. There are uses of those variables in the plugin, so there should be 
only 1 definition.
> 
> I guess we would just have it to where the `libomptarget` instance handles 
> all the OMPT.



https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-06 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> I did not build upstream but looking at downstream, I think for some reason 
> they don't show up as duplicate symbols. But looking at the code, they should 
> be removed. There are uses of those variables in the plugin, so there should 
> be only 1 definition.

Does this apply for anything OMPT related inside of the plugin? There's a few 
places where we mark callbacks, but those can all be moved into `libomptarget` 
once this lands.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-06 Thread via cfe-commits

dhruvachak wrote:

I think OmptCallback.cpp can be completely removed from the plugin since that 
functionality is subsumed by offload. As in 
```
diff --git a/offload/plugins-nextgen/common/CMakeLists.txt 
b/offload/plugins-nextgen/common/CMakeLists.txt
index acf0af63f050..dc942db826bf 100644
--- a/offload/plugins-nextgen/common/CMakeLists.txt
+++ b/offload/plugins-nextgen/common/CMakeLists.txt
@@ -46,7 +46,6 @@ endif()
 
 # If we have OMPT enabled include it in the list of sources.
 if (OMPT_TARGET_DEFAULT AND LIBOMPTARGET_OMPT_SUPPORT)
-  target_sources(PluginCommon PRIVATE OMPT/OmptCallback.cpp)
   target_include_directories(PluginCommon PRIVATE OMPT)
 endif()
```

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-07 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From 6dfa6dc2956ca714e98bf24b176315da42446553 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 28 Mar 2024 16:18:19 -0500
Subject: [PATCH] [Libomptarget] Statically link all plugin runtimes

Summary:
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868
---
 clang/test/Driver/linker-wrapper-image.c  |   2 +-
 .../Frontend/Offloading/OffloadWrapper.cpp|   7 +-
 offload/include/PluginManager.h   |  61 ++
 offload/include/device.h  |   8 +-
 offload/plugins-nextgen/CMakeLists.txt|  19 +-
 offload/plugins-nextgen/amdgpu/CMakeLists.txt |   5 -
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  14 +-
 offload/plugins-nextgen/common/CMakeLists.txt |   5 +-
 .../common/include/PluginInterface.h  |  94 +---
 .../common/include/Utils/ELF.h|   2 -
 offload/plugins-nextgen/common/src/JIT.cpp|  40 ++--
 .../common/src/PluginInterface.cpp| 205 --
 offload/plugins-nextgen/cuda/CMakeLists.txt   |   5 -
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  14 +-
 offload/plugins-nextgen/host/CMakeLists.txt   |   8 -
 offload/plugins-nextgen/host/src/rtl.cpp  |  14 +-
 offload/src/CMakeLists.txt|   4 +
 offload/src/OffloadRTL.cpp|   1 +
 offload/src/OpenMP/InteropAPI.cpp |   4 +-
 offload/src/PluginManager.cpp | 129 ---
 offload/src/device.cpp|   3 +-
 offload/src/interface.cpp |   2 -
 .../kernelreplay/llvm-omp-kernel-replay.cpp   |   2 -
 .../unittests/Plugins/NextgenPluginsTest.cpp  |   1 -
 24 files changed, 125 insertions(+), 524 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index d01445e3aed04e..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -30,8 +30,8 @@
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
-// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 7241d15ed1c670..8b6f9ea1f4cca3 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -232,12 +232,13 @@ void createRegisterFunction(Module &M, GlobalVariable 
*BinDesc,
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
 
+  Builder.CreateCall(RegFuncC, BinDesc);
+
   // Register the destructors with 'atexit'. This is expected by the CUDA
   // runtime and ensures that we clean up before dynamic objects are destroyed.
-  // This needs to be done before the runtime is called and registers its own.
+  // This needs to be done after plugin initialization to ensure that it is
+  // called before the plugin runtime is destroyed.
   Builder.CreateCall(AtExit, UnregFunc);
-
-  Builder.CreateCall(RegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
   // Add this function to constructors.
diff --git a/offload/include/PluginManager.h b/offloa

[clang] [llvm] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-05-09 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits