[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-17 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Tested this one a few machines and it works as expected after some final tweaks.

FYI @jplehr and @Artem-B, this will change the CMake configuration required for 
building and testing on the build bots. The new expected way to test each one 
respectively would be the following
```
-DLLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda
ninja check-libc-amdgcn-amd-amdhsa
ninja check-libc-nvptx64-nvidia-cuda
```
I think specifically there are two CMake issues that might make this annoying. 
I would like to be able to just do `ninja check-libc` and have it check all of 
them, but for the bots it's likely enough to just check specifically which one 
we're interested in.

https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-19 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/81921

>From 85f7218baa72307699b48bffa3da4005597ec719 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 13 Feb 2024 21:08:02 -0600
Subject: [PATCH] [libc] Rework the GPU build to be a regular target

Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.

This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.

The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.

The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.

The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.

We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.

This approach is vastly suprerior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.

Depends on https://github.com/llvm/llvm-project/pull/81557
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp|  37 +-
 clang/test/Driver/openmp-offload-gpu.c|  14 +-
 libc/CMakeLists.txt   |  20 +-
 .../cmake/modules/LLVMLibCArchitectures.cmake |  28 +-
 libc/cmake/modules/LLVMLibCCheckMPFR.cmake|   2 +-
 .../modules/LLVMLibCCompileOptionRules.cmake  | 104 ++
 libc/cmake/modules/LLVMLibCHeaderRules.cmake  |   2 +-
 libc/cmake/modules/LLVMLibCLibraryRules.cmake | 141 +--
 libc/cmake/modules/LLVMLibCObjectRules.cmake  | 348 --
 libc/cmake/modules/LLVMLibCTestRules.cmake|  67 ++--
 .../modules/prepare_libc_gpu_build.cmake  | 108 ++
 libc/include/CMakeLists.txt   |   6 +-
 libc/lib/CMakeLists.txt   |  42 ++-
 libc/src/__support/File/CMakeLists.txt|   2 +-
 libc/src/__support/GPU/CMakeLists.txt |   2 +-
 libc/src/__support/OSUtil/CMakeLists.txt  |   2 +-
 libc/src/__support/RPC/CMakeLists.txt |   2 +-
 libc/src/math/CMakeLists.txt  |  16 +-
 libc/src/math/gpu/vendor/CMakeLists.txt   |   1 -
 libc/src/stdio/CMakeLists.txt |   2 +-
 libc/src/stdlib/CMakeLists.txt|   4 +-
 libc/src/string/CMakeLists.txt|  12 +-
 libc/startup/gpu/CMakeLists.txt   |  35 +-
 libc/startup/gpu/amdgpu/CMakeLists.txt|  13 -
 libc/startup/gpu/nvptx/CMakeLists.txt |   9 -
 libc/test/CMakeLists.txt  |   6 +-
 libc/test/IntegrationTest/CMakeLists.txt  |  16 -
 libc/test/UnitTest/CMakeLists.txt |   6 +-
 libc/test/src/__support/CMakeLists.txt|  49 +--
 libc/test/src/__support/CPP/CMakeLists.txt|   2 +-
 libc/test/src/__support/File/CMakeLists.txt   |   2 +-
 libc/test/src/errno/CMakeLists.txt|   2 +-
 libc/test/src/math/CMakeLists.txt |  20 +-
 libc/test/src/math/smoke/CMakeLists.txt   |   8 +-
 libc/test/src/stdio/CMakeLists.txt|   2 +-
 libc/test/src/stdlib/CMakeLists.txt   |   6 +-
 libc/test/utils/UnitTest/CMakeLists.txt   |   2 +-
 libc/utils/CMakeLists.txt |   2 +-
 libc/utils/MPFRWrapper/CMakeLists.txt |   2 +-
 libc/utils/gpu/CMakeLists.txt |   4 +-
 libc/utils/gpu/loader/CMakeLists.txt  |  48 ++-
 libc/utils/gpu/loader/amdgpu/CMakeLists.txt   |   6 +-
 libc/utils/gpu/loader/nvptx/CMakeLists.txt|  10 +-
 libc/utils/gpu/server/CMakeLists.txt  |   9 +
 llvm/CMakeLists.txt   |   8 +
 llvm/cmake/modules/HandleLLVMOptions.cmake|   7 +
 llvm/runtimes/CMakeLists.txt  |  10 +-
 openmp/libomptarget/CMakeLists.txt|   9 +-
 .../plugins-nextgen/common/CMakeLists.txt |  

[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-19 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/81921

>From 8727a9631480deac9d9df386ed26dfcd35914a13 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 13 Feb 2024 21:08:02 -0600
Subject: [PATCH] [libc] Rework the GPU build to be a regular target

Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.

This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.

The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.

The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.

The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.

We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.

This approach is vastly suprerior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.

Depends on https://github.com/llvm/llvm-project/pull/81557
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp|  37 +-
 clang/test/Driver/openmp-offload-gpu.c|  14 +-
 libc/CMakeLists.txt   |  20 +-
 .../cmake/modules/LLVMLibCArchitectures.cmake |  28 +-
 libc/cmake/modules/LLVMLibCCheckMPFR.cmake|   2 +-
 .../modules/LLVMLibCCompileOptionRules.cmake  | 105 ++
 libc/cmake/modules/LLVMLibCHeaderRules.cmake  |   2 +-
 libc/cmake/modules/LLVMLibCLibraryRules.cmake | 141 +--
 libc/cmake/modules/LLVMLibCObjectRules.cmake  | 348 --
 libc/cmake/modules/LLVMLibCTestRules.cmake|  68 ++--
 .../modules/prepare_libc_gpu_build.cmake  | 108 ++
 libc/include/CMakeLists.txt   |   6 +-
 libc/lib/CMakeLists.txt   |  42 ++-
 libc/src/__support/File/CMakeLists.txt|   2 +-
 libc/src/__support/GPU/CMakeLists.txt |   2 +-
 libc/src/__support/OSUtil/CMakeLists.txt  |   2 +-
 libc/src/__support/RPC/CMakeLists.txt |   2 +-
 libc/src/math/CMakeLists.txt  |  16 +-
 libc/src/math/gpu/vendor/CMakeLists.txt   |   1 -
 libc/src/stdio/CMakeLists.txt |   2 +-
 libc/src/stdlib/CMakeLists.txt|   4 +-
 libc/src/string/CMakeLists.txt|  12 +-
 libc/startup/gpu/CMakeLists.txt   |  35 +-
 libc/startup/gpu/amdgpu/CMakeLists.txt|  13 -
 libc/startup/gpu/nvptx/CMakeLists.txt |   9 -
 libc/test/CMakeLists.txt  |   6 +-
 libc/test/IntegrationTest/CMakeLists.txt  |  16 -
 libc/test/UnitTest/CMakeLists.txt |   6 +-
 libc/test/src/__support/CMakeLists.txt|  49 +--
 libc/test/src/__support/CPP/CMakeLists.txt|   2 +-
 libc/test/src/__support/File/CMakeLists.txt   |   2 +-
 libc/test/src/errno/CMakeLists.txt|   2 +-
 libc/test/src/math/CMakeLists.txt |  20 +-
 libc/test/src/math/smoke/CMakeLists.txt   |   8 +-
 libc/test/src/stdio/CMakeLists.txt|   2 +-
 libc/test/src/stdlib/CMakeLists.txt   |   6 +-
 libc/test/utils/UnitTest/CMakeLists.txt   |   2 +-
 libc/utils/CMakeLists.txt |   2 +-
 libc/utils/MPFRWrapper/CMakeLists.txt |   2 +-
 libc/utils/gpu/CMakeLists.txt |   4 +-
 libc/utils/gpu/loader/CMakeLists.txt  |  48 ++-
 libc/utils/gpu/loader/amdgpu/CMakeLists.txt   |   6 +-
 libc/utils/gpu/loader/nvptx/CMakeLists.txt|  10 +-
 libc/utils/gpu/server/CMakeLists.txt  |   9 +
 llvm/CMakeLists.txt   |   8 +
 llvm/cmake/modules/HandleLLVMOptions.cmake|   7 +
 llvm/runtimes/CMakeLists.txt  |  10 +-
 openmp/libomptarget/CMakeLists.txt|   9 +-
 .../plugins-nextgen/common/CMakeLists.txt |  

[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-19 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/81921

>From f3013086f60f2a78c12887cf1736455e8fb1911b Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 13 Feb 2024 21:08:02 -0600
Subject: [PATCH] [libc] Rework the GPU build to be a regular target

Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.

This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.

The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.

The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.

The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.

We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.

This approach is vastly suprerior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.

Depends on https://github.com/llvm/llvm-project/pull/81557
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp|  37 +-
 clang/test/Driver/openmp-offload-gpu.c|  14 +-
 libc/CMakeLists.txt   |  20 +-
 .../cmake/modules/LLVMLibCArchitectures.cmake |  28 +-
 libc/cmake/modules/LLVMLibCCheckMPFR.cmake|   2 +-
 .../modules/LLVMLibCCompileOptionRules.cmake  | 105 ++
 libc/cmake/modules/LLVMLibCHeaderRules.cmake  |   2 +-
 libc/cmake/modules/LLVMLibCLibraryRules.cmake | 141 +--
 libc/cmake/modules/LLVMLibCObjectRules.cmake  | 348 --
 libc/cmake/modules/LLVMLibCTestRules.cmake|  68 ++--
 .../modules/prepare_libc_gpu_build.cmake  | 108 ++
 libc/include/CMakeLists.txt   |   6 +-
 libc/lib/CMakeLists.txt   |  42 ++-
 libc/src/__support/File/CMakeLists.txt|   2 +-
 libc/src/__support/GPU/CMakeLists.txt |   2 +-
 libc/src/__support/OSUtil/CMakeLists.txt  |   2 +-
 libc/src/__support/RPC/CMakeLists.txt |   2 +-
 libc/src/math/CMakeLists.txt  |  16 +-
 libc/src/math/gpu/vendor/CMakeLists.txt   |   1 -
 libc/src/stdio/CMakeLists.txt |   2 +-
 libc/src/stdlib/CMakeLists.txt|   4 +-
 libc/src/string/CMakeLists.txt|  12 +-
 libc/startup/gpu/CMakeLists.txt   |  35 +-
 libc/startup/gpu/amdgpu/CMakeLists.txt|  13 -
 libc/startup/gpu/nvptx/CMakeLists.txt |   9 -
 libc/test/CMakeLists.txt  |   6 +-
 libc/test/IntegrationTest/CMakeLists.txt  |  16 -
 libc/test/UnitTest/CMakeLists.txt |   6 +-
 libc/test/src/__support/CMakeLists.txt|  49 +--
 libc/test/src/__support/CPP/CMakeLists.txt|   2 +-
 libc/test/src/__support/File/CMakeLists.txt   |   2 +-
 libc/test/src/errno/CMakeLists.txt|   2 +-
 libc/test/src/math/CMakeLists.txt |  20 +-
 libc/test/src/math/smoke/CMakeLists.txt   |   8 +-
 libc/test/src/stdio/CMakeLists.txt|   2 +-
 libc/test/src/stdlib/CMakeLists.txt   |   6 +-
 libc/test/utils/UnitTest/CMakeLists.txt   |   2 +-
 libc/utils/CMakeLists.txt |   2 +-
 libc/utils/MPFRWrapper/CMakeLists.txt |   2 +-
 libc/utils/gpu/CMakeLists.txt |   4 +-
 libc/utils/gpu/loader/CMakeLists.txt  |  48 ++-
 libc/utils/gpu/loader/amdgpu/CMakeLists.txt   |   6 +-
 libc/utils/gpu/loader/nvptx/CMakeLists.txt|  10 +-
 libc/utils/gpu/server/CMakeLists.txt  |   9 +
 llvm/CMakeLists.txt   |   8 +
 llvm/cmake/modules/HandleLLVMOptions.cmake|   7 +
 llvm/runtimes/CMakeLists.txt  |  10 +-
 openmp/libomptarget/CMakeLists.txt|   9 +-
 .../plugins-nextgen/common/CMakeLists.txt |  

[clang] [Clang][NVPTX] Allow passing arguments to the linker while standalone (PR #73030)

2024-02-19 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/73030

>From ee43e8f9ae90bcd70d46b17cfecb854711a4b1ce Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 21 Nov 2023 13:45:10 -0600
Subject: [PATCH] [Clang][NVPTX] Allow passing arguments to the linker while
 standalone

Summary:
We support standalone compilation for the NVPTX architecture using
'nvlink' as our linker. Because of the special handling required to
transform input files to cubins, as nvlink expects for some reason, we
didn't use the standard AddLinkerInput method. However, this also meant
that we weren't forwarding options passed with -Wl to the linker. Add
this support in for the standalone toolchain path.

Revived from https://reviews.llvm.org/D149978
---
 clang/lib/Driver/ToolChains/Cuda.cpp  | 43 +--
 clang/test/Driver/cuda-cross-compiling.c  |  8 
 .../ClangLinkerWrapper.cpp|  4 +-
 3 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index e95ff98e6c940f..5ef8b4455c23f1 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -611,35 +611,34 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const 
JobAction &JA,
   continue;
 }
 
-// Currently, we only pass the input files to the linker, we do not pass
-// any libraries that may be valid only for the host.
-if (!II.isFilename())
-  continue;
-
 // The 'nvlink' application performs RDC-mode linking when given a '.o'
 // file and device linking when given a '.cubin' file. We always want to
 // perform device linking, so just rename any '.o' files.
 // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-auto InputFile = getToolChain().getInputFilename(II);
-if (llvm::sys::path::extension(InputFile) != ".cubin") {
-  // If there are no actions above this one then this is direct input and 
we
-  // can copy it. Otherwise the input is internal so a `.cubin` file should
-  // exist.
-  if (II.getAction() && II.getAction()->getInputs().size() == 0) {
-const char *CubinF =
-Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
-llvm::sys::path::stem(InputFile), "cubin"));
-if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
-  continue;
+if (II.isFilename()) {
+  auto InputFile = getToolChain().getInputFilename(II);
+  if (llvm::sys::path::extension(InputFile) != ".cubin") {
+// If there are no actions above this one then this is direct input and
+// we can copy it. Otherwise the input is internal so a `.cubin` file
+// should exist.
+if (II.getAction() && II.getAction()->getInputs().size() == 0) {
+  const char *CubinF =
+  Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
+  llvm::sys::path::stem(InputFile), "cubin"));
+  if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
+continue;
 
-CmdArgs.push_back(CubinF);
+  CmdArgs.push_back(CubinF);
+} else {
+  SmallString<256> Filename(InputFile);
+  llvm::sys::path::replace_extension(Filename, "cubin");
+  CmdArgs.push_back(Args.MakeArgString(Filename));
+}
   } else {
-SmallString<256> Filename(InputFile);
-llvm::sys::path::replace_extension(Filename, "cubin");
-CmdArgs.push_back(Args.MakeArgString(Filename));
+CmdArgs.push_back(Args.MakeArgString(InputFile));
   }
-} else {
-  CmdArgs.push_back(Args.MakeArgString(InputFile));
+} else if (!II.isNothing()) {
+  II.getInputArg().renderAsInput(Args, CmdArgs);
 }
   }
 
diff --git a/clang/test/Driver/cuda-cross-compiling.c 
b/clang/test/Driver/cuda-cross-compiling.c
index 12d0af3b45f32f..5a52496838813e 100644
--- a/clang/test/Driver/cuda-cross-compiling.c
+++ b/clang/test/Driver/cuda-cross-compiling.c
@@ -77,3 +77,11 @@
 // RUN:   | FileCheck -check-prefix=LOWERING %s
 
 // LOWERING: -cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}} "-mllvm" 
"--nvptx-lower-global-ctor-dtor"
+
+//
+// Test passing arguments directly to nvlink.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=LINKER-ARGS %s
+
+// LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index bafe8ace60d1ce..03fb0a7d64552e 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -385,9 +385,11 @@ Expected clang(ArrayRef InputFiles, 
const ArgList &Args) {
   Triple.isAMDGPU() ? Args.MakeArgString("-mcpu=" + Arch)
 : Args.MakeArgString("-march=" + Arch),
  

[clang] [Clang][NVPTX] Allow passing arguments to the linker while standalone (PR #73030)

2024-02-19 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Ping, once https://github.com/llvm/llvm-project/pull/81921 lands this patch 
won't cause any issues with the `libc` build like it does currently so I'd like 
to land this afterwards.

https://github.com/llvm/llvm-project/pull/73030
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-19 Thread Joseph Huber via cfe-commits


@@ -50,31 +50,9 @@ function(collect_object_file_deps target result)
   endif()
 endfunction(collect_object_file_deps)
 
-# A rule to build a library from a collection of entrypoint objects.
-# Usage:
-# add_entrypoint_library(
-#   DEPENDS 
-# )
-#
-# NOTE: If one wants an entrypoint to be available in a library, then they will
-# have to list the entrypoint target explicitly in the DEPENDS list. Implicit
-# entrypoint dependencies will not be added to the library.
-function(add_entrypoint_library target_name)
-  cmake_parse_arguments(
-"ENTRYPOINT_LIBRARY"
-"" # No optional arguments
-"" # No single value arguments
-"DEPENDS" # Multi-value arguments
-${ARGN}
-  )
-  if(NOT ENTRYPOINT_LIBRARY_DEPENDS)
-message(FATAL_ERROR "'add_entrypoint_library' target requires a DEPENDS 
list "
-"of 'add_entrypoint_object' targets.")
-  endif()
-
-  get_fq_deps_list(fq_deps_list ${ENTRYPOINT_LIBRARY_DEPENDS})
+function(get_all_object_file_deps result fq_deps_list)

jhuber6 wrote:

It used to be simpler, but I had to rebase it on top of other changes in the 
`libc`. I could potentially make a simpler patch beforehand.

https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-19 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-20 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/81921

>From a396fe930db6c3fb20dc4f7918736e54d21cb24b Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 13 Feb 2024 21:08:02 -0600
Subject: [PATCH] [libc] Rework the GPU build to be a regular target

Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.

This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.

The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.

The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.

The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.

We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.

This approach is vastly suprerior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.

Depends on https://github.com/llvm/llvm-project/pull/81557
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp|  37 +-
 clang/test/Driver/openmp-offload-gpu.c|  14 +-
 libc/CMakeLists.txt   |  20 +-
 .../cmake/modules/LLVMLibCArchitectures.cmake |  28 +-
 libc/cmake/modules/LLVMLibCCheckMPFR.cmake|   2 +-
 .../modules/LLVMLibCCompileOptionRules.cmake  |  76 +---
 libc/cmake/modules/LLVMLibCHeaderRules.cmake  |   2 +-
 libc/cmake/modules/LLVMLibCLibraryRules.cmake | 141 +--
 libc/cmake/modules/LLVMLibCObjectRules.cmake  | 348 --
 libc/cmake/modules/LLVMLibCTestRules.cmake|  47 ++-
 .../modules/prepare_libc_gpu_build.cmake  | 108 ++
 libc/include/CMakeLists.txt   |   6 +-
 libc/lib/CMakeLists.txt   |  42 ++-
 libc/src/__support/File/CMakeLists.txt|   2 +-
 libc/src/__support/GPU/CMakeLists.txt |   2 +-
 libc/src/__support/OSUtil/CMakeLists.txt  |   2 +-
 libc/src/__support/RPC/CMakeLists.txt |   2 +-
 libc/src/math/CMakeLists.txt  |  16 +-
 libc/src/math/gpu/vendor/CMakeLists.txt   |   1 -
 libc/src/stdio/CMakeLists.txt |   2 +-
 libc/src/stdlib/CMakeLists.txt|   4 +-
 libc/src/string/CMakeLists.txt|  12 +-
 libc/startup/gpu/CMakeLists.txt   |  35 +-
 libc/startup/gpu/amdgpu/CMakeLists.txt|  13 -
 libc/startup/gpu/nvptx/CMakeLists.txt |   9 -
 libc/test/CMakeLists.txt  |   6 +-
 libc/test/IntegrationTest/CMakeLists.txt  |  16 -
 libc/test/UnitTest/CMakeLists.txt |   2 +-
 libc/test/src/__support/CMakeLists.txt|  49 +--
 libc/test/src/__support/CPP/CMakeLists.txt|   2 +-
 libc/test/src/__support/File/CMakeLists.txt   |   2 +-
 libc/test/src/errno/CMakeLists.txt|   2 +-
 libc/test/src/math/CMakeLists.txt |  20 +-
 libc/test/src/math/smoke/CMakeLists.txt   |   8 +-
 libc/test/src/stdio/CMakeLists.txt|   2 +-
 libc/test/src/stdlib/CMakeLists.txt   |   6 +-
 libc/test/utils/UnitTest/CMakeLists.txt   |   2 +-
 libc/utils/CMakeLists.txt |   2 +-
 libc/utils/MPFRWrapper/CMakeLists.txt |   2 +-
 libc/utils/gpu/CMakeLists.txt |   4 +-
 libc/utils/gpu/loader/CMakeLists.txt  |  48 ++-
 libc/utils/gpu/loader/amdgpu/CMakeLists.txt   |   6 +-
 libc/utils/gpu/loader/nvptx/CMakeLists.txt|  10 +-
 libc/utils/gpu/server/CMakeLists.txt  |   9 +
 llvm/CMakeLists.txt   |   8 +
 llvm/cmake/modules/HandleLLVMOptions.cmake|   7 +
 llvm/runtimes/CMakeLists.txt  |  10 +-
 openmp/libomptarget/CMakeLists.txt|   9 +-
 .../plugins-nextgen/common/CMakeLists.txt |   6 

[clang] [Offload] Move HIP and CUDA to new driver by default (PR #84420)

2024-03-12 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Do you mean the SPIR-V target (backend)? I have not followed this area of 
> work closely. What is missing or what exactly needs to be supported by the 
> SPIR-V target? Any help or pointers would be greatly appreciated!

I believe there was some work to port SYCL to work with the new driver, however 
I don't know that status of that. However what I need from `SPIR-V` is a target 
in clang that enables the SPIR-V toolchain. That is, if I do `clang 
--target=spriv-something-something foo.c` it will spit out some valid SPIR-V. 
This is because the `clang-linker-wrapper` internally uses this to invoke the 
device linker without duplicating a whole lot of logic. E.g. `clang 
--target=nvptx64-nvidia-cuda -march=sm_89 foo.o bar.o` will invoke `nvlink` to 
create an output `cubin` file.

https://github.com/llvm/llvm-project/pull/84420
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Add `-march=general` option to mirror default configuration (PR #85222)

2024-03-14 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/85222
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Add `-march=general` option to mirror default configuration (PR #85222)

2024-03-14 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 requested changes to this pull request.

Thanks for looking at this. When the user compiles with `-march=xyz` it 
introduces a lot of subtarget specific metadata intro the output IR. The 
purpose of the original patch was to keep `-target-cpu` unset in cases where 
`-march=xyz` was not passed in. The expected semantics here is that 
`-march=sm_52 -march=generic` will override `-march=sm_52` and result in no 
`-target-cpu` being set just like if you didn't pass `-march` at all.

https://github.com/llvm/llvm-project/pull/85222
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Add `-march=general` option to mirror default configuration (PR #85222)

2024-03-14 Thread Joseph Huber via cfe-commits


@@ -85,6 +90,6 @@
 // MISSING: error: Must pass in an explicit nvptx64 gpu architecture to 
'nvlink'
 
 // RUN: %clang -target nvptx64-nvidia-cuda -flto -c %s -### 2>&1 \
-// RUN:   | FileCheck -check-prefix=GENERIC %s
+// RUN:   | FileCheck -check-prefix=COMPILE %s
 

jhuber6 wrote:

```suggestion
// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_52 -march=generic -flto -c 
%s -### 2>&1 \
// RUN:   | FileCheck -check-prefix=GENERIC %s
```
The test should look like this, using `-march=generic` overrides the previous 
`-march` and results in the same output as if `-march` was not passed at all.

https://github.com/llvm/llvm-project/pull/85222
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Add `-march=general` option to mirror default configuration (PR #85222)

2024-03-14 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

FWIW I think you can kind of do this with `-march=sm_52 -march=` to just set it 
to empty.

https://github.com/llvm/llvm-project/pull/85222
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Add `-march=general` option to mirror default configuration (PR #85222)

2024-03-15 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.

LG, thanks for the patch.

https://github.com/llvm/llvm-project/pull/85222
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Add `-march=general` option to mirror default configuration (PR #85222)

2024-03-15 Thread Joseph Huber via cfe-commits


@@ -750,10 +750,11 @@ NVPTXToolChain::TranslateArgs(const 
llvm::opt::DerivedArgList &Args,
 if (!llvm::is_contained(*DAL, A))
   DAL->append(A);
 
-  // TODO: We should accept 'generic' as a valid architecture.
   if (!DAL->hasArg(options::OPT_march_EQ) && OffloadKind != Action::OFK_None) {
 DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
   CudaArchToString(CudaArch::CudaDefault));
+  } else if (DAL->getLastArgValue(options::OPT_march_EQ) == "generic") {

jhuber6 wrote:

```suggestion
  } else if (DAL->getLastArgValue(options::OPT_march_EQ) == "generic"
 && OffloadKind == Action::OFK_None) {
```
Ah, forgot, we probably don't want to expose this to CUDA just yet. 

https://github.com/llvm/llvm-project/pull/85222
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Add `-march=general` option to mirror default configuration (PR #85222)

2024-03-15 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/85222
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Add `-march=general` option to mirror default configuration (PR #85222)

2024-03-15 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Thanks, I'll merge it once it passes CI.

https://github.com/llvm/llvm-project/pull/85222
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [docs] Prefer --gcc-install-dir= to deprecated GCC_INSTALL_PREFIX (PR #85458)

2024-03-15 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/85458
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Add `-march=general` option to mirror default configuration (PR #85222)

2024-03-15 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/85222
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 280c7a9 - [Clang] Fix preprocessing device only in HIP mode

2024-03-18 Thread Joseph Huber via cfe-commits

Author: Joseph Huber
Date: 2024-03-18T12:12:17-05:00
New Revision: 280c7a9526a9ae7f959117c9cec94f8c8887f15c

URL: 
https://github.com/llvm/llvm-project/commit/280c7a9526a9ae7f959117c9cec94f8c8887f15c
DIFF: 
https://github.com/llvm/llvm-project/commit/280c7a9526a9ae7f959117c9cec94f8c8887f15c.diff

LOG: [Clang] Fix preprocessing device only in HIP mode

Summary:
A recent change made the HIP compilation bundle by default. However we
don't want to do this for `-E`, which silently broke some handling.

Added: 


Modified: 
clang/lib/Driver/Driver.cpp

Removed: 




diff  --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 5015ce9f6d68e0..1daf588142b3b4 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4647,7 +4647,8 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
 
   // All kinds exit now in device-only mode except for non-RDC mode HIP.
   if (offloadDeviceOnly() &&
-  (!C.isOffloadingHostKind(Action::OFK_HIP) ||
+  (getFinalPhase(Args) == phases::Preprocess ||
+   !C.isOffloadingHostKind(Action::OFK_HIP) ||
!Args.hasFlag(options::OPT_gpu_bundle_output,
  options::OPT_no_gpu_bundle_output, true) ||
Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)))



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] do not link runtime for -r (PR #85675)

2024-03-18 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/85675
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][CodeGen] Omit pre-opt link when post-opt is link requested (PR #85672)

2024-03-19 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

This means that there won't be any optimizations on these definitions, correct? 
Likely not ideal  to have no inlining even if it saves compilation time.

This "post-op" linking is only required because we emit calls to functions that 
don't exist in the module. The way we solved this in OpenMP is by always 
providing the library at the link step and making the optimization passes not 
emit new calls if we are in "post-link LTO" as determined by module flags.

We could theoretically just force all AMDGPU compilation to go through the LTO 
pass, something like `ld.lld --start-lib ockl.bc ocml.bc --end-lib`. That would 
have the effect of fixing-up any missing definitions as it will only extract if 
needed. Problem is that the device libs have protected visibility until the 
next release cycle so this won't actually internalize.

https://github.com/llvm/llvm-project/pull/85672
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Correctly omit bundling with the new driver (PR #85842)

2024-03-19 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/85842

Summary:
The HIP phases do not emit the offload bundler output when we do not
invoke the final linker phase in device only mode. Check this propery.


>From 5cb265bfa23d7d697499544bddaa20d314eb1efc Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 19 Mar 2024 13:29:35 -0500
Subject: [PATCH] [HIP] Correctly omit bundling with the new driver

Summary:
The HIP phases do not emit the offload bundler output when we do not
invoke the final linker phase in device only mode. Check this propery.
---
 clang/lib/Driver/Driver.cpp  | 17 +++--
 clang/test/Driver/hip-phases.hip | 12 
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 1daf588142b3b4..e7d57635e03208 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4641,17 +4641,22 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
   DDep.add(*Input, *TCAndArch->first, TCAndArch->second.data(), Kind);
   OffloadActions.push_back(C.MakeAction(DDep, 
A->getType()));
 
+
   ++TCAndArch;
 }
   }
 
+  // HIP code in non-RDC mode will bundle the output if it invoked the linker.
+  bool ShouldBundleHIP =
+  C.isOffloadingHostKind(Action::OFK_HIP) &&
+  Args.hasFlag(options::OPT_gpu_bundle_output,
+   options::OPT_no_gpu_bundle_output, true) &&
+  !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) &&
+  !llvm::any_of(OffloadActions,
+[](Action *A) { return A->getType() != types::TY_Image; });
+
   // All kinds exit now in device-only mode except for non-RDC mode HIP.
-  if (offloadDeviceOnly() &&
-  (getFinalPhase(Args) == phases::Preprocess ||
-   !C.isOffloadingHostKind(Action::OFK_HIP) ||
-   !Args.hasFlag(options::OPT_gpu_bundle_output,
- options::OPT_no_gpu_bundle_output, true) ||
-   Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)))
+  if (offloadDeviceOnly() && !ShouldBundleHIP)
 return C.MakeAction(DDeps, types::TY_Nothing);
 
   if (OffloadActions.empty())
diff --git a/clang/test/Driver/hip-phases.hip b/clang/test/Driver/hip-phases.hip
index ca63d4304d3959..9be6a5577cbdc7 100644
--- a/clang/test/Driver/hip-phases.hip
+++ b/clang/test/Driver/hip-phases.hip
@@ -648,3 +648,15 @@
 // LTO-NEXT: 14: offload, "host-hip (x86_64-unknown-linux-gnu)" {2}, 
"device-hip (x86_64-unknown-linux-gnu)" {13}, ir
 // LTO-NEXT: 15: backend, {14}, assembler, (host-hip)
 // LTO-NEXT: 16: assembler, {15}, object, (host-hip)
+
+//
+// Test the new driver when not bundling
+//
+// RUN: %clang -### --target=x86_64-linux-gnu --offload-new-driver 
-ccc-print-phases \
+// RUN:--offload-device-only --offload-arch=gfx90a -emit-llvm -c %s 
2>&1 \
+// RUN: | FileCheck -check-prefix=DEVICE-ONLY %s
+//  DEVICE-ONLY: 0: input, "[[INPUT:.+]]", hip, (device-hip, gfx1030)
+// DEVICE-ONLY-NEXT: 1: preprocessor, {0}, hip-cpp-output, (device-hip, 
gfx1030)
+// DEVICE-ONLY-NEXT: 2: compiler, {1}, ir, (device-hip, gfx1030)
+// DEVICE-ONLY-NEXT: 3: backend, {2}, ir, (device-hip, gfx1030)
+// DEVICE-ONLY-NEXT: 4: offload, "device-hip (amdgcn-amd-amdhsa:gfx1030)" {3}, 
none

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Correctly omit bundling with the new driver (PR #85842)

2024-03-19 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/85842

>From d920ec18c3133dba59149c756a91f7d459040435 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 19 Mar 2024 13:29:35 -0500
Subject: [PATCH] [HIP] Correctly omit bundling with the new driver

Summary:
The HIP phases do not emit the offload bundler output when we do not
invoke the final linker phase in device only mode. Check this propery.
---
 clang/lib/Driver/Driver.cpp  | 16 ++--
 clang/test/Driver/hip-phases.hip | 12 
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 1daf588142b3b4..767c1cd47e8cd9 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4645,13 +4645,17 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
 }
   }
 
+  // HIP code in non-RDC mode will bundle the output if it invoked the linker.
+  bool ShouldBundleHIP =
+  C.isOffloadingHostKind(Action::OFK_HIP) &&
+  Args.hasFlag(options::OPT_gpu_bundle_output,
+   options::OPT_no_gpu_bundle_output, true) &&
+  !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) &&
+  !llvm::any_of(OffloadActions,
+[](Action *A) { return A->getType() != types::TY_Image; });
+
   // All kinds exit now in device-only mode except for non-RDC mode HIP.
-  if (offloadDeviceOnly() &&
-  (getFinalPhase(Args) == phases::Preprocess ||
-   !C.isOffloadingHostKind(Action::OFK_HIP) ||
-   !Args.hasFlag(options::OPT_gpu_bundle_output,
- options::OPT_no_gpu_bundle_output, true) ||
-   Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)))
+  if (offloadDeviceOnly() && !ShouldBundleHIP)
 return C.MakeAction(DDeps, types::TY_Nothing);
 
   if (OffloadActions.empty())
diff --git a/clang/test/Driver/hip-phases.hip b/clang/test/Driver/hip-phases.hip
index ca63d4304d3959..180ef43022f818 100644
--- a/clang/test/Driver/hip-phases.hip
+++ b/clang/test/Driver/hip-phases.hip
@@ -648,3 +648,15 @@
 // LTO-NEXT: 14: offload, "host-hip (x86_64-unknown-linux-gnu)" {2}, 
"device-hip (x86_64-unknown-linux-gnu)" {13}, ir
 // LTO-NEXT: 15: backend, {14}, assembler, (host-hip)
 // LTO-NEXT: 16: assembler, {15}, object, (host-hip)
+
+//
+// Test the new driver when not bundling
+//
+// RUN: %clang -### --target=x86_64-linux-gnu --offload-new-driver 
-ccc-print-phases \
+// RUN:--offload-device-only --offload-arch=gfx90a -emit-llvm -c %s 
2>&1 \
+// RUN: | FileCheck -check-prefix=DEVICE-ONLY %s
+//  DEVICE-ONLY: 0: input, "[[INPUT:.+]]", hip, (device-hip, gfx90a)
+// DEVICE-ONLY-NEXT: 1: preprocessor, {0}, hip-cpp-output, (device-hip, gfx90a)
+// DEVICE-ONLY-NEXT: 2: compiler, {1}, ir, (device-hip, gfx90a)
+// DEVICE-ONLY-NEXT: 3: backend, {2}, ir, (device-hip, gfx90a)
+// DEVICE-ONLY-NEXT: 4: offload, "device-hip (amdgcn-amd-amdhsa:gfx90a)" {3}, 
none

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] Correctly omit bundling with the new driver (PR #85842)

2024-03-20 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/85842
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][NFC] Refactor managed var codegen (PR #85976)

2024-03-20 Thread Joseph Huber via cfe-commits


@@ -1160,9 +1152,8 @@ void CGNVCUDARuntime::createOffloadingEntries() {
 
 // Returns module constructor to be added.
 llvm::Function *CGNVCUDARuntime::finalizeModule() {
+  transformManagedVars();

jhuber6 wrote:

I'm guessing we also don't have a test for `__managed__` in the external test 
suite?

https://github.com/llvm/llvm-project/pull/85976
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Enable the _Float16 type for NVPTX compilation (PR #82436)

2024-02-20 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/82436

Summary:
The PTX target supports the f16 type natively and we alreaqdy have a few
LLVM backend tests that support the LLVM-IR. We should be able to enable
this for generic use. This is done prior the f16 math functions being
written in the GPU libc case.


>From f2ec6f07168173059c0f316c9746a35ac32efbdf Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 20 Feb 2024 16:43:45 -0600
Subject: [PATCH] [NVPTX] Enable the _Float16 type for NVPTX compilation

Summary:
The PTX target supports the f16 type natively and we alreaqdy have a few
LLVM backend tests that support the LLVM-IR. We should be able to enable
this for generic use. This is done prior the f16 math functions being
written in the GPU libc case.
---
 clang/docs/LanguageExtensions.rst | 1 +
 clang/lib/Basic/Targets/NVPTX.cpp | 4 
 clang/test/SemaCUDA/float16.cu| 1 +
 3 files changed, 6 insertions(+)

diff --git a/clang/docs/LanguageExtensions.rst 
b/clang/docs/LanguageExtensions.rst
index fb4d7a02dd086f..711baf45f449a0 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -833,6 +833,7 @@ to ``float``; see below for more information on this 
emulation.
   * 32-bit ARM (natively on some architecture versions)
   * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
   * AMDGPU (natively)
+  * NVPTX (natively)
   * SPIR (natively)
   * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)
   * RISC-V (natively if Zfh or Zhinx is available)
diff --git a/clang/lib/Basic/Targets/NVPTX.cpp 
b/clang/lib/Basic/Targets/NVPTX.cpp
index a8efae3a1ce388..b47c399fef6042 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -61,6 +61,10 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
   NoAsmVariants = true;
   GPU = CudaArch::UNUSED;
 
+  // PTX supports f16 as a fundamental type.
+  HasLegalHalfType = true;
+  HasFloat16 = true;
+
   if (TargetPointerWidth == 32)
 resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64");
   else if (Opts.NVPTXUseShortPointers)
diff --git a/clang/test/SemaCUDA/float16.cu b/clang/test/SemaCUDA/float16.cu
index a9cbe87f32c100..bb5ed606438491 100644
--- a/clang/test/SemaCUDA/float16.cu
+++ b/clang/test/SemaCUDA/float16.cu
@@ -1,4 +1,5 @@
 // RUN: %clang_cc1 -fsyntax-only -triple x86_64 -aux-triple amdgcn -verify %s
+// RUN: %clang_cc1 -fsyntax-only -triple x86_64 -aux-triple nvptx64 -verify %s
 // expected-no-diagnostics
 #include "Inputs/cuda.h"
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NVPTX] Enable the _Float16 type for NVPTX compilation (PR #82436)

2024-02-20 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/82436
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [openmp] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-21 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/80460
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-22 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/81921

>From 2cf6f184e2e8a6abc31e0dfb19c706569357597d Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 13 Feb 2024 21:08:02 -0600
Subject: [PATCH] [libc] Rework the GPU build to be a regular target

Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.

This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.

The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.

The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.

The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.

We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.

This approach is vastly suprerior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.

Depends on https://github.com/llvm/llvm-project/pull/81557
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp|  37 +-
 clang/test/Driver/openmp-offload-gpu.c|  14 +-
 libc/CMakeLists.txt   |  20 +-
 .../cmake/modules/LLVMLibCArchitectures.cmake |  28 +-
 libc/cmake/modules/LLVMLibCCheckMPFR.cmake|   2 +-
 .../modules/LLVMLibCCompileOptionRules.cmake  |  76 +---
 libc/cmake/modules/LLVMLibCHeaderRules.cmake  |   2 +-
 libc/cmake/modules/LLVMLibCLibraryRules.cmake | 141 +--
 libc/cmake/modules/LLVMLibCObjectRules.cmake  | 348 --
 libc/cmake/modules/LLVMLibCTestRules.cmake|  47 ++-
 .../modules/prepare_libc_gpu_build.cmake  | 108 ++
 libc/include/CMakeLists.txt   |   6 +-
 libc/lib/CMakeLists.txt   |  35 +-
 libc/src/__support/File/CMakeLists.txt|   2 +-
 libc/src/__support/GPU/CMakeLists.txt |   2 +-
 libc/src/__support/OSUtil/CMakeLists.txt  |   2 +-
 libc/src/__support/RPC/CMakeLists.txt |   2 +-
 libc/src/math/CMakeLists.txt  |  16 +-
 libc/src/math/gpu/vendor/CMakeLists.txt   |   1 -
 libc/src/stdio/CMakeLists.txt |   2 +-
 libc/src/stdlib/CMakeLists.txt|   4 +-
 libc/src/string/CMakeLists.txt|  12 +-
 libc/startup/gpu/CMakeLists.txt   |  35 +-
 libc/startup/gpu/amdgpu/CMakeLists.txt|  13 -
 libc/startup/gpu/nvptx/CMakeLists.txt |   9 -
 libc/test/CMakeLists.txt  |   6 +-
 libc/test/IntegrationTest/CMakeLists.txt  |  16 -
 libc/test/UnitTest/CMakeLists.txt |   2 +-
 libc/test/src/__support/CMakeLists.txt|  49 +--
 libc/test/src/__support/CPP/CMakeLists.txt|   2 +-
 libc/test/src/__support/File/CMakeLists.txt   |   2 +-
 libc/test/src/errno/CMakeLists.txt|   2 +-
 libc/test/src/math/CMakeLists.txt |  20 +-
 libc/test/src/math/smoke/CMakeLists.txt   |   8 +-
 libc/test/src/stdio/CMakeLists.txt|   2 +-
 libc/test/src/stdlib/CMakeLists.txt   |   6 +-
 libc/test/utils/UnitTest/CMakeLists.txt   |   2 +-
 libc/utils/CMakeLists.txt |   2 +-
 libc/utils/MPFRWrapper/CMakeLists.txt |   2 +-
 libc/utils/gpu/CMakeLists.txt |   4 +-
 libc/utils/gpu/loader/CMakeLists.txt  |  48 ++-
 libc/utils/gpu/loader/amdgpu/CMakeLists.txt   |   6 +-
 libc/utils/gpu/loader/nvptx/CMakeLists.txt|  10 +-
 libc/utils/gpu/server/CMakeLists.txt  |   9 +
 llvm/CMakeLists.txt   |   4 +-
 llvm/cmake/modules/HandleLLVMOptions.cmake|   7 +
 llvm/runtimes/CMakeLists.txt  |  11 +-
 openmp/libomptarget/CMakeLists.txt|   9 +-
 .../plugins-nextgen/common/CMakeLists.txt |   6 

[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-22 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/81921

>From 575390d65fd35729e855823e38dfd28f7a15debd Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 13 Feb 2024 21:08:02 -0600
Subject: [PATCH] [libc] Rework the GPU build to be a regular target

Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.

This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.

The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.

The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.

The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.

We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.

This approach is vastly suprerior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.

Depends on https://github.com/llvm/llvm-project/pull/81557
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp|  37 +-
 clang/test/Driver/openmp-offload-gpu.c|  14 +-
 libc/CMakeLists.txt   |  20 +-
 .../cmake/modules/LLVMLibCArchitectures.cmake |  28 +-
 libc/cmake/modules/LLVMLibCCheckMPFR.cmake|   2 +-
 .../modules/LLVMLibCCompileOptionRules.cmake  |  76 +---
 libc/cmake/modules/LLVMLibCHeaderRules.cmake  |   2 +-
 libc/cmake/modules/LLVMLibCLibraryRules.cmake | 141 +--
 libc/cmake/modules/LLVMLibCObjectRules.cmake  | 348 --
 libc/cmake/modules/LLVMLibCTestRules.cmake|  47 ++-
 .../modules/prepare_libc_gpu_build.cmake  | 108 ++
 libc/docs/gpu/using.rst   |  33 +-
 libc/include/CMakeLists.txt   |   6 +-
 libc/lib/CMakeLists.txt   |  35 +-
 libc/src/__support/File/CMakeLists.txt|   2 +-
 libc/src/__support/GPU/CMakeLists.txt |   2 +-
 libc/src/__support/OSUtil/CMakeLists.txt  |   2 +-
 libc/src/__support/RPC/CMakeLists.txt |   2 +-
 libc/src/math/CMakeLists.txt  |  16 +-
 libc/src/math/gpu/vendor/CMakeLists.txt   |   1 -
 libc/src/stdio/CMakeLists.txt |   2 +-
 libc/src/stdlib/CMakeLists.txt|   4 +-
 libc/src/string/CMakeLists.txt|  12 +-
 libc/startup/gpu/CMakeLists.txt   |  35 +-
 libc/startup/gpu/amdgpu/CMakeLists.txt|  13 -
 libc/startup/gpu/nvptx/CMakeLists.txt |   9 -
 libc/test/CMakeLists.txt  |   6 +-
 libc/test/IntegrationTest/CMakeLists.txt  |  16 -
 libc/test/UnitTest/CMakeLists.txt |   2 +-
 libc/test/src/__support/CMakeLists.txt|  49 +--
 libc/test/src/__support/CPP/CMakeLists.txt|   2 +-
 libc/test/src/__support/File/CMakeLists.txt   |   2 +-
 libc/test/src/errno/CMakeLists.txt|   2 +-
 libc/test/src/math/CMakeLists.txt |  20 +-
 libc/test/src/math/smoke/CMakeLists.txt   |   8 +-
 libc/test/src/stdio/CMakeLists.txt|   2 +-
 libc/test/src/stdlib/CMakeLists.txt   |   6 +-
 libc/test/utils/UnitTest/CMakeLists.txt   |   2 +-
 libc/utils/CMakeLists.txt |   2 +-
 libc/utils/MPFRWrapper/CMakeLists.txt |   2 +-
 libc/utils/gpu/CMakeLists.txt |   4 +-
 libc/utils/gpu/loader/CMakeLists.txt  |  48 ++-
 libc/utils/gpu/loader/amdgpu/CMakeLists.txt   |   6 +-
 libc/utils/gpu/loader/nvptx/CMakeLists.txt|  10 +-
 libc/utils/gpu/server/CMakeLists.txt  |   9 +
 llvm/CMakeLists.txt   |   4 +-
 llvm/cmake/modules/HandleLLVMOptions.cmake|   7 +
 llvm/runtimes/CMakeLists.txt  |  11 +-
 openmp/libomptarget/CMakeLists.txt|   9 

[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-22 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/81921

>From c118c0b82cf47b36460479fd920325dedc7a6c79 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 13 Feb 2024 21:08:02 -0600
Subject: [PATCH] [libc] Rework the GPU build to be a regular target

Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.

This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.

The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.

The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.

The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.

We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.

This approach is vastly suprerior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.

Depends on https://github.com/llvm/llvm-project/pull/81557
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp|  37 +-
 clang/test/Driver/openmp-offload-gpu.c|  20 +-
 libc/CMakeLists.txt   |  20 +-
 .../cmake/modules/LLVMLibCArchitectures.cmake |  28 +-
 libc/cmake/modules/LLVMLibCCheckMPFR.cmake|   2 +-
 .../modules/LLVMLibCCompileOptionRules.cmake  |  76 +---
 libc/cmake/modules/LLVMLibCHeaderRules.cmake  |   2 +-
 libc/cmake/modules/LLVMLibCLibraryRules.cmake | 141 +--
 libc/cmake/modules/LLVMLibCObjectRules.cmake  | 348 --
 libc/cmake/modules/LLVMLibCTestRules.cmake|  47 ++-
 .../modules/prepare_libc_gpu_build.cmake  | 108 ++
 libc/docs/gpu/using.rst   |  33 +-
 libc/include/CMakeLists.txt   |   6 +-
 libc/lib/CMakeLists.txt   |  35 +-
 libc/src/__support/File/CMakeLists.txt|   2 +-
 libc/src/__support/GPU/CMakeLists.txt |   2 +-
 libc/src/__support/OSUtil/CMakeLists.txt  |   2 +-
 libc/src/__support/RPC/CMakeLists.txt |   2 +-
 libc/src/math/CMakeLists.txt  |  16 +-
 libc/src/math/gpu/vendor/CMakeLists.txt   |   1 -
 libc/src/stdio/CMakeLists.txt |   2 +-
 libc/src/stdlib/CMakeLists.txt|   4 +-
 libc/src/string/CMakeLists.txt|  12 +-
 libc/startup/gpu/CMakeLists.txt   |  35 +-
 libc/startup/gpu/amdgpu/CMakeLists.txt|  13 -
 libc/startup/gpu/nvptx/CMakeLists.txt |   9 -
 libc/test/CMakeLists.txt  |   6 +-
 libc/test/IntegrationTest/CMakeLists.txt  |  16 -
 libc/test/UnitTest/CMakeLists.txt |   2 +-
 libc/test/src/__support/CMakeLists.txt|  49 +--
 libc/test/src/__support/CPP/CMakeLists.txt|   2 +-
 libc/test/src/__support/File/CMakeLists.txt   |   2 +-
 libc/test/src/errno/CMakeLists.txt|   2 +-
 libc/test/src/math/CMakeLists.txt |  20 +-
 libc/test/src/math/smoke/CMakeLists.txt   |   8 +-
 libc/test/src/stdio/CMakeLists.txt|   2 +-
 libc/test/src/stdlib/CMakeLists.txt   |   6 +-
 libc/test/utils/UnitTest/CMakeLists.txt   |   2 +-
 libc/utils/CMakeLists.txt |   2 +-
 libc/utils/MPFRWrapper/CMakeLists.txt |   2 +-
 libc/utils/gpu/CMakeLists.txt |   4 +-
 libc/utils/gpu/loader/CMakeLists.txt  |  48 ++-
 libc/utils/gpu/loader/amdgpu/CMakeLists.txt   |   6 +-
 libc/utils/gpu/loader/nvptx/CMakeLists.txt|  10 +-
 libc/utils/gpu/server/CMakeLists.txt  |   9 +
 llvm/CMakeLists.txt   |   4 +-
 llvm/cmake/modules/HandleLLVMOptions.cmake|   7 +
 llvm/runtimes/CMakeLists.txt  |  11 +-
 openmp/libomptarget/CMakeLists.txt|   9 

[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-22 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Append target search paths for direct offloading compilation (PR #82699)

2024-02-22 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/82699

Summary:
Recent changes to the `libc` project caused the headers to be installed
to `include/` for the GPU and the libraries to be in
`lib/`. This means we should automatically append these search
paths so they can be found by default. This allows the following to work
targeting AMDGPU.

```shell
$ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc 
/lib/amdgcn-amd-amdhsa/crt1.o
$ amdhsa-loader a.out
```


>From 8f0ac133bcea8181e59f82de5767bb9c34e6d346 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 22 Feb 2024 16:13:35 -0600
Subject: [PATCH] [Clang] Append target search paths for direct offloading
 compilation

Summary:
Recent changes to the `libc` project caused the headers to be installed
to `include/` for the GPU and the libraries to be in
`lib/`. This means we should automatically append these search
paths so they can be found by default. This allows the following to work
targeting AMDGPU.

```shell
$ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc 
/lib/amdgcn-amd-amdhsa/crt1.o
$ amdhsa-loader a.out
```
---
 clang/lib/Driver/ToolChains/AMDGPU.cpp | 1 +
 clang/lib/Driver/ToolChains/Clang.cpp  | 4 ++--
 clang/lib/Driver/ToolChains/Cuda.cpp   | 4 
 clang/test/Driver/gpu-libc-headers.c   | 9 -
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp 
b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 60e8c123c591d2..6fcbcffd6f0d67 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -625,6 +625,7 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const 
JobAction &JA,
 
   addLinkerCompressDebugSectionsOption(getToolChain(), Args, CmdArgs);
   Args.AddAllArgs(CmdArgs, options::OPT_L);
+  getToolChain().AddFilePathLibArgs(Args, CmdArgs);
   AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
   if (C.getDriver().isUsingLTO())
 addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 7c0409f0c3097a..6e1b7e8657d0dc 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -,8 +,8 @@ void Clang::AddPreprocessingOptions(Compilation &C, const 
JobAction &JA,
 C.getActiveOffloadKinds() == Action::OFK_None) {
   SmallString<128> P(llvm::sys::path::parent_path(D.InstalledDir));
   llvm::sys::path::append(P, "include");
-  llvm::sys::path::append(P, "gpu-none-llvm");
-  CmdArgs.push_back("-c-isystem");
+  llvm::sys::path::append(P, getToolChain().getTripleString());
+  CmdArgs.push_back("-internal-isystem");
   CmdArgs.push_back(Args.MakeArgString(P));
 } else if (C.getActiveOffloadKinds() == Action::OFK_OpenMP) {
   // TODO: CUDA / HIP include their own headers for some common functions
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index ed5924c3b73b55..8c7a96289559c1 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -609,6 +609,10 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const 
JobAction &JA,
   // Add paths specified in LIBRARY_PATH environment variable as -L options.
   addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");
 
+  // Add standard library search paths passed on the command line.
+  Args.AddAllArgs(CmdArgs, options::OPT_L);
+  getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
   llvm::sys::path::parent_path(TC.getDriver().Dir);
diff --git a/clang/test/Driver/gpu-libc-headers.c 
b/clang/test/Driver/gpu-libc-headers.c
index 74e9a764dfcb35..356a401550399d 100644
--- a/clang/test/Driver/gpu-libc-headers.c
+++ b/clang/test/Driver/gpu-libc-headers.c
@@ -10,10 +10,17 @@
 // CHECK-HEADERS: "-cc1"{{.*}}"-internal-isystem" 
"{{.*}}include{{.*}}llvm_libc_wrappers"{{.*}}"-isysroot" "./"
 // CHECK-HEADERS: "-cc1"{{.*}}"-internal-isystem" 
"{{.*}}include{{.*}}llvm_libc_wrappers"{{.*}}"-isysroot" "./"
 
+// RUN:   %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx90a --sysroot=./ \
+// RUN: -nogpulib %s 2>&1 | FileCheck %s 
--check-prefix=CHECK-HEADERS-AMDGPU
+// RUN:   %clang -### --target=nvptx64-nvidia-cuda -march=sm_89 --sysroot=./ \
+// RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-HEADERS-NVPTX
+// CHECK-HEADERS-AMDGPU: "-cc1"{{.*}}"-internal-isystem" 
"{{.*}}include{{.*}}amdgcn-amd-amdhsa"{{.*}}"-isysroot" "./"
+// CHECK-HEADERS-NVPTX: "-cc1"{{.*}}"-internal-isystem" 
"{{.*}}include{{.*}}nvptx64-nvidia-cuda"{{.*}}"-isysroot" "./"
+
 // RUN:   %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nogpulib \
 // RUN: -nogpuinc %s 2>&1 | FileCheck %s 
--check-prefix=CHECK-HEADERS-DISABLED
 // RUN:   %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nogpulib \
 // RU

[clang] [Clang][NVPTX] Allow passing arguments to the linker while standalone (PR #73030)

2024-02-22 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/73030

>From ee43e8f9ae90bcd70d46b17cfecb854711a4b1ce Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 21 Nov 2023 13:45:10 -0600
Subject: [PATCH] [Clang][NVPTX] Allow passing arguments to the linker while
 standalone

Summary:
We support standalone compilation for the NVPTX architecture using
'nvlink' as our linker. Because of the special handling required to
transform input files to cubins, as nvlink expects for some reason, we
didn't use the standard AddLinkerInput method. However, this also meant
that we weren't forwarding options passed with -Wl to the linker. Add
this support in for the standalone toolchain path.

Revived from https://reviews.llvm.org/D149978
---
 clang/lib/Driver/ToolChains/Cuda.cpp  | 43 +--
 clang/test/Driver/cuda-cross-compiling.c  |  8 
 .../ClangLinkerWrapper.cpp|  4 +-
 3 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index e95ff98e6c940f..5ef8b4455c23f1 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -611,35 +611,34 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const 
JobAction &JA,
   continue;
 }
 
-// Currently, we only pass the input files to the linker, we do not pass
-// any libraries that may be valid only for the host.
-if (!II.isFilename())
-  continue;
-
 // The 'nvlink' application performs RDC-mode linking when given a '.o'
 // file and device linking when given a '.cubin' file. We always want to
 // perform device linking, so just rename any '.o' files.
 // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-auto InputFile = getToolChain().getInputFilename(II);
-if (llvm::sys::path::extension(InputFile) != ".cubin") {
-  // If there are no actions above this one then this is direct input and 
we
-  // can copy it. Otherwise the input is internal so a `.cubin` file should
-  // exist.
-  if (II.getAction() && II.getAction()->getInputs().size() == 0) {
-const char *CubinF =
-Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
-llvm::sys::path::stem(InputFile), "cubin"));
-if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
-  continue;
+if (II.isFilename()) {
+  auto InputFile = getToolChain().getInputFilename(II);
+  if (llvm::sys::path::extension(InputFile) != ".cubin") {
+// If there are no actions above this one then this is direct input and
+// we can copy it. Otherwise the input is internal so a `.cubin` file
+// should exist.
+if (II.getAction() && II.getAction()->getInputs().size() == 0) {
+  const char *CubinF =
+  Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
+  llvm::sys::path::stem(InputFile), "cubin"));
+  if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
+continue;
 
-CmdArgs.push_back(CubinF);
+  CmdArgs.push_back(CubinF);
+} else {
+  SmallString<256> Filename(InputFile);
+  llvm::sys::path::replace_extension(Filename, "cubin");
+  CmdArgs.push_back(Args.MakeArgString(Filename));
+}
   } else {
-SmallString<256> Filename(InputFile);
-llvm::sys::path::replace_extension(Filename, "cubin");
-CmdArgs.push_back(Args.MakeArgString(Filename));
+CmdArgs.push_back(Args.MakeArgString(InputFile));
   }
-} else {
-  CmdArgs.push_back(Args.MakeArgString(InputFile));
+} else if (!II.isNothing()) {
+  II.getInputArg().renderAsInput(Args, CmdArgs);
 }
   }
 
diff --git a/clang/test/Driver/cuda-cross-compiling.c 
b/clang/test/Driver/cuda-cross-compiling.c
index 12d0af3b45f32f..5a52496838813e 100644
--- a/clang/test/Driver/cuda-cross-compiling.c
+++ b/clang/test/Driver/cuda-cross-compiling.c
@@ -77,3 +77,11 @@
 // RUN:   | FileCheck -check-prefix=LOWERING %s
 
 // LOWERING: -cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}} "-mllvm" 
"--nvptx-lower-global-ctor-dtor"
+
+//
+// Test passing arguments directly to nvlink.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=LINKER-ARGS %s
+
+// LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index bafe8ace60d1ce..03fb0a7d64552e 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -385,9 +385,11 @@ Expected clang(ArrayRef InputFiles, 
const ArgList &Args) {
   Triple.isAMDGPU() ? Args.MakeArgString("-mcpu=" + Arch)
 : Args.MakeArgString("-march=" + Arch),
  

[clang] [Clang][NVPTX] Allow passing arguments to the linker while standalone (PR #73030)

2024-02-22 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/73030
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] e3cab8f - [LinkerWrapper] Fix test after permitting NVPTX linker arguments

2024-02-22 Thread Joseph Huber via cfe-commits

Author: Joseph Huber
Date: 2024-02-22T16:54:03-06:00
New Revision: e3cab8fe82eb71fadb251d11fec7df9fa0dbdd27

URL: 
https://github.com/llvm/llvm-project/commit/e3cab8fe82eb71fadb251d11fec7df9fa0dbdd27
DIFF: 
https://github.com/llvm/llvm-project/commit/e3cab8fe82eb71fadb251d11fec7df9fa0dbdd27.diff

LOG: [LinkerWrapper] Fix test after permitting NVPTX linker arguments

Summary:
Forgot to change this after a previous patch altered its behaviour.

Added: 


Modified: 
clang/test/Driver/linker-wrapper.c

Removed: 




diff  --git a/clang/test/Driver/linker-wrapper.c 
b/clang/test/Driver/linker-wrapper.c
index 7fd46778ac9102..83df2b84adefed 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -21,7 +21,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s 
--check-prefix=NVPTX-LINK
 
-// NVPTX-LINK: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda 
-march=sm_70 -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
+// NVPTX-LINK: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda 
-march=sm_70 -O2 {{.*}}.o {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   
--image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
@@ -30,7 +30,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run 
--device-debug -O0 \
 // RUN:   --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s 
--check-prefix=NVPTX-LINK-DEBUG
 
-// NVPTX-LINK-DEBUG: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda 
-march=sm_70 -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o -g 
+// NVPTX-LINK-DEBUG: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda 
-march=sm_70 -O2 {{.*}}.o {{.*}}.o -g 
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   
--image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 \



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] e8740d4 - [Clang] Fix missing architecture on CUDA test

2024-02-22 Thread Joseph Huber via cfe-commits

Author: Joseph Huber
Date: 2024-02-22T16:59:56-06:00
New Revision: e8740d4eb1c88e968b155f73ac745f80b4681589

URL: 
https://github.com/llvm/llvm-project/commit/e8740d4eb1c88e968b155f73ac745f80b4681589
DIFF: 
https://github.com/llvm/llvm-project/commit/e8740d4eb1c88e968b155f73ac745f80b4681589.diff

LOG: [Clang] Fix missing architecture on CUDA test

Summary:
Sorry about the churn here, my local git tree got corrupted so a few
broken tests slipped by while trying to fix it.

Added: 


Modified: 
clang/test/Driver/cuda-cross-compiling.c

Removed: 




diff  --git a/clang/test/Driver/cuda-cross-compiling.c 
b/clang/test/Driver/cuda-cross-compiling.c
index 25058358b63a80..086840accebe7f 100644
--- a/clang/test/Driver/cuda-cross-compiling.c
+++ b/clang/test/Driver/cuda-cross-compiling.c
@@ -71,7 +71,7 @@
 //
 // Test passing arguments directly to nvlink.
 //
-// RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -### %s 2>&1 \
+// RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -march=sm_52 -### %s 
2>&1 \
 // RUN:   | FileCheck -check-prefix=LINKER-ARGS %s
 
 // LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
@@ -87,4 +87,4 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -flto -c %s -### 2>&1 \
 // RUN:   | FileCheck -check-prefix=GENERIC %s
 
-// GENERIC-NOT: -cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}} "-target-cpu"
\ No newline at end of file
+// GENERIC-NOT: -cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}} "-target-cpu"



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-23 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> @jhuber6 , looks like these changes break the following builds
> 
> * https://lab.llvm.org/buildbot/#/builders/235/builds/5630
> 
> * https://lab.llvm.org/buildbot/#/builders/232/builds/19808
> 
> 
> there are a lot of CMake error messages started with
> 
> ```
> CMake Error at cmake/modules/AddLLVM.cmake:631 (set_target_properties):
>   set_target_properties called with incorrect number of arguments.
> Call Stack (most recent call first):
>   cmake/modules/AddLLVM.cmake:854 (llvm_add_library)
>   lib/Transforms/Hello/CMakeLists.txt:13 (add_llvm_library)
> -- Targeting X86
> -- Targeting NVPTX
> CMake Error at CMakeLists.txt:1213 (add_subdirectory):
>   add_subdirectory given source "/unittest" which is not an existing
>   directory.
> CMake Error at tools/llvm-config/CMakeLists.txt:54 (string):
>   string sub-command REGEX, mode MATCH needs at least 5 arguments total to
>   command.
> ...
> ```
> 
> would you take care of it?

I'll look into it, my guess is that I used `LLVM_DEFAULT_TARGET_TRIPLE` instead 
of one of the runtime ones or something.

https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-23 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

@vvereschaka Should be fixed now.

https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Append target search paths for direct offloading compilation (PR #82699)

2024-02-23 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/82699
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-24 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-24 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 commented:

Some nits, mostly just formatting and naming that hasn't been updated.

I agree overall that we should just put this in some canonical form and rely on 
other LLVM passes to take care of things like inlining. Eager to have this 
functionality in, so hopefully we can keep this moving.

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-24 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-24 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-24 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return 

[clang] [HIP] fix host min/max in header (PR #82956)

2024-02-25 Thread Joseph Huber via cfe-commits


@@ -1306,14 +1306,50 @@ float min(float __x, float __y) { return 
__builtin_fminf(__x, __y); }
 __DEVICE__
 double min(double __x, double __y) { return __builtin_fmin(__x, __y); }
 
+// Define host min/max functions.
+
 #if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__)
-__host__ inline static int min(int __arg1, int __arg2) {
-  return __arg1 < __arg2 ? __arg1 : __arg2;
-}
 
-__host__ inline static int max(int __arg1, int __arg2) {
-  return __arg1 > __arg2 ? __arg1 : __arg2;
-}
+#pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS")
+#define DEFINE_MIN_MAX_FUNCTIONS(type1, type2) \
+static inline auto min(const type1 __a, const type2 __b) \
+  -> typename std::remove_reference::type { \
+  return (__a < __b) ? __a : __b; \
+} \
+static inline auto max(const type1 __a, const type2 __b) \
+  -> typename std::remove_reference __b ? __a : __b)>::type { \
+  return (__a > __b) ? __a : __b; \
+}
+
+// Define min and max functions for same type comparisons
+DEFINE_MIN_MAX_FUNCTIONS(int, int)

jhuber6 wrote:

Could we not do something like this w/ the appropriate static assertion? Or is 
there an important restriction on the specific types for this function.
```c
template 
static inline auto min(const T &__a, const U &__b) {
  return (__a < __b) ? __a : __b;
}
```

https://github.com/llvm/llvm-project/pull/82956
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP] fix host min/max in header (PR #82956)

2024-02-26 Thread Joseph Huber via cfe-commits


@@ -1306,14 +1306,50 @@ float min(float __x, float __y) { return 
__builtin_fminf(__x, __y); }
 __DEVICE__
 double min(double __x, double __y) { return __builtin_fmin(__x, __y); }
 
+// Define host min/max functions.
+
 #if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__)
-__host__ inline static int min(int __arg1, int __arg2) {
-  return __arg1 < __arg2 ? __arg1 : __arg2;
-}
 
-__host__ inline static int max(int __arg1, int __arg2) {
-  return __arg1 > __arg2 ? __arg1 : __arg2;
-}
+#pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS")
+#define DEFINE_MIN_MAX_FUNCTIONS(type1, type2) \
+static inline auto min(const type1 __a, const type2 __b) \
+  -> typename std::remove_reference::type { \
+  return (__a < __b) ? __a : __b; \
+} \
+static inline auto max(const type1 __a, const type2 __b) \
+  -> typename std::remove_reference __b ? __a : __b)>::type { \
+  return (__a > __b) ? __a : __b; \
+}
+
+// Define min and max functions for same type comparisons
+DEFINE_MIN_MAX_FUNCTIONS(int, int)

jhuber6 wrote:

Could we not do stuff like `static_assert` where we check if both types are the 
same modulo `unsigned` or `const`? I'm assuming that would be similar. I'd just 
prefer to avoid macro magic if possible, but it we really need it for 
compatibility reasons we can use it.

https://github.com/llvm/llvm-project/pull/82956
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-02-28 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/83282

Summary:
One recurring problem we have with the OpenMP libraries is that they are
potentially conflicting with ones found on the system, this occurs when
there are two copies and one is used for linking that it not attached to
the correspoding clang compiler. LLVM already uses target specific
directories for this, like with libc++, which are always searched first.
This patch changes the install directory to be
`lib/x86_64-unknown-linux-gnu` for example.

Notable changes would be that users will need to change their
LD_LIBRARY_PATH settings optionally, or use default rt-rpath options.
This should fix problems were users are linking the wrong versions of
static libraries


>From 13389e98533eb287514bcbca6a4333e887a8a514 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 28 Feb 2024 09:57:30 -0600
Subject: [PATCH] [OpenMP] Respect LLVM per-target install directories

Summary:
One recurring problem we have with the OpenMP libraries is that they are
potentially conflicting with ones found on the system, this occurs when
there are two copies and one is used for linking that it not attached to
the correspoding clang compiler. LLVM already uses target specific
directories for this, like with libc++, which are always searched first.
This patch changes the install directory to be
`lib/x86_64-unknown-linux-gnu` for example.

Notable changes would be that users will need to change their
LD_LIBRARY_PATH settings optionally, or use default rt-rpath options.
This should fix problems were users are linking the wrong versions of
static libraries
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp | 10 +-
 clang/lib/Driver/ToolChains/CommonArgs.h   |  3 ++-
 clang/lib/Driver/ToolChains/Cuda.cpp   |  2 +-
 openmp/CMakeLists.txt  | 12 +---
 4 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index faceee85a2f8dc..382c8b3612a0af 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2763,13 +2763,13 @@ void tools::addOpenMPDeviceRTL(const Driver &D,
const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,
StringRef BitcodeSuffix,
-   const llvm::Triple &Triple) {
+   const llvm::Triple &Triple,
+   const ToolChain &HostTC) {
   SmallVector LibraryPaths;
 
-  // Add path to clang lib / lib64 folder.
-  SmallString<256> DefaultLibPath = llvm::sys::path::parent_path(D.Dir);
-  llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
-  LibraryPaths.emplace_back(DefaultLibPath.c_str());
+  // Check all of the standard library search paths used by the compiler.
+  for (const auto &LibPath : HostTC.getFilePaths())
+LibraryPaths.emplace_back(LibPath);
 
   // Add user defined library paths from LIBRARY_PATH.
   std::optional LibPath =
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.h 
b/clang/lib/Driver/ToolChains/CommonArgs.h
index 2db0f889ca8209..b8f649aab4bdd2 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.h
+++ b/clang/lib/Driver/ToolChains/CommonArgs.h
@@ -214,7 +214,8 @@ void addMachineOutlinerArgs(const Driver &D, const 
llvm::opt::ArgList &Args,
 
 void addOpenMPDeviceRTL(const Driver &D, const llvm::opt::ArgList &DriverArgs,
 llvm::opt::ArgStringList &CC1Args,
-StringRef BitcodeSuffix, const llvm::Triple &Triple);
+StringRef BitcodeSuffix, const llvm::Triple &Triple,
+const ToolChain &HostTC);
 
 void addOutlineAtomicsArgs(const Driver &D, const ToolChain &TC,
const llvm::opt::ArgList &Args,
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index ff3687ca7dae33..177fd6310e7ee2 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -903,7 +903,7 @@ void CudaToolChain::addClangTargetOptions(
   return;
 
 addOpenMPDeviceRTL(getDriver(), DriverArgs, CC1Args, GpuArch.str(),
-   getTriple());
+   getTriple(), HostTC);
   }
 }
 
diff --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index 03068af22629f7..3c4ff76ad6d161 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -46,9 +46,15 @@ if (OPENMP_STANDALONE_BUILD)
   set(CMAKE_CXX_EXTENSIONS NO)
 else()
   set(OPENMP_ENABLE_WERROR ${LLVM_ENABLE_WERROR})
-  # If building in tree, we honor the same install suffix LLVM uses.
-  set(OPENMP_INSTALL_LIBDIR "lib${LLVM_LIBDIR_SUFFIX}" CACHE STRING
-  "Path where built OpenMP libraries should be installed.")
+
+  # When building in tree we install the runtime according to t

[clang] [Clang] Add 'CLANG_ALLOW_IMPLICIT_RPATH' to enable toolchain use of -rpath (PR #82004)

2024-02-28 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

So, I'm wondering if we could do a clang configuration file based solution for 
this. The problem that I see now is that we'd like to make some clang 
configuration files only active for a certain language. I think we already have 
OS specific files and target specific files, so it might be possible to have 
language specific ones as well?

Also, we need to be able to do something like `-rpath 
../lib/x86_64-unknown-linux-gnu` but without hardcoding the triple. Is 
that a way to get the LLVM default triple in these things? What do you think 
@MaskRay.

https://github.com/llvm/llvm-project/pull/82004
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [HIP] Support compressing bundle by LZMA (PR #83297)

2024-02-28 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

This seems to be adding an entirely new compression scheme to LLVM. I feel like 
that should be a separate patch and the part where we make HIP use it is a 
follow-up.

https://github.com/llvm/llvm-project/pull/83297
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [HIP] Support compressing bundle by LZMA (PR #83306)

2024-02-28 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/83306
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-02-28 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/83282
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-03-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Hi @jhuber6, @MaskRay
> 
> We are having some problems with this patch on a server where the file 
> /lib64/libomptarget-nvptx-sm_52.bc exists. The test case that fails is 
> clang/test/Driver/openmp-offload-gpu.c.
> 
> **Problem 1** I think one problem is related to this check line `// 
> CHK-ENV-BCLIB: 
> clang{{.*}}-triple{{.*}}nvptx64-nvidia-cuda{{.*}}-mlink-builtin-bitcode{{.*}}subdir{{/|}}libomptarget-nvptx-sm_52.bc
>  ` That test is using `env LIBRARY_PATH` but your changes in 
> `tools::addOpenMPDeviceRTL` makes it prioritize the standard library paths 
> before the environment. Not sure if that is how it should be or if env should 
> have higher prio (i.e. added to LibraryPaths before the paths found in 
> HostTC).
> 
> **Problem 2** This check line also started failing: `// CHK-BCLIB-WARN: no 
> library 'libomptarget-nvptx-sm_52.bc' found in the default clang lib 
> directory or in LIBRARY_PATH; use '--libomptarget-nvptx-bc-path' to specify 
> nvptx bitcode library `
> 
> Now, with your path, I guess it starts picking up the 
> `/lib64/libomptarget-nvptx-sm_52.bc` file from the system. So we no longer 
> get the warning. Is that the intention with your patch? Regardless, I think 
> you need to do something with that test case because I think the "should 
> never exist" part in
> 
> ```
> /// Check that the warning is thrown when the libomptarget bitcode library is 
> not found.
> /// Libomptarget requires sm_52 or newer so an sm_52 bitcode library should 
> never exist.
> ```
> 
> no longer holds with your patch.

I think it's standard to prioritize library path stuff. Does this work if you 
just flip the order we fill the library search path? I think the behavioral 
change here is that we didn't used to look in the system directory.

https://github.com/llvm/llvm-project/pull/83282
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][AMDGPU] Don't define feature macros on host code (PR #83558)

2024-03-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

This was the original behavior of my patch, but I reverted it because it broke 
all the HIP headers that were unintentionally relying on this. Has that been 
resolved?

https://github.com/llvm/llvm-project/pull/83558
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-03-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Problem 1 can be solved by flipping the order. But Problem 2 would remain as 
> it doesn't depend on the order.

Honestly, we should just remove the second test. We just treat these things as 
libraries and it doesn't make sense for a test to ensure that `-lstdc++` 
doesn't exist on the user's system or whatever.

https://github.com/llvm/llvm-project/pull/83282
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Fix test after updating library search paths (PR #83573)

2024-03-01 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/83573

Summary:
We still use this bitcode library in one case, the NVPTX non-LTO build.
The patch updated the search paths to treat it the same as other
libraries, which unintentionally prioritized system paths over
LIBRARY_PATH which is generally not correct. Also we had a test that
relied on system state so remove that.


>From 210bee4de5022ed7f4f7357c921ec291677ea7f3 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 1 Mar 2024 08:11:37 -0600
Subject: [PATCH] [OpenMP] Fix test after updating library search paths

Summary:
We still use this bitcode library in one case, the NVPTX non-LTO build.
The patch updated the search paths to treat it the same as other
libraries, which unintentionally prioritized system paths over
LIBRARY_PATH which is generally not correct. Also we had a test that
relied on system state so remove that.
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp |  8 
 clang/test/Driver/openmp-offload-gpu.c | 11 ---
 2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 382c8b3612a0af..7f0f78b41e79ed 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2767,10 +2767,6 @@ void tools::addOpenMPDeviceRTL(const Driver &D,
const ToolChain &HostTC) {
   SmallVector LibraryPaths;
 
-  // Check all of the standard library search paths used by the compiler.
-  for (const auto &LibPath : HostTC.getFilePaths())
-LibraryPaths.emplace_back(LibPath);
-
   // Add user defined library paths from LIBRARY_PATH.
   std::optional LibPath =
   llvm::sys::Process::GetEnv("LIBRARY_PATH");
@@ -2782,6 +2778,10 @@ void tools::addOpenMPDeviceRTL(const Driver &D,
   LibraryPaths.emplace_back(Path.trim());
   }
 
+  // Check all of the standard library search paths used by the compiler.
+  for (const auto &LibPath : HostTC.getFilePaths())
+LibraryPaths.emplace_back(LibPath);
+
   OptSpecifier LibomptargetBCPathOpt =
   Triple.isAMDGCN() ? options::OPT_libomptarget_amdgpu_bc_path_EQ
 : options::OPT_libomptarget_nvptx_bc_path_EQ;
diff --git a/clang/test/Driver/openmp-offload-gpu.c 
b/clang/test/Driver/openmp-offload-gpu.c
index 5da74a35d87ad9..f7b06c9ec59580 100644
--- a/clang/test/Driver/openmp-offload-gpu.c
+++ b/clang/test/Driver/openmp-offload-gpu.c
@@ -101,17 +101,6 @@
 
 /// ###
 
-/// Check that the warning is thrown when the libomptarget bitcode library is 
not found.
-/// Libomptarget requires sm_52 or newer so an sm_52 bitcode library should 
never exist.
-// RUN:   not %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
\
-// RUN:   -Xopenmp-target -march=sm_52 
--cuda-path=%S/Inputs/CUDA_102/usr/local/cuda \
-// RUN:   -fopenmp-relocatable-target -save-temps %s 2>&1 \
-// RUN:   | FileCheck -check-prefix=CHK-BCLIB-WARN %s
-
-// CHK-BCLIB-WARN: no library 'libomptarget-nvptx-sm_52.bc' found in the 
default clang lib directory or in LIBRARY_PATH; use 
'--libomptarget-nvptx-bc-path' to specify nvptx bitcode library
-
-/// ###
-
 /// Check that the error is thrown when the libomptarget bitcode library does 
not exist.
 // RUN:   not %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
\
 // RUN:   -Xopenmp-target -march=sm_52 
--cuda-path=%S/Inputs/CUDA_102/usr/local/cuda \

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-03-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Problem 1 can be solved by flipping the order. But Problem 2 would remain as 
> it doesn't depend on the order.

https://github.com/llvm/llvm-project/pull/83573 I made a patch to fix it.

https://github.com/llvm/llvm-project/pull/83282
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Fix test after updating library search paths (PR #83573)

2024-03-01 Thread Joseph Huber via cfe-commits


@@ -101,17 +101,6 @@
 
 /// ###
 
-/// Check that the warning is thrown when the libomptarget bitcode library is 
not found.
-/// Libomptarget requires sm_52 or newer so an sm_52 bitcode library should 
never exist.
-// RUN:   not %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
\
-// RUN:   -Xopenmp-target -march=sm_52 
--cuda-path=%S/Inputs/CUDA_102/usr/local/cuda \
-// RUN:   -fopenmp-relocatable-target -save-temps %s 2>&1 \
-// RUN:   | FileCheck -check-prefix=CHK-BCLIB-WARN %s
-
-// CHK-BCLIB-WARN: no library 'libomptarget-nvptx-sm_52.bc' found in the 
default clang lib directory or in LIBRARY_PATH; use 
'--libomptarget-nvptx-bc-path' to specify nvptx bitcode library

jhuber6 wrote:

I'm not overly concerned, we could change it to be a `no_such_file` error 
instead. Realistically this was just a mess because whoever set it up initially 
didn't put it into a place that's guaranteed to be looked at first, so instead 
they just forced it to only look in once place.

https://github.com/llvm/llvm-project/pull/83573
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Fix test after updating library search paths (PR #83573)

2024-03-01 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/83573
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-03-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> ```
> yeluo@epyc-server:/soft/llvm/main-20240301/lib$ ls libomp* -l
> lrwxrwxrwx 1 yeluo yeluo   34 Mar  1 11:18 libomptarget.rtl.amdgpu.so -> 
> libomptarget.rtl.amdgpu.so.19.0git
> -r--r--r-- 1 yeluo yeluo 67532024 Mar  1 11:04 
> libomptarget.rtl.amdgpu.so.19.0git
> lrwxrwxrwx 1 yeluo yeluo   32 Mar  1 11:18 libomptarget.rtl.cuda.so -> 
> libomptarget.rtl.cuda.so.19.0git
> -r--r--r-- 1 yeluo yeluo 67440504 Mar  1 11:04 
> libomptarget.rtl.cuda.so.19.0git
> lrwxrwxrwx 1 yeluo yeluo   34 Mar  1 11:18 libomptarget.rtl.x86_64.so -> 
> libomptarget.rtl.x86_64.so.19.0git
> -r--r--r-- 1 yeluo yeluo 67349472 Mar  1 11:04 
> libomptarget.rtl.x86_64.so.19.0git
> lrwxrwxrwx 1 yeluo yeluo   23 Mar  1 11:18 libomptarget.so -> 
> libomptarget.so.19.0git
> -r--r--r-- 1 yeluo yeluo  2686592 Mar  1 11:04 libomptarget.so.19.0git
> ```
> 
> Should libomptarget follow the relocation of libomp?

That's surprising. Maybe this is interacting incorrectly with the fact that we 
build these as LLVM libs?

https://github.com/llvm/llvm-project/pull/83282
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-03-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> It seems being installed twice both under `lib` and 
> `lib/x86_64-unknown-linux-gnu`. files are the identical as diff show nothing.

Makes sense, like `add_llvm_library` is implicitly installing it there, then 
our subsequent `install` call is doing it again. I wonder if there's a way to 
change that.

https://github.com/llvm/llvm-project/pull/83282
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Make '-frtlib-add-rpath' include the standard library directory (PR #86217)

2024-03-21 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/86217

Summary:
The original intention of the `openmp-add-rpath` option was to add the
rpath to the language runtime directory. However, the current
implementation only adds it to the compiler's resource directory. This
patch adds support for appending the `-rpath` to the compiler's standard
library directory as well. Currently this is `/../lib/`.


>From 6bd078cb080903127a8dfa27fb978350e94b8375 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 21 Mar 2024 18:25:35 -0500
Subject: [PATCH] [Clang] Make '-frtlib-add-rpath' include the standard library
 directory

Summary:
The original intention of the `openmp-add-rpath` option was to add the
rpath to the language runtime directory. However, the current
implementation only adds it to the compiler's resource directory. This
patch adds support for appending the `-rpath` to the compiler's standard
library directory as well. Currently this is `/../lib/`.
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp |  6 +-
 clang/test/Driver/arch-specific-libdir-rpath.c | 13 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 4478865313636d..6b1fbba7abd031 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1142,7 +1142,11 @@ void tools::addArchSpecificRPath(const ToolChain &TC, 
const ArgList &Args,
 options::OPT_fno_rtlib_add_rpath, false))
 return;
 
-  for (const auto &CandidateRPath : TC.getArchSpecificLibPaths()) {
+  SmallVector CandidateRPaths(TC.getArchSpecificLibPaths());
+  if (const auto CandidateRPath = TC.getStdlibPath())
+CandidateRPaths.emplace_back(*CandidateRPath);
+
+  for (const auto &CandidateRPath : CandidateRPaths) {
 if (TC.getVFS().exists(CandidateRPath)) {
   CmdArgs.push_back("-rpath");
   CmdArgs.push_back(Args.MakeArgString(CandidateRPath));
diff --git a/clang/test/Driver/arch-specific-libdir-rpath.c 
b/clang/test/Driver/arch-specific-libdir-rpath.c
index 1e6bbbc5929ac2..e95fb21c0a5fb1 100644
--- a/clang/test/Driver/arch-specific-libdir-rpath.c
+++ b/clang/test/Driver/arch-specific-libdir-rpath.c
@@ -84,6 +84,15 @@
 // RUN: -frtlib-add-rpath \
 // RUN:   | FileCheck --check-prefixes=PERTARGET %s
 
+// Test that the driver adds an per-target arch-specific subdirectory to the
+// stdlib path.
+//
+// RUN: %clang %s -### 2>&1 --target=x86_64-linux-gnu \
+// RUN: -fsanitize=address -shared-libasan \
+// RUN: -resource-dir=%S/Inputs/resource_dir_with_per_target_subdir \
+// RUN: -frtlib-add-rpath \
+// RUN:   | FileCheck --check-prefixes=STDLIB %s
+
 // RESDIR: "-resource-dir" "[[RESDIR:[^"]*]]"
 //
 // LIBPATH-X86_64: -L[[RESDIR]]{{(/|)lib(/|)linux(/|)x86_64}}
@@ -101,3 +110,7 @@
 // PERTARGET: "-resource-dir" "[[PTRESDIR:[^"]*]]"
 // PERTARGET: -L[[PTRESDIR]]{{(/|)lib(/|)x86_64-unknown-linux-gnu}}
 // PERTARGET:   "-rpath" 
"[[PTRESDIR]]{{(/|)lib(/|)x86_64-unknown-linux-gnu}}"
+
+// STDLIB: InstalledDir: [[LIBDIR:.+$]]
+// STDLIB: -L[[LIBDIR]]/..{{(/|)lib(/|)x86_64-unknown-linux-gnu}}
+// STDLIB:   "-rpath" 
"[[LIBDIR]]/..{{(/|)lib(/|)x86_64-unknown-linux-gnu}}"

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Make '-frtlib-add-rpath' include the standard library directory (PR #86217)

2024-03-21 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/86217

>From 722b8b454d652b3d52e20b9bacff58e096cc7feb Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 21 Mar 2024 18:25:35 -0500
Subject: [PATCH] [Clang] Make '-frtlib-add-rpath' include the standard library
 directory

Summary:
The original intention of the `openmp-add-rpath` option was to add the
rpath to the language runtime directory. However, the current
implementation only adds it to the compiler's resource directory. This
patch adds support for appending the `-rpath` to the compiler's standard
library directory as well. Currently this is `/../lib/`.
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 4478865313636d..6b1fbba7abd031 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1142,7 +1142,11 @@ void tools::addArchSpecificRPath(const ToolChain &TC, 
const ArgList &Args,
 options::OPT_fno_rtlib_add_rpath, false))
 return;
 
-  for (const auto &CandidateRPath : TC.getArchSpecificLibPaths()) {
+  SmallVector CandidateRPaths(TC.getArchSpecificLibPaths());
+  if (const auto CandidateRPath = TC.getStdlibPath())
+CandidateRPaths.emplace_back(*CandidateRPath);
+
+  for (const auto &CandidateRPath : CandidateRPaths) {
 if (TC.getVFS().exists(CandidateRPath)) {
   CmdArgs.push_back("-rpath");
   CmdArgs.push_back(Args.MakeArgString(CandidateRPath));

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Make '-frtlib-add-rpath' include the standard library directory (PR #86217)

2024-03-22 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/86217
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HIP][NFC] Refactor managed var codegen (PR #85976)

2024-03-22 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/85976
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] NFC Silence compiler warning spam (PR #86532)

2024-03-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/86532
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CMake] Change GCC_INSTALL_PREFIX from warning to fatal error (PR #85891)

2024-03-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

What's the suggested way to handle this in a standard build? The documents 
state to use `-DCMAKE_CXX_FLAGS=--gcc-isntall-dir=` but that doesn't work 
if you build LLVM with `gcc` initially. You'd need to somehow only pass that 
flag to the invocations that use `clang`. We also have some code in the OpenMP 
project that doesn't respect `CMAKE_CXX_FLAGS` because it needs to be very 
carefully controlled to generate valid GPU code. I don't think what's suggested 
anywhere is a valid replacement for the old behavior and is a regression for 
anyone unfortunate enough to need to build this on a cluster machine where they 
don't control the global GCC installation.

https://github.com/llvm/llvm-project/pull/85891
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CMake] Change GCC_INSTALL_PREFIX from warning to fatal error (PR #85891)

2024-03-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

@petrhosek Is there a way to pass flags only to the runtimes portion of the 
build within the normal workflow? I know we have 
`-DRUNTIMES_x86_64-unknown-linux-gnu_CMAKE_CXX_COMPILE_FLAGS=` that might work, 
but I don't think this is respected if we're using the `default` target.

https://github.com/llvm/llvm-project/pull/85891
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CMake] Change GCC_INSTALL_PREFIX from warning to fatal error (PR #85891)

2024-03-26 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> `-DRUNTIMES_CMAKE_ARGS="-DCMAKE_C_FLAGS=--gcc-install-dir=$GCC_ROOT;-DCMAKE_CXX_FLAGS=--gcc-install-dir=$GCC_ROOT"`
>  worked for me

Great, we should probably document this somewhere.

https://github.com/llvm/llvm-project/pull/85891
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/86830

Summary:
The 'new driver' sets up the lifetime of a registered liftime using
global constructors and destructors. Currently, this is put at priority
1 which isn't strictly conformant as it will conflict with system
utilities. We now use 101 as this is the loweest suggested for
non-system constructors and will still run before user constructors.

Secondly, there were issues with the CUDA runtime when destructed with a
global destructor. Because the global ones are in any order and
potentially run before other things we were hitting an edge case where
the OpenMP runtime was uninitialized *after* `_dl_fini` was called. This
would result in us erroring when we call into a destroyed `libcuda.so`
instance. using `atexit` is what CUDA / HIP use and it prevents this
from happening. Most everything uses `atexit` except system utilities
and because of the constructor priority it will be unregistered *after*
everything else but not after `_fl_fini`.


>From 1583db25c7a24e88fc3439820538ff2ef8e24429 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 27 Mar 2024 11:43:37 -0500
Subject: [PATCH] [Offload] Change unregister library to use `atexit` instead
 of destructor

Summary:
The 'new driver' sets up the lifetime of a registered liftime using
global constructors and destructors. Currently, this is put at priority
1 which isn't strictly conformant as it will conflict with system
utilities. We now use 101 as this is the loweest suggested for
non-system constructors and will still run before user constructors.

Secondly, there were issues with the CUDA runtime when destructed with a
global destructor. Because the global ones are in any order and
potentially run before other things we were hitting an edge case where
the OpenMP runtime was uninitialized *after* `_dl_fini` was called. This
would result in us erroring when we call into a destroyed `libcuda.so`
instance. using `atexit` is what CUDA / HIP use and it prevents this
from happening. Most everything uses `atexit` except system utilities
and because of the constructor priority it will be unregistered *after*
everything else but not after `_fl_fini`.
---
 clang/test/Driver/linker-wrapper-image.c  |  8 +--
 .../Frontend/Offloading/OffloadWrapper.cpp| 68 ++-
 2 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index 75475264135224..5d5d62805e174d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -26,12 +26,12 @@
 //  OPENMP: @.omp_offloading.device_image = internal unnamed_addr constant 
[[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}", section ".llvm.offloading", align 
8
 // OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr 
constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr getelementptr 
inbounds ([[[BEGIN:[0-9]+]] x i8], ptr @.omp_offloading.device_image, i64 1, 
i64 0), ptr getelementptr inbounds ([[[END:[0-9]+]] x i8], ptr 
@.omp_offloading.device_image, i64 1, i64 0), ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]
 // OPENMP-NEXT: @.omp_offloading.descriptor = internal constant 
%__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }
-// OPENMP-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_reg, ptr null }]
-// OPENMP-NEXT: @llvm.global_dtors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_unreg, ptr null }]
+// OPENMP-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 101, ptr @.omp_offloading.descriptor_reg, ptr null }]
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
 
@@ -62,7 +62,7 @@
 // CUDA-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 
1180844977, i32 1, ptr @.fatbin_image, ptr null }, section ".nvFatBinSegment", 
align 8
 // CUDA-NEXT: @.cuda.binary_handle = internal global ptr null
 
-// CUDA: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, 
ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg, ptr null }]
+// CUDA: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, 
ptr, ptr } { i32 101, ptr @.cuda.fatbin_reg, ptr null }]
 
 //  CUDA: define internal void @.cuda.fatbin_reg() section ".text.startup" 
{
 // CUDA-NEXT: entry:
@@ -162,7 +162,7 @@
 // HIP-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 
1212764230, i32 1, ptr @.fat

[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Joseph Huber via cfe-commits


@@ -186,57 +186,60 @@ GlobalVariable *createBinDesc(Module &M, 
ArrayRef> Bufs,
 ".omp_offloading.descriptor" + Suffix);
 }
 
-void createRegisterFunction(Module &M, GlobalVariable *BinDesc,
-StringRef Suffix) {
+Function *createUnregisterFunction(Module &M, GlobalVariable *BinDesc,

jhuber6 wrote:

No, since I need to call this function from `createRegisterFunction` now. I 
could forward declare it but I don't think there's a point given it's inside an 
anonymous namespace.

https://github.com/llvm/llvm-project/pull/86830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

So, looking into `libomptarget` when this is applied is doing something weird. 
`atexit` is functionally a stack, so that means the first in is the last out. 
However, it seems that the static global constructor created inside of the CUDA 
plugin is being unregistered *after* this one for unknown reasons. Will need to 
look into that.

https://github.com/llvm/llvm-project/pull/86830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/86830

>From 875ed36029851a2423c97b28bd5bf34975efb016 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 27 Mar 2024 11:43:37 -0500
Subject: [PATCH] [Offload] Change unregister library to use `atexit` instead
 of destructor

Summary:
The 'new driver' sets up the lifetime of a registered liftime using
global constructors and destructors. Currently, this is put at priority
1 which isn't strictly conformant as it will conflict with system
utilities. We now use 101 as this is the loweest suggested for
non-system constructors and will still run before user constructors.

Secondly, there were issues with the CUDA runtime when destructed with a
global destructor. Because the global ones are in any order and
potentially run before other things we were hitting an edge case where
the OpenMP runtime was uninitialized *after* `_dl_fini` was called. This
would result in us erroring when we call into a destroyed `libcuda.so`
instance. using `atexit` is what CUDA / HIP use and it prevents this
from happening. Most everything uses `atexit` except system utilities
and because of the constructor priority it will be unregistered *after*
everything else but not after `_fl_fini`.
---
 clang/test/Driver/linker-wrapper-image.c  |  8 +--
 .../Frontend/Offloading/OffloadWrapper.cpp| 70 ++-
 2 files changed, 41 insertions(+), 37 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index 75475264135224..d01445e3aed04e 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -26,11 +26,11 @@
 //  OPENMP: @.omp_offloading.device_image = internal unnamed_addr constant 
[[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}", section ".llvm.offloading", align 
8
 // OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr 
constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr getelementptr 
inbounds ([[[BEGIN:[0-9]+]] x i8], ptr @.omp_offloading.device_image, i64 1, 
i64 0), ptr getelementptr inbounds ([[[END:[0-9]+]] x i8], ptr 
@.omp_offloading.device_image, i64 1, i64 0), ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]
 // OPENMP-NEXT: @.omp_offloading.descriptor = internal constant 
%__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }
-// OPENMP-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_reg, ptr null }]
-// OPENMP-NEXT: @llvm.global_dtors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_unreg, ptr null }]
+// OPENMP-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 101, ptr @.omp_offloading.descriptor_reg, ptr null }]
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
@@ -62,7 +62,7 @@
 // CUDA-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 
1180844977, i32 1, ptr @.fatbin_image, ptr null }, section ".nvFatBinSegment", 
align 8
 // CUDA-NEXT: @.cuda.binary_handle = internal global ptr null
 
-// CUDA: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, 
ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg, ptr null }]
+// CUDA: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, 
ptr, ptr } { i32 101, ptr @.cuda.fatbin_reg, ptr null }]
 
 //  CUDA: define internal void @.cuda.fatbin_reg() section ".text.startup" 
{
 // CUDA-NEXT: entry:
@@ -162,7 +162,7 @@
 // HIP-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 
1212764230, i32 1, ptr @.fatbin_image, ptr null }, section ".hipFatBinSegment", 
align 8
 // HIP-NEXT: @.hip.binary_handle = internal global ptr null
 
-// HIP: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, 
ptr, ptr } { i32 1, ptr @.hip.fatbin_reg, ptr null }]
+// HIP: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, 
ptr, ptr } { i32 101, ptr @.hip.fatbin_reg, ptr null }]
 
 //  HIP: define internal void @.hip.fatbin_reg() section ".text.startup" {
 // HIP-NEXT: entry:
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index fec1bdbe9d8c74..86e8712ce95ae6 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -186,57 +186,62 @@ GlobalVariable *createBinDesc(Module &M, 
ArrayRef> Bufs,
 ".omp_offloading.descriptor" + Suffix);
 }
 
-void createRegisterFunction(Module &M, Gl

[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Fixed, I neglected the fact that OpenMP registers more destructors inside of 
the constructor itself. Passes all the tests now.

https://github.com/llvm/llvm-project/pull/86830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Joseph Huber via cfe-commits


@@ -186,57 +186,62 @@ GlobalVariable *createBinDesc(Module &M, 
ArrayRef> Bufs,
 ".omp_offloading.descriptor" + Suffix);
 }
 
-void createRegisterFunction(Module &M, GlobalVariable *BinDesc,
-StringRef Suffix) {
+Function *createUnregisterFunction(Module &M, GlobalVariable *BinDesc,
+   StringRef Suffix) {
   LLVMContext &C = M.getContext();
   auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
-  auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage,
-".omp_offloading.descriptor_reg" + Suffix, &M);
+  auto *Func =
+  Function::Create(FuncTy, GlobalValue::InternalLinkage,
+   ".omp_offloading.descriptor_unreg" + Suffix, &M);
   Func->setSection(".text.startup");
 
-  // Get __tgt_register_lib function declaration.
-  auto *RegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M),
-  /*isVarArg*/ false);
-  FunctionCallee RegFuncC =
-  M.getOrInsertFunction("__tgt_register_lib", RegFuncTy);
+  // Get __tgt_unregister_lib function declaration.
+  auto *UnRegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M),
+/*isVarArg*/ false);
+  FunctionCallee UnRegFuncC =
+  M.getOrInsertFunction("__tgt_unregister_lib", UnRegFuncTy);
 
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
-  Builder.CreateCall(RegFuncC, BinDesc);
+  Builder.CreateCall(UnRegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
-  // Add this function to constructors.
-  // Set priority to 1 so that __tgt_register_lib is executed AFTER
-  // __tgt_register_requires (we want to know what requirements have been
-  // asked for before we load a libomptarget plugin so that by the time the
-  // plugin is loaded it can report how many devices there are which can
-  // satisfy these requirements).
-  appendToGlobalCtors(M, Func, /*Priority*/ 1);
+  return Func;
 }
 
-void createUnregisterFunction(Module &M, GlobalVariable *BinDesc,
-  StringRef Suffix) {
+void createRegisterFunction(Module &M, GlobalVariable *BinDesc,
+StringRef Suffix) {
   LLVMContext &C = M.getContext();
   auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
-  auto *Func =
-  Function::Create(FuncTy, GlobalValue::InternalLinkage,
-   ".omp_offloading.descriptor_unreg" + Suffix, &M);
+  auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage,
+".omp_offloading.descriptor_reg" + Suffix, &M);
   Func->setSection(".text.startup");
 
-  // Get __tgt_unregister_lib function declaration.
-  auto *UnRegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M),
-/*isVarArg*/ false);
-  FunctionCallee UnRegFuncC =
-  M.getOrInsertFunction("__tgt_unregister_lib", UnRegFuncTy);
+  // Get __tgt_register_lib function declaration.
+  auto *RegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M),
+  /*isVarArg*/ false);
+  FunctionCallee RegFuncC =
+  M.getOrInsertFunction("__tgt_register_lib", RegFuncTy);
+
+  auto *AtExitTy = FunctionType::get(
+  Type::getInt32Ty(C), PointerType::getUnqual(C), /*isVarArg=*/false);
+  FunctionCallee AtExit = M.getOrInsertFunction("atexit", AtExitTy);
+
+  Function *UnregFunc = createUnregisterFunction(M, BinDesc, Suffix);
 
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
-  Builder.CreateCall(UnRegFuncC, BinDesc);
+
+  // Register the destructors with 'atexit', This is expected by the CUDA

jhuber6 wrote:

I think that's actually in this file somewhere for the CUDA wrapper portion.

https://github.com/llvm/llvm-project/pull/86830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Joseph Huber via cfe-commits


@@ -186,57 +186,62 @@ GlobalVariable *createBinDesc(Module &M, 
ArrayRef> Bufs,
 ".omp_offloading.descriptor" + Suffix);
 }
 
-void createRegisterFunction(Module &M, GlobalVariable *BinDesc,
-StringRef Suffix) {
+Function *createUnregisterFunction(Module &M, GlobalVariable *BinDesc,
+   StringRef Suffix) {
   LLVMContext &C = M.getContext();
   auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
-  auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage,
-".omp_offloading.descriptor_reg" + Suffix, &M);
+  auto *Func =
+  Function::Create(FuncTy, GlobalValue::InternalLinkage,
+   ".omp_offloading.descriptor_unreg" + Suffix, &M);
   Func->setSection(".text.startup");
 
-  // Get __tgt_register_lib function declaration.
-  auto *RegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M),
-  /*isVarArg*/ false);
-  FunctionCallee RegFuncC =
-  M.getOrInsertFunction("__tgt_register_lib", RegFuncTy);
+  // Get __tgt_unregister_lib function declaration.
+  auto *UnRegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M),
+/*isVarArg*/ false);
+  FunctionCallee UnRegFuncC =
+  M.getOrInsertFunction("__tgt_unregister_lib", UnRegFuncTy);
 
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
-  Builder.CreateCall(RegFuncC, BinDesc);
+  Builder.CreateCall(UnRegFuncC, BinDesc);
   Builder.CreateRetVoid();
 
-  // Add this function to constructors.
-  // Set priority to 1 so that __tgt_register_lib is executed AFTER
-  // __tgt_register_requires (we want to know what requirements have been
-  // asked for before we load a libomptarget plugin so that by the time the
-  // plugin is loaded it can report how many devices there are which can
-  // satisfy these requirements).
-  appendToGlobalCtors(M, Func, /*Priority*/ 1);
+  return Func;
 }
 
-void createUnregisterFunction(Module &M, GlobalVariable *BinDesc,
-  StringRef Suffix) {
+void createRegisterFunction(Module &M, GlobalVariable *BinDesc,
+StringRef Suffix) {
   LLVMContext &C = M.getContext();
   auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
-  auto *Func =
-  Function::Create(FuncTy, GlobalValue::InternalLinkage,
-   ".omp_offloading.descriptor_unreg" + Suffix, &M);
+  auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage,
+".omp_offloading.descriptor_reg" + Suffix, &M);
   Func->setSection(".text.startup");
 
-  // Get __tgt_unregister_lib function declaration.
-  auto *UnRegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M),
-/*isVarArg*/ false);
-  FunctionCallee UnRegFuncC =
-  M.getOrInsertFunction("__tgt_unregister_lib", UnRegFuncTy);
+  // Get __tgt_register_lib function declaration.
+  auto *RegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(M),
+  /*isVarArg*/ false);
+  FunctionCallee RegFuncC =
+  M.getOrInsertFunction("__tgt_register_lib", RegFuncTy);
+
+  auto *AtExitTy = FunctionType::get(
+  Type::getInt32Ty(C), PointerType::getUnqual(C), /*isVarArg=*/false);
+  FunctionCallee AtExit = M.getOrInsertFunction("atexit", AtExitTy);
+
+  Function *UnregFunc = createUnregisterFunction(M, BinDesc, Suffix);
 
   // Construct function body
   IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));
-  Builder.CreateCall(UnRegFuncC, BinDesc);
+
+  // Register the destructors with 'atexit', This is expected by the CUDA

jhuber6 wrote:

```suggestion
  // Register the destructors with 'atexit'. This is expected by the CUDA
```

https://github.com/llvm/llvm-project/pull/86830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/86830

>From 875ed36029851a2423c97b28bd5bf34975efb016 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 27 Mar 2024 11:43:37 -0500
Subject: [PATCH 1/2] [Offload] Change unregister library to use `atexit`
 instead of destructor

Summary:
The 'new driver' sets up the lifetime of a registered liftime using
global constructors and destructors. Currently, this is put at priority
1 which isn't strictly conformant as it will conflict with system
utilities. We now use 101 as this is the loweest suggested for
non-system constructors and will still run before user constructors.

Secondly, there were issues with the CUDA runtime when destructed with a
global destructor. Because the global ones are in any order and
potentially run before other things we were hitting an edge case where
the OpenMP runtime was uninitialized *after* `_dl_fini` was called. This
would result in us erroring when we call into a destroyed `libcuda.so`
instance. using `atexit` is what CUDA / HIP use and it prevents this
from happening. Most everything uses `atexit` except system utilities
and because of the constructor priority it will be unregistered *after*
everything else but not after `_fl_fini`.
---
 clang/test/Driver/linker-wrapper-image.c  |  8 +--
 .../Frontend/Offloading/OffloadWrapper.cpp| 70 ++-
 2 files changed, 41 insertions(+), 37 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index 75475264135224..d01445e3aed04e 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -26,11 +26,11 @@
 //  OPENMP: @.omp_offloading.device_image = internal unnamed_addr constant 
[[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}", section ".llvm.offloading", align 
8
 // OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr 
constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr getelementptr 
inbounds ([[[BEGIN:[0-9]+]] x i8], ptr @.omp_offloading.device_image, i64 1, 
i64 0), ptr getelementptr inbounds ([[[END:[0-9]+]] x i8], ptr 
@.omp_offloading.device_image, i64 1, i64 0), ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]
 // OPENMP-NEXT: @.omp_offloading.descriptor = internal constant 
%__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }
-// OPENMP-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_reg, ptr null }]
-// OPENMP-NEXT: @llvm.global_dtors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_unreg, ptr null }]
+// OPENMP-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 101, ptr @.omp_offloading.descriptor_reg, ptr null }]
 
 //  OPENMP: define internal void @.omp_offloading.descriptor_reg() section 
".text.startup" {
 // OPENMP-NEXT: entry:
+// OPENMP-NEXT:   %0 = call i32 @atexit(ptr @.omp_offloading.descriptor_unreg)
 // OPENMP-NEXT:   call void @__tgt_register_lib(ptr 
@.omp_offloading.descriptor)
 // OPENMP-NEXT:   ret void
 // OPENMP-NEXT: }
@@ -62,7 +62,7 @@
 // CUDA-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 
1180844977, i32 1, ptr @.fatbin_image, ptr null }, section ".nvFatBinSegment", 
align 8
 // CUDA-NEXT: @.cuda.binary_handle = internal global ptr null
 
-// CUDA: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, 
ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg, ptr null }]
+// CUDA: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, 
ptr, ptr } { i32 101, ptr @.cuda.fatbin_reg, ptr null }]
 
 //  CUDA: define internal void @.cuda.fatbin_reg() section ".text.startup" 
{
 // CUDA-NEXT: entry:
@@ -162,7 +162,7 @@
 // HIP-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 
1212764230, i32 1, ptr @.fatbin_image, ptr null }, section ".hipFatBinSegment", 
align 8
 // HIP-NEXT: @.hip.binary_handle = internal global ptr null
 
-// HIP: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, 
ptr, ptr } { i32 1, ptr @.hip.fatbin_reg, ptr null }]
+// HIP: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, 
ptr, ptr } { i32 101, ptr @.hip.fatbin_reg, ptr null }]
 
 //  HIP: define internal void @.hip.fatbin_reg() section ".text.startup" {
 // HIP-NEXT: entry:
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp 
b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index fec1bdbe9d8c74..86e8712ce95ae6 100644
--- a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -186,57 +186,62 @@ GlobalVariable *createBinDesc(Module &M, 
ArrayRef> Bufs,
 ".omp_offloading.descriptor" + Suffix);
 }
 
-void createRegisterFunction(Module &M

[clang] [llvm] [Offload] Change unregister library to use `atexit` instead of destructor (PR #86830)

2024-03-27 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/86830
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-03-28 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/87009

This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971 
https://github.com/llvm/llvm-project/pull/86875 
https://github.com/llvm/llvm-project/pull/86868


>From fe0b6725e9aa89cc378ffd97f19354d59ab4fa93 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 27 Mar 2024 15:27:16 -0500
Subject: [PATCH 1/4] [Libomptarget] Rename `libomptarget.rtl.x86_64` to
 `libomptarget.rtl.host`

Summary:
All of these are functionally the same code, just compiled for separate
architectures. We currently do not expose a way to execute these on
separate architectures as the host plugin works using `dlopen` into the
same process, and therefore cannot possibly be an incompatible
architecture. (This could work with a remote plugin, but this is not
supported yet).

This patch simply renames all of these to the same thing so we no longer
need to check around for its varying definitions.
---
 .../plugins-nextgen/host/CMakeLists.txt   | 36 +--
 openmp/libomptarget/src/CMakeLists.txt|  5 +--
 2 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt 
b/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
index ccbf7d033fd663..0954f8367654f6 100644
--- a/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
@@ -14,36 +14,36 @@ if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64le$")
 endif()
 
 # Create the library and add the default arguments.
-add_target_library(omptarget.rtl.${machine} ${machine})
+add_target_library(omptarget.rtl.host ${machine})
 
-target_sources(omptarget.rtl.${machine} PRIVATE src/rtl.cpp)
+target_sources(omptarget.rtl.host PRIVATE src/rtl.cpp)
 
 if(LIBOMPTARGET_DEP_LIBFFI_FOUND)
   libomptarget_say("Building ${machine} plugin linked with libffi")
   if(FFI_STATIC_LIBRARIES)
-target_link_libraries(omptarget.rtl.${machine} PRIVATE FFI::ffi_static)
+target_link_libraries(omptarget.rtl.host PRIVATE FFI::ffi_static)
   else()
-target_link_libraries(omptarget.rtl.${machine} PRIVATE FFI::ffi)
+target_link_libraries(omptarget.rtl.host PRIVATE FFI::ffi)
   endif()
 else()
   libomptarget_say("Building ${machine} plugin for dlopened libffi")
-  target_sources(omptarget.rtl.${machine} PRIVATE dynamic_ffi/ffi.cpp)
-  target_include_directories(omptarget.rtl.${machine} PRIVATE dynamic_ffi)
+  target_sources(omptarget.rtl.host PRIVATE dynamic_ffi/ffi.cpp)
+  target_include_directories(omptarget.rtl.host PRIVATE dynamic_ffi)
 endif()
 
 # Install plugin under the lib destination folder.
-install(TARGETS omptarget.rtl.${machine}
+install(TARGETS omptarget.rtl.host
 LIBRARY DESTINATION "${OPENMP_INSTALL_LIBDIR}")
-set_target_properties(omptarget.rtl.${machine} PROPERTIES
+set_target_properties(omptarget.rtl.host PROPERTIES
   INSTALL_RPATH "$ORIGIN" BUILD_RPATH "$ORIGIN:${CMAKE_CURRENT_BINARY_DIR}/.."
   POSITION_INDEPENDENT_CODE ON
   CXX_VISIBILITY_PRESET protected)
 
-target_include_directories(omptarget.rtl.${machine} PRIVATE
+target_include_directories(omptarget.rtl.host PRIVATE
${LIBOMPTARGET_INCLUDE_DIR})
 
 if(LIBOMPTARGET_DEP_LIBFFI_FOUND)
-  list(APPEND LIBOMPTARGET_TESTED_PLUGINS omptarget.rtl.${machine})
+  list(APPEND LIBOMPTARGET_TESTED_PLUGINS omptarget.rtl.host)
   set(LIBOMPTARGET_TESTED_PLUGINS
   "${LIBOMPTARGET_TESTED_PLUGINS}" PARENT_SCOPE)
 else()
@@ -53,29 +53,29 @@ endif()
 # Define the target specific triples and ELF machine values.
 if(CMAKE_SYSTEM_PROCESSOR

[clang] [llvm] [openmp] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-03-28 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

This contains three other dependent commits until they land. For now just 
browse the most recent commit for the relevant changes.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-03-28 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From fe0b6725e9aa89cc378ffd97f19354d59ab4fa93 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 27 Mar 2024 15:27:16 -0500
Subject: [PATCH 1/4] [Libomptarget] Rename `libomptarget.rtl.x86_64` to
 `libomptarget.rtl.host`

Summary:
All of these are functionally the same code, just compiled for separate
architectures. We currently do not expose a way to execute these on
separate architectures as the host plugin works using `dlopen` into the
same process, and therefore cannot possibly be an incompatible
architecture. (This could work with a remote plugin, but this is not
supported yet).

This patch simply renames all of these to the same thing so we no longer
need to check around for its varying definitions.
---
 .../plugins-nextgen/host/CMakeLists.txt   | 36 +--
 openmp/libomptarget/src/CMakeLists.txt|  5 +--
 2 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt 
b/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
index ccbf7d033fd663..0954f8367654f6 100644
--- a/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
@@ -14,36 +14,36 @@ if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64le$")
 endif()
 
 # Create the library and add the default arguments.
-add_target_library(omptarget.rtl.${machine} ${machine})
+add_target_library(omptarget.rtl.host ${machine})
 
-target_sources(omptarget.rtl.${machine} PRIVATE src/rtl.cpp)
+target_sources(omptarget.rtl.host PRIVATE src/rtl.cpp)
 
 if(LIBOMPTARGET_DEP_LIBFFI_FOUND)
   libomptarget_say("Building ${machine} plugin linked with libffi")
   if(FFI_STATIC_LIBRARIES)
-target_link_libraries(omptarget.rtl.${machine} PRIVATE FFI::ffi_static)
+target_link_libraries(omptarget.rtl.host PRIVATE FFI::ffi_static)
   else()
-target_link_libraries(omptarget.rtl.${machine} PRIVATE FFI::ffi)
+target_link_libraries(omptarget.rtl.host PRIVATE FFI::ffi)
   endif()
 else()
   libomptarget_say("Building ${machine} plugin for dlopened libffi")
-  target_sources(omptarget.rtl.${machine} PRIVATE dynamic_ffi/ffi.cpp)
-  target_include_directories(omptarget.rtl.${machine} PRIVATE dynamic_ffi)
+  target_sources(omptarget.rtl.host PRIVATE dynamic_ffi/ffi.cpp)
+  target_include_directories(omptarget.rtl.host PRIVATE dynamic_ffi)
 endif()
 
 # Install plugin under the lib destination folder.
-install(TARGETS omptarget.rtl.${machine}
+install(TARGETS omptarget.rtl.host
 LIBRARY DESTINATION "${OPENMP_INSTALL_LIBDIR}")
-set_target_properties(omptarget.rtl.${machine} PROPERTIES
+set_target_properties(omptarget.rtl.host PROPERTIES
   INSTALL_RPATH "$ORIGIN" BUILD_RPATH "$ORIGIN:${CMAKE_CURRENT_BINARY_DIR}/.."
   POSITION_INDEPENDENT_CODE ON
   CXX_VISIBILITY_PRESET protected)
 
-target_include_directories(omptarget.rtl.${machine} PRIVATE
+target_include_directories(omptarget.rtl.host PRIVATE
${LIBOMPTARGET_INCLUDE_DIR})
 
 if(LIBOMPTARGET_DEP_LIBFFI_FOUND)
-  list(APPEND LIBOMPTARGET_TESTED_PLUGINS omptarget.rtl.${machine})
+  list(APPEND LIBOMPTARGET_TESTED_PLUGINS omptarget.rtl.host)
   set(LIBOMPTARGET_TESTED_PLUGINS
   "${LIBOMPTARGET_TESTED_PLUGINS}" PARENT_SCOPE)
 else()
@@ -53,29 +53,29 @@ endif()
 # Define the target specific triples and ELF machine values.
 if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64le$" OR
CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_PPC64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE TARGET_ELF_ID=EM_PPC64)
+  target_compile_definitions(omptarget.rtl.host PRIVATE
   LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE="powerpc64-ibm-linux-gnu")
   list(APPEND LIBOMPTARGET_SYSTEM_TARGETS 
"powerpc64-ibm-linux-gnu" "powerpc64-ibm-linux-gnu-LTO")
   set(LIBOMPTARGET_SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}" 
PARENT_SCOPE)
 elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_X86_64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE 
TARGET_ELF_ID=EM_X86_64)
+  target_compile_definitions(omptarget.rtl.host PRIVATE
   LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE="x86_64-pc-linux-gnu")
   list(APPEND LIBOMPTARGET_SYSTEM_TARGETS 
"x86_64-pc-linux-gnu" "x86_64-pc-linux-gnu-LTO")
   set(LIBOMPTARGET_SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}" 
PARENT_SCOPE)
 elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_AARCH64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE 
TARGET_ELF_ID=EM_AARCH64)
+  target_compile_definiti

[clang] [llvm] [openmp] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-03-29 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From bb5f330cc3d5e0758825b25e3b8209fb7af6be79 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 27 Mar 2024 15:27:16 -0500
Subject: [PATCH 1/3] [Libomptarget] Rename `libomptarget.rtl.x86_64` to
 `libomptarget.rtl.host`

Summary:
All of these are functionally the same code, just compiled for separate
architectures. We currently do not expose a way to execute these on
separate architectures as the host plugin works using `dlopen` into the
same process, and therefore cannot possibly be an incompatible
architecture. (This could work with a remote plugin, but this is not
supported yet).

This patch simply renames all of these to the same thing so we no longer
need to check around for its varying definitions.
---
 .../plugins-nextgen/host/CMakeLists.txt   | 36 +--
 openmp/libomptarget/src/CMakeLists.txt|  5 +--
 2 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt 
b/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
index ccbf7d033fd663..0954f8367654f6 100644
--- a/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
@@ -14,36 +14,36 @@ if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64le$")
 endif()
 
 # Create the library and add the default arguments.
-add_target_library(omptarget.rtl.${machine} ${machine})
+add_target_library(omptarget.rtl.host ${machine})
 
-target_sources(omptarget.rtl.${machine} PRIVATE src/rtl.cpp)
+target_sources(omptarget.rtl.host PRIVATE src/rtl.cpp)
 
 if(LIBOMPTARGET_DEP_LIBFFI_FOUND)
   libomptarget_say("Building ${machine} plugin linked with libffi")
   if(FFI_STATIC_LIBRARIES)
-target_link_libraries(omptarget.rtl.${machine} PRIVATE FFI::ffi_static)
+target_link_libraries(omptarget.rtl.host PRIVATE FFI::ffi_static)
   else()
-target_link_libraries(omptarget.rtl.${machine} PRIVATE FFI::ffi)
+target_link_libraries(omptarget.rtl.host PRIVATE FFI::ffi)
   endif()
 else()
   libomptarget_say("Building ${machine} plugin for dlopened libffi")
-  target_sources(omptarget.rtl.${machine} PRIVATE dynamic_ffi/ffi.cpp)
-  target_include_directories(omptarget.rtl.${machine} PRIVATE dynamic_ffi)
+  target_sources(omptarget.rtl.host PRIVATE dynamic_ffi/ffi.cpp)
+  target_include_directories(omptarget.rtl.host PRIVATE dynamic_ffi)
 endif()
 
 # Install plugin under the lib destination folder.
-install(TARGETS omptarget.rtl.${machine}
+install(TARGETS omptarget.rtl.host
 LIBRARY DESTINATION "${OPENMP_INSTALL_LIBDIR}")
-set_target_properties(omptarget.rtl.${machine} PROPERTIES
+set_target_properties(omptarget.rtl.host PROPERTIES
   INSTALL_RPATH "$ORIGIN" BUILD_RPATH "$ORIGIN:${CMAKE_CURRENT_BINARY_DIR}/.."
   POSITION_INDEPENDENT_CODE ON
   CXX_VISIBILITY_PRESET protected)
 
-target_include_directories(omptarget.rtl.${machine} PRIVATE
+target_include_directories(omptarget.rtl.host PRIVATE
${LIBOMPTARGET_INCLUDE_DIR})
 
 if(LIBOMPTARGET_DEP_LIBFFI_FOUND)
-  list(APPEND LIBOMPTARGET_TESTED_PLUGINS omptarget.rtl.${machine})
+  list(APPEND LIBOMPTARGET_TESTED_PLUGINS omptarget.rtl.host)
   set(LIBOMPTARGET_TESTED_PLUGINS
   "${LIBOMPTARGET_TESTED_PLUGINS}" PARENT_SCOPE)
 else()
@@ -53,29 +53,29 @@ endif()
 # Define the target specific triples and ELF machine values.
 if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64le$" OR
CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_PPC64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE TARGET_ELF_ID=EM_PPC64)
+  target_compile_definitions(omptarget.rtl.host PRIVATE
   LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE="powerpc64-ibm-linux-gnu")
   list(APPEND LIBOMPTARGET_SYSTEM_TARGETS 
"powerpc64-ibm-linux-gnu" "powerpc64-ibm-linux-gnu-LTO")
   set(LIBOMPTARGET_SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}" 
PARENT_SCOPE)
 elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_X86_64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE 
TARGET_ELF_ID=EM_X86_64)
+  target_compile_definitions(omptarget.rtl.host PRIVATE
   LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE="x86_64-pc-linux-gnu")
   list(APPEND LIBOMPTARGET_SYSTEM_TARGETS 
"x86_64-pc-linux-gnu" "x86_64-pc-linux-gnu-LTO")
   set(LIBOMPTARGET_SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}" 
PARENT_SCOPE)
 elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_AARCH64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE 
TARGET_ELF_ID=EM_AARCH64)
+  target_compile_definiti

[clang] [OpenMP] Use loaded offloading toolchains to add libraries (PR #87108)

2024-03-29 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/87108

Summary:
We want to pass these GPU libraries by default if a certain offloading
toolchain is loaded for OpenMP. Previously I parsed this from the
arguments because it's only available in the compilation. This doesn't
really work for `native` and it's extra effort, so this patch just
passes in the `Compilation` as an extr argument and uses that. Tests
should be unaffected.


>From 46e96f60fa2977c98d1cb8cd2950504e9fb2823c Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 29 Mar 2024 15:25:00 -0500
Subject: [PATCH] [OpenMP] Use loaded offloading toolchains to add libraries

Summary:
We want to pass these GPU libraries by default if a certain offloading
toolchain is loaded for OpenMP. Previously I parsed this from the
arguments because it's only available in the compilation. This doesn't
really work for `native` and it's extra effort, so this patch just
passes in the `Compilation` as an extr argument and uses that. Tests
should be unaffected.
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp | 115 +
 clang/lib/Driver/ToolChains/CommonArgs.h   |   4 +-
 clang/lib/Driver/ToolChains/Darwin.cpp |   2 +-
 clang/lib/Driver/ToolChains/DragonFly.cpp  |   2 +-
 clang/lib/Driver/ToolChains/FreeBSD.cpp|   2 +-
 clang/lib/Driver/ToolChains/Gnu.cpp|   2 +-
 clang/lib/Driver/ToolChains/Haiku.cpp  |   2 +-
 clang/lib/Driver/ToolChains/NetBSD.cpp |   2 +-
 clang/lib/Driver/ToolChains/OpenBSD.cpp|   2 +-
 clang/lib/Driver/ToolChains/Solaris.cpp|   2 +-
 10 files changed, 61 insertions(+), 74 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index ace4fb99581e38..60d14762ba37ba 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -876,7 +876,7 @@ void tools::addLTOOptions(const ToolChain &ToolChain, const 
ArgList &Args,
   // LowerMatrixIntrinsicsPass, which is transitively called by
   // buildThinLTODefaultPipeline under EnableMatrix.
   if ((IsThinLTO || IsFatLTO || IsUnifiedLTO) &&
-Args.hasArg(options::OPT_fenable_matrix))
+  Args.hasArg(options::OPT_fenable_matrix))
 CmdArgs.push_back(
 Args.MakeArgString(Twine(PluginOptPrefix) + "-enable-matrix"));
 
@@ -1075,14 +1075,14 @@ void tools::addLTOOptions(const ToolChain &ToolChain, 
const ArgList &Args,
 
 /// Adds the '-lcgpu' and '-lmgpu' libraries to the compilation to include the
 /// LLVM C library for GPUs.
-static void addOpenMPDeviceLibC(const ToolChain &TC, const ArgList &Args,
+static void addOpenMPDeviceLibC(const Compilation &C, const ArgList &Args,
 ArgStringList &CmdArgs) {
   if (Args.hasArg(options::OPT_nogpulib) || Args.hasArg(options::OPT_nolibc))
 return;
 
   // Check the resource directory for the LLVM libc GPU declarations. If it's
   // found we can assume that LLVM was built with support for the GPU libc.
-  SmallString<256> LibCDecls(TC.getDriver().ResourceDir);
+  SmallString<256> LibCDecls(C.getDriver().ResourceDir);
   llvm::sys::path::append(LibCDecls, "include", "llvm_libc_wrappers",
   "llvm-libc-decls");
   bool HasLibC = llvm::sys::fs::exists(LibCDecls) &&
@@ -1090,38 +1090,23 @@ static void addOpenMPDeviceLibC(const ToolChain &TC, 
const ArgList &Args,
   if (!Args.hasFlag(options::OPT_gpulibc, options::OPT_nogpulibc, HasLibC))
 return;
 
-  // We don't have access to the offloading toolchains here, so determine from
-  // the arguments if we have any active NVPTX or AMDGPU toolchains.
-  llvm::DenseSet Libraries;
-  if (const Arg *Targets = Args.getLastArg(options::OPT_fopenmp_targets_EQ)) {
-if (llvm::any_of(Targets->getValues(),
- [](auto S) { return llvm::Triple(S).isAMDGPU(); })) {
-  Libraries.insert("-lcgpu-amdgpu");
-  Libraries.insert("-lmgpu-amdgpu");
-}
-if (llvm::any_of(Targets->getValues(),
- [](auto S) { return llvm::Triple(S).isNVPTX(); })) {
-  Libraries.insert("-lcgpu-nvptx");
-  Libraries.insert("-lmgpu-nvptx");
-}
-  }
+  SmallVector ToolChains;
+  auto TCRange = C.getOffloadToolChains(Action::OFK_OpenMP);
+  for (auto TI = TCRange.first, TE = TCRange.second; TI != TE; ++TI)
+ToolChains.push_back(TI->second);
 
-  for (StringRef Arch : Args.getAllArgValues(options::OPT_offload_arch_EQ)) {
-if (llvm::any_of(llvm::split(Arch, ","), [](StringRef Str) {
-  return IsAMDGpuArch(StringToCudaArch(Str));
-})) {
-  Libraries.insert("-lcgpu-amdgpu");
-  Libraries.insert("-lmgpu-amdgpu");
-}
-if (llvm::any_of(llvm::split(Arch, ","), [](StringRef Str) {
-  return IsNVIDIAGpuArch(StringToCudaArch(Str));
-})) {
-  Libraries.insert("-lcgpu-nvptx");
-  Libraries.insert("-lmgpu-nvptx");
-}
+  if (llvm::any_of(ToolChains, [](const ToolChain *TC) {
+ret

[clang] [OpenMP] Use loaded offloading toolchains to add libraries (PR #87108)

2024-03-29 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87108

>From 4415c4d4b9c72e963d4c483440598933d59e19cc Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 29 Mar 2024 15:25:00 -0500
Subject: [PATCH] [OpenMP] Use loaded offloading toolchains to add libraries

Summary:
We want to pass these GPU libraries by default if a certain offloading
toolchain is loaded for OpenMP. Previously I parsed this from the
arguments because it's only available in the compilation. This doesn't
really work for `native` and it's extra effort, so this patch just
passes in the `Compilation` as an extr argument and uses that. Tests
should be unaffected.
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp | 58 --
 clang/lib/Driver/ToolChains/CommonArgs.h   |  4 +-
 clang/lib/Driver/ToolChains/Darwin.cpp |  2 +-
 clang/lib/Driver/ToolChains/DragonFly.cpp  |  2 +-
 clang/lib/Driver/ToolChains/FreeBSD.cpp|  2 +-
 clang/lib/Driver/ToolChains/Gnu.cpp|  2 +-
 clang/lib/Driver/ToolChains/Haiku.cpp  |  2 +-
 clang/lib/Driver/ToolChains/NetBSD.cpp |  2 +-
 clang/lib/Driver/ToolChains/OpenBSD.cpp|  2 +-
 clang/lib/Driver/ToolChains/Solaris.cpp|  2 +-
 10 files changed, 32 insertions(+), 46 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index ace4fb99581e38..62a53b85ce098b 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1075,14 +1075,14 @@ void tools::addLTOOptions(const ToolChain &ToolChain, 
const ArgList &Args,
 
 /// Adds the '-lcgpu' and '-lmgpu' libraries to the compilation to include the
 /// LLVM C library for GPUs.
-static void addOpenMPDeviceLibC(const ToolChain &TC, const ArgList &Args,
+static void addOpenMPDeviceLibC(const Compilation &C, const ArgList &Args,
 ArgStringList &CmdArgs) {
   if (Args.hasArg(options::OPT_nogpulib) || Args.hasArg(options::OPT_nolibc))
 return;
 
   // Check the resource directory for the LLVM libc GPU declarations. If it's
   // found we can assume that LLVM was built with support for the GPU libc.
-  SmallString<256> LibCDecls(TC.getDriver().ResourceDir);
+  SmallString<256> LibCDecls(C.getDriver().ResourceDir);
   llvm::sys::path::append(LibCDecls, "include", "llvm_libc_wrappers",
   "llvm-libc-decls");
   bool HasLibC = llvm::sys::fs::exists(LibCDecls) &&
@@ -1090,38 +1090,23 @@ static void addOpenMPDeviceLibC(const ToolChain &TC, 
const ArgList &Args,
   if (!Args.hasFlag(options::OPT_gpulibc, options::OPT_nogpulibc, HasLibC))
 return;
 
-  // We don't have access to the offloading toolchains here, so determine from
-  // the arguments if we have any active NVPTX or AMDGPU toolchains.
-  llvm::DenseSet Libraries;
-  if (const Arg *Targets = Args.getLastArg(options::OPT_fopenmp_targets_EQ)) {
-if (llvm::any_of(Targets->getValues(),
- [](auto S) { return llvm::Triple(S).isAMDGPU(); })) {
-  Libraries.insert("-lcgpu-amdgpu");
-  Libraries.insert("-lmgpu-amdgpu");
-}
-if (llvm::any_of(Targets->getValues(),
- [](auto S) { return llvm::Triple(S).isNVPTX(); })) {
-  Libraries.insert("-lcgpu-nvptx");
-  Libraries.insert("-lmgpu-nvptx");
-}
-  }
+  SmallVector ToolChains;
+  auto TCRange = C.getOffloadToolChains(Action::OFK_OpenMP);
+  for (auto TI = TCRange.first, TE = TCRange.second; TI != TE; ++TI)
+ToolChains.push_back(TI->second);
 
-  for (StringRef Arch : Args.getAllArgValues(options::OPT_offload_arch_EQ)) {
-if (llvm::any_of(llvm::split(Arch, ","), [](StringRef Str) {
-  return IsAMDGpuArch(StringToCudaArch(Str));
-})) {
-  Libraries.insert("-lcgpu-amdgpu");
-  Libraries.insert("-lmgpu-amdgpu");
-}
-if (llvm::any_of(llvm::split(Arch, ","), [](StringRef Str) {
-  return IsNVIDIAGpuArch(StringToCudaArch(Str));
-})) {
-  Libraries.insert("-lcgpu-nvptx");
-  Libraries.insert("-lmgpu-nvptx");
-}
+  if (llvm::any_of(ToolChains, [](const ToolChain *TC) {
+return TC->getTriple().isAMDGPU();
+  })) {
+CmdArgs.push_back("-lcgpu-amdgpu");
+CmdArgs.push_back("-lmgpu-amdgpu");
+  }
+  if (llvm::any_of(ToolChains, [](const ToolChain *TC) {
+return TC->getTriple().isNVPTX();
+  })) {
+CmdArgs.push_back("-lcgpu-nvptx");
+CmdArgs.push_back("-lmgpu-nvptx");
   }
-
-  llvm::append_range(CmdArgs, Libraries);
 }
 
 void tools::addOpenMPRuntimeLibraryPath(const ToolChain &TC,
@@ -1153,9 +1138,10 @@ void tools::addArchSpecificRPath(const ToolChain &TC, 
const ArgList &Args,
   }
 }
 
-bool tools::addOpenMPRuntime(ArgStringList &CmdArgs, const ToolChain &TC,
- const ArgList &Args, bool ForceStaticHostRuntime,
- bool IsOffloadingHost, bool GompNeedsRT) {
+bool tools::addOpenMPRuntime(const Compilation &C, ArgStringList &Cmd

[clang] [llvm] [openmp] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-03-29 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Some OMPT variables with the same name used to be present in both 
> libomptarget and the common plugin interface. These should probably be 
> re-worked in the new scheme of static linking? e.g. 
> llvm::omp::target::ompt::Initialized

Unsure, there was an issue where each plugin called `Initialize` separately 
that is no longer needed. I just deleted those. Are there any other cases that 
need to be handled?

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-03-29 Thread Joseph Huber via cfe-commits


@@ -3043,10 +3043,6 @@ struct AMDGPUPluginTy final : public GenericPluginTy {
 // HSA functions from now on, e.g., hsa_shut_down.
 Initialized = true;
 
-#ifdef OMPT_SUPPORT
-ompt::connectLibrary();

jhuber6 wrote:

Ah, well that can be a follow-up patch since it seems to work here. Would be 
nice to get all of this OMPT stuff out of the plugins, which this should also 
help since we have actual `init` and `deinit` calls from `libomptarget` now.

https://github.com/llvm/llvm-project/pull/87009
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [Libomptarget] Statically link all plugin runtimes (PR #87009)

2024-03-29 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87009

>From d9e2231c179e3ae321883203ad4799971a982110 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 27 Mar 2024 15:27:16 -0500
Subject: [PATCH 1/3] [Libomptarget] Rename `libomptarget.rtl.x86_64` to
 `libomptarget.rtl.host`

Summary:
All of these are functionally the same code, just compiled for separate
architectures. We currently do not expose a way to execute these on
separate architectures as the host plugin works using `dlopen` into the
same process, and therefore cannot possibly be an incompatible
architecture. (This could work with a remote plugin, but this is not
supported yet).

This patch simply renames all of these to the same thing so we no longer
need to check around for its varying definitions.
---
 .../plugins-nextgen/host/CMakeLists.txt   | 36 +--
 openmp/libomptarget/src/CMakeLists.txt|  5 +--
 2 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt 
b/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
index ccbf7d033fd663..0954f8367654f6 100644
--- a/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/host/CMakeLists.txt
@@ -14,36 +14,36 @@ if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64le$")
 endif()
 
 # Create the library and add the default arguments.
-add_target_library(omptarget.rtl.${machine} ${machine})
+add_target_library(omptarget.rtl.host ${machine})
 
-target_sources(omptarget.rtl.${machine} PRIVATE src/rtl.cpp)
+target_sources(omptarget.rtl.host PRIVATE src/rtl.cpp)
 
 if(LIBOMPTARGET_DEP_LIBFFI_FOUND)
   libomptarget_say("Building ${machine} plugin linked with libffi")
   if(FFI_STATIC_LIBRARIES)
-target_link_libraries(omptarget.rtl.${machine} PRIVATE FFI::ffi_static)
+target_link_libraries(omptarget.rtl.host PRIVATE FFI::ffi_static)
   else()
-target_link_libraries(omptarget.rtl.${machine} PRIVATE FFI::ffi)
+target_link_libraries(omptarget.rtl.host PRIVATE FFI::ffi)
   endif()
 else()
   libomptarget_say("Building ${machine} plugin for dlopened libffi")
-  target_sources(omptarget.rtl.${machine} PRIVATE dynamic_ffi/ffi.cpp)
-  target_include_directories(omptarget.rtl.${machine} PRIVATE dynamic_ffi)
+  target_sources(omptarget.rtl.host PRIVATE dynamic_ffi/ffi.cpp)
+  target_include_directories(omptarget.rtl.host PRIVATE dynamic_ffi)
 endif()
 
 # Install plugin under the lib destination folder.
-install(TARGETS omptarget.rtl.${machine}
+install(TARGETS omptarget.rtl.host
 LIBRARY DESTINATION "${OPENMP_INSTALL_LIBDIR}")
-set_target_properties(omptarget.rtl.${machine} PROPERTIES
+set_target_properties(omptarget.rtl.host PROPERTIES
   INSTALL_RPATH "$ORIGIN" BUILD_RPATH "$ORIGIN:${CMAKE_CURRENT_BINARY_DIR}/.."
   POSITION_INDEPENDENT_CODE ON
   CXX_VISIBILITY_PRESET protected)
 
-target_include_directories(omptarget.rtl.${machine} PRIVATE
+target_include_directories(omptarget.rtl.host PRIVATE
${LIBOMPTARGET_INCLUDE_DIR})
 
 if(LIBOMPTARGET_DEP_LIBFFI_FOUND)
-  list(APPEND LIBOMPTARGET_TESTED_PLUGINS omptarget.rtl.${machine})
+  list(APPEND LIBOMPTARGET_TESTED_PLUGINS omptarget.rtl.host)
   set(LIBOMPTARGET_TESTED_PLUGINS
   "${LIBOMPTARGET_TESTED_PLUGINS}" PARENT_SCOPE)
 else()
@@ -53,29 +53,29 @@ endif()
 # Define the target specific triples and ELF machine values.
 if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64le$" OR
CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_PPC64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE TARGET_ELF_ID=EM_PPC64)
+  target_compile_definitions(omptarget.rtl.host PRIVATE
   LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE="powerpc64-ibm-linux-gnu")
   list(APPEND LIBOMPTARGET_SYSTEM_TARGETS 
"powerpc64-ibm-linux-gnu" "powerpc64-ibm-linux-gnu-LTO")
   set(LIBOMPTARGET_SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}" 
PARENT_SCOPE)
 elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_X86_64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE 
TARGET_ELF_ID=EM_X86_64)
+  target_compile_definitions(omptarget.rtl.host PRIVATE
   LIBOMPTARGET_NEXTGEN_GENERIC_PLUGIN_TRIPLE="x86_64-pc-linux-gnu")
   list(APPEND LIBOMPTARGET_SYSTEM_TARGETS 
"x86_64-pc-linux-gnu" "x86_64-pc-linux-gnu-LTO")
   set(LIBOMPTARGET_SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}" 
PARENT_SCOPE)
 elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64$")
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE 
TARGET_ELF_ID=EM_AARCH64)
-  target_compile_definitions(omptarget.rtl.${machine} PRIVATE
+  target_compile_definitions(omptarget.rtl.host PRIVATE 
TARGET_ELF_ID=EM_AARCH64)
+  target_compile_definiti

[clang] [OpenMP] Use loaded offloading toolchains to add libraries (PR #87108)

2024-04-01 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/87108
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][NFC] Clean up unused binary files for offloading tests (PR #87351)

2024-04-02 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/87351

Summary:
We have a few binary files used for offloading tests that are either
entirely unusable or easily replaceable.


>From 23f68e714aab94c7600a3af9363e9ba678ba2d05 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 2 Apr 2024 09:09:51 -0500
Subject: [PATCH] [Clang][NFC] Clean up unused binary files for offloading
 tests

Summary:
We have a few binary files used for offloading tests that are either
entirely unusable or easily replaceable.
---
 clang/test/Driver/Inputs/in.so   |   1 -
 .../libomptarget/libomptarget-new-nvptx-sm_35.bc |   1 -
 .../libomptarget/libomptarget-new-nvptx-test.bc  |   1 -
 .../Inputs/openmp_static_device_link/empty.o |   0
 .../Inputs/openmp_static_device_link/lib.bc  | Bin 1092 -> 0 bytes
 .../openmp_static_device_link/libFatArchive.a|   0
 clang/test/Driver/hip-link-shared-library.hip|   5 +++--
 clang/test/Driver/openmp-offload-gpu.c   |   3 +--
 8 files changed, 4 insertions(+), 7 deletions(-)
 delete mode 100644 clang/test/Driver/Inputs/in.so
 delete mode 100644 
clang/test/Driver/Inputs/libomptarget/libomptarget-new-nvptx-sm_35.bc
 delete mode 100644 
clang/test/Driver/Inputs/libomptarget/libomptarget-new-nvptx-test.bc
 delete mode 100644 clang/test/Driver/Inputs/openmp_static_device_link/empty.o
 delete mode 100644 clang/test/Driver/Inputs/openmp_static_device_link/lib.bc
 delete mode 100644 
clang/test/Driver/Inputs/openmp_static_device_link/libFatArchive.a

diff --git a/clang/test/Driver/Inputs/in.so b/clang/test/Driver/Inputs/in.so
deleted file mode 100644
index 8b137891791fe9..00
--- a/clang/test/Driver/Inputs/in.so
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git 
a/clang/test/Driver/Inputs/libomptarget/libomptarget-new-nvptx-sm_35.bc 
b/clang/test/Driver/Inputs/libomptarget/libomptarget-new-nvptx-sm_35.bc
deleted file mode 100644
index 8b137891791fe9..00
--- a/clang/test/Driver/Inputs/libomptarget/libomptarget-new-nvptx-sm_35.bc
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git 
a/clang/test/Driver/Inputs/libomptarget/libomptarget-new-nvptx-test.bc 
b/clang/test/Driver/Inputs/libomptarget/libomptarget-new-nvptx-test.bc
deleted file mode 100644
index 8b137891791fe9..00
--- a/clang/test/Driver/Inputs/libomptarget/libomptarget-new-nvptx-test.bc
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/clang/test/Driver/Inputs/openmp_static_device_link/empty.o 
b/clang/test/Driver/Inputs/openmp_static_device_link/empty.o
deleted file mode 100644
index e69de29bb2d1d6..00
diff --git a/clang/test/Driver/Inputs/openmp_static_device_link/lib.bc 
b/clang/test/Driver/Inputs/openmp_static_device_link/lib.bc
deleted file mode 100644
index 
1a87fd836dba2c8b03f53733e4782e15996b96b9..
GIT binary patch
literal 0
HcmV?d1

literal 1092
zcmXX_VQ3p=82*xLb7vdBi#2Qsy*uyLOv7~bme!i=5-`0vBAb|@KRT$G%SvpulxvMi
z+6_&59bbkKvQjWVDzwl){)^Bb{ZXXJbu*SQ>JTXyN@LTmN>+mFWa57KrSrJ&{dn(t
zdGC3i_xbwewe2^o0JH)C!e-I&?$sY6U;n%OUT4X1!Qg5If*F8+@L9W207kr@z>7?E
z-S(GS-Z5ERo>{|;`E)B~=UToBCtKb&_RqHWJ1j1%o3dSMb9JNx>blT%p#xop?>Bh$P)TOJ%0#RNUs`t0=$ZZ`Ihi0o%Z;1&0sYVdkCL#*+|LyY?7c-MBDNlv
zX`N-lSCV}h!dFgYqAGcvA}0e#Rmmxph$WE9BYXJ;SkJ<$NRNs1ZUZ)gu*b5_ZVSwo
zz!nuQBHof#HEC1!EEC>U%{xO#<%TsDBCijVZw{jg9Fs`RA~i(DoMh}tD{dQ4qK_yv
zZbK6}G?`1icG!T2JMi%utcygrw8+#JneuV0D9nbytf$xw#q?`d_LO8#?l`XK?m5|=
zCZsw2b^Oz()6y@5l-JEajv#daO{iLQm`sFF)P|x9$bRpA`Vi261z{}hdZVQPw$EJ9#8)rgku)CD?*$;+A;mH18YI}MC|&#t}w+EyH!0|7ubD)
zT^Cq9!2{u?2>-3LD*ILl&z$Bd53J(pW3p2iX8^@jY_jic00`J)?QIiIJUen~K+WNL
z9joNo)Tj5EyNk~s_?yD5NOUZwMiYI#XUAeALnF%B
SYo8A(qa)$&(a~lV0RI6M5`u96

diff --git a/clang/test/Driver/Inputs/openmp_static_device_link/libFatArchive.a 
b/clang/test/Driver/Inputs/openmp_static_device_link/libFatArchive.a
deleted file mode 100644
index e69de29bb2d1d6..00
diff --git a/clang/test/Driver/hip-link-shared-library.hip 
b/clang/test/Driver/hip-link-shared-library.hip
index 73643682dda8ae..a075ee82dda1cb 100644
--- a/clang/test/Driver/hip-link-shared-library.hip
+++ b/clang/test/Driver/hip-link-shared-library.hip
@@ -1,6 +1,7 @@
 // RUN: touch %t.o
+// RUN: touch %t.so
 // RUN: %clang --hip-link -ccc-print-bindings --target=x86_64-linux-gnu \
-// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %t.o %S/Inputs/in.so \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %t.o %t.so \
 // RUN:   --no-offload-new-driver -fgpu-rdc 2>&1 | FileCheck %s
 
 // CHECK: # "x86_64-unknown-linux-gnu" - "offload bundler", inputs: 
["[[IN:.*o]]"], outputs: ["[[HOSTOBJ:.*o]]", "{{.*o}}", "{{.*o}}"]
@@ -11,4 +12,4 @@
 // CHECK-NOT: offload bundler
 // CHECK: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[IMG1]]", 
"[[IMG2]]"], output: "[[FATBINOBJ:.*o]]"
 // CHECK-NOT: offload bundler
-// CHECK: # "x86_64-unknown-linux-gnu" - "GNU::Linker", inputs: 
["[[HOSTOBJ]]", "{{.*}}/Inputs/in.so", "[[FATBINOBJ]]"], output: "a.out"
+// CHECK: # "x86_64-unknown-linux-gnu" 

[clang] [Clang][NFC] Clean up unused binary files for offloading tests (PR #87351)

2024-04-02 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/87351
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][NFC] Clean up unused binary files for offloading tests (PR #87351)

2024-04-02 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/87351
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels (PR #87695)

2024-04-04 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/87695

Summary:
This new attribute was introduced recently. We already do this for NVPTX
kernels so we should apply this for AMDGPU as well. This patch simply
applies this metadata in cases where a lower bound is known


>From a314dadecad6f12db20c34a133ec7bb084a77b5d Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 4 Apr 2024 15:10:55 -0500
Subject: [PATCH] [OpenMP] Add amdgpu-num-work-groups attribute to OpenMP
 kernels

Summary:
This new attribute was introduced recently. We already do this for NVPTX
kernels so we should apply this for AMDGPU as well. This patch simply
applies this metadata in cases where a lower bound is known
---
 clang/test/OpenMP/thread_limit_amdgpu.c   | 34 +++
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  3 ++
 2 files changed, 37 insertions(+)
 create mode 100644 clang/test/OpenMP/thread_limit_amdgpu.c

diff --git a/clang/test/OpenMP/thread_limit_amdgpu.c 
b/clang/test/OpenMP/thread_limit_amdgpu.c
new file mode 100644
index 00..f884eeb73c3ff1
--- /dev/null
+++ b/clang/test/OpenMP/thread_limit_amdgpu.c
@@ -0,0 +1,34 @@
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-linux-gnu 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device 
-fopenmp-host-ir-file-path %t-x86-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+void foo(int N) {
+#pragma omp target teams distribute parallel for simd
+  for (int i = 0; i < N; ++i)
+;
+#pragma omp target teams distribute parallel for simd thread_limit(4)
+  for (int i = 0; i < N; ++i)
+;
+#pragma omp target teams distribute parallel for simd 
ompx_attribute(__attribute__((launch_bounds(42, 42
+  for (int i = 0; i < N; ++i)
+;
+#pragma omp target teams distribute parallel for simd 
ompx_attribute(__attribute__((launch_bounds(42, 42 num_threads(22)
+  for (int i = 0; i < N; ++i)
+;
+}
+
+#endif
+
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l10({{.*}}) #[[ATTR1:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l13({{.*}}) #[[ATTR2:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l16({{.*}}) #[[ATTR3:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l19({{.*}}) #[[ATTR4:.+]] {
+
+// CHECK: attributes #[[ATTR1]] = { {{.*}} 
"amdgpu-flat-work-group-size"="1,256" {{.*}} }
+// CHECK: attributes #[[ATTR2]] = { {{.*}} "amdgpu-flat-work-group-size"="1,4" 
{{.*}} }
+// CHECK: attributes #[[ATTR3]] = { {{.*}} 
"amdgpu-flat-work-group-size"="1,42" "amdgpu-max-num-workgroups"="42,1,1"{{.*}} 
}
+// CHECK: attributes #[[ATTR4]] = { {{.*}} 
"amdgpu-flat-work-group-size"="1,22" "amdgpu-max-num-workgroups"="42,1,1"{{.*}} 
}
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 16507a69ea8502..4fe44b10d1bd0e 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -4791,6 +4791,9 @@ void OpenMPIRBuilder::writeTeamsForKernel(const Triple 
&T, Function &Kernel,
   updateNVPTXMetadata(Kernel, "maxclusterrank", UB, true);
 updateNVPTXMetadata(Kernel, "minctasm", LB, false);
   }
+  if (T.isAMDGPU()) {
+Kernel.addFnAttr("amdgpu-max-num-workgroups", llvm::utostr(LB) + ",1,1");
+  }
   Kernel.addFnAttr("omp_target_num_teams", std::to_string(LB));
 }
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels (PR #87695)

2024-04-04 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87695

>From a314dadecad6f12db20c34a133ec7bb084a77b5d Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 4 Apr 2024 15:10:55 -0500
Subject: [PATCH 1/2] [OpenMP] Add amdgpu-num-work-groups attribute to OpenMP
 kernels

Summary:
This new attribute was introduced recently. We already do this for NVPTX
kernels so we should apply this for AMDGPU as well. This patch simply
applies this metadata in cases where a lower bound is known
---
 clang/test/OpenMP/thread_limit_amdgpu.c   | 34 +++
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  3 ++
 2 files changed, 37 insertions(+)
 create mode 100644 clang/test/OpenMP/thread_limit_amdgpu.c

diff --git a/clang/test/OpenMP/thread_limit_amdgpu.c 
b/clang/test/OpenMP/thread_limit_amdgpu.c
new file mode 100644
index 00..f884eeb73c3ff1
--- /dev/null
+++ b/clang/test/OpenMP/thread_limit_amdgpu.c
@@ -0,0 +1,34 @@
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-linux-gnu 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device 
-fopenmp-host-ir-file-path %t-x86-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+void foo(int N) {
+#pragma omp target teams distribute parallel for simd
+  for (int i = 0; i < N; ++i)
+;
+#pragma omp target teams distribute parallel for simd thread_limit(4)
+  for (int i = 0; i < N; ++i)
+;
+#pragma omp target teams distribute parallel for simd 
ompx_attribute(__attribute__((launch_bounds(42, 42
+  for (int i = 0; i < N; ++i)
+;
+#pragma omp target teams distribute parallel for simd 
ompx_attribute(__attribute__((launch_bounds(42, 42 num_threads(22)
+  for (int i = 0; i < N; ++i)
+;
+}
+
+#endif
+
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l10({{.*}}) #[[ATTR1:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l13({{.*}}) #[[ATTR2:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l16({{.*}}) #[[ATTR3:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l19({{.*}}) #[[ATTR4:.+]] {
+
+// CHECK: attributes #[[ATTR1]] = { {{.*}} 
"amdgpu-flat-work-group-size"="1,256" {{.*}} }
+// CHECK: attributes #[[ATTR2]] = { {{.*}} "amdgpu-flat-work-group-size"="1,4" 
{{.*}} }
+// CHECK: attributes #[[ATTR3]] = { {{.*}} 
"amdgpu-flat-work-group-size"="1,42" "amdgpu-max-num-workgroups"="42,1,1"{{.*}} 
}
+// CHECK: attributes #[[ATTR4]] = { {{.*}} 
"amdgpu-flat-work-group-size"="1,22" "amdgpu-max-num-workgroups"="42,1,1"{{.*}} 
}
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 16507a69ea8502..4fe44b10d1bd0e 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -4791,6 +4791,9 @@ void OpenMPIRBuilder::writeTeamsForKernel(const Triple 
&T, Function &Kernel,
   updateNVPTXMetadata(Kernel, "maxclusterrank", UB, true);
 updateNVPTXMetadata(Kernel, "minctasm", LB, false);
   }
+  if (T.isAMDGPU()) {
+Kernel.addFnAttr("amdgpu-max-num-workgroups", llvm::utostr(LB) + ",1,1");
+  }
   Kernel.addFnAttr("omp_target_num_teams", std::to_string(LB));
 }
 

>From f4710bad402366d4694da84bf24459f62e6c6b42 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 4 Apr 2024 15:54:29 -0500
Subject: [PATCH 2/2] Update llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

Co-authored-by: Shilei Tian 
---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 4fe44b10d1bd0e..1188075c7b2c47 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -4791,7 +4791,8 @@ void OpenMPIRBuilder::writeTeamsForKernel(const Triple 
&T, Function &Kernel,
   updateNVPTXMetadata(Kernel, "maxclusterrank", UB, true);
 updateNVPTXMetadata(Kernel, "minctasm", LB, false);
   }
-  if (T.isAMDGPU()) {
+  if (T.isAMDGPU())
+Kernel.addFnAttr("amdgpu-max-num-workgroups", llvm::utostr(LB) + ",1,1");
 Kernel.addFnAttr("amdgpu-max-num-workgroups", llvm::utostr(LB) + ",1,1");
   }
   Kernel.addFnAttr("omp_target_num_teams", std::to_string(LB));

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels (PR #87695)

2024-04-04 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/87695

>From 1738c7f54bc838eac29402c4248db063d908d575 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 4 Apr 2024 15:10:55 -0500
Subject: [PATCH] [OpenMP] Add amdgpu-num-work-groups attribute to OpenMP
 kernels

Summary:
This new attribute was introduced recently. We already do this for NVPTX
kernels so we should apply this for AMDGPU as well. This patch simply
applies this metadata in cases where a lower bound is known
---
 clang/test/OpenMP/thread_limit_amdgpu.c   | 34 +++
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  3 ++
 2 files changed, 37 insertions(+)
 create mode 100644 clang/test/OpenMP/thread_limit_amdgpu.c

diff --git a/clang/test/OpenMP/thread_limit_amdgpu.c 
b/clang/test/OpenMP/thread_limit_amdgpu.c
new file mode 100644
index 00..f884eeb73c3ff1
--- /dev/null
+++ b/clang/test/OpenMP/thread_limit_amdgpu.c
@@ -0,0 +1,34 @@
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-linux-gnu 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa 
-fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device 
-fopenmp-host-ir-file-path %t-x86-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+void foo(int N) {
+#pragma omp target teams distribute parallel for simd
+  for (int i = 0; i < N; ++i)
+;
+#pragma omp target teams distribute parallel for simd thread_limit(4)
+  for (int i = 0; i < N; ++i)
+;
+#pragma omp target teams distribute parallel for simd 
ompx_attribute(__attribute__((launch_bounds(42, 42
+  for (int i = 0; i < N; ++i)
+;
+#pragma omp target teams distribute parallel for simd 
ompx_attribute(__attribute__((launch_bounds(42, 42 num_threads(22)
+  for (int i = 0; i < N; ++i)
+;
+}
+
+#endif
+
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l10({{.*}}) #[[ATTR1:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l13({{.*}}) #[[ATTR2:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l16({{.*}}) #[[ATTR3:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void 
@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l19({{.*}}) #[[ATTR4:.+]] {
+
+// CHECK: attributes #[[ATTR1]] = { {{.*}} 
"amdgpu-flat-work-group-size"="1,256" {{.*}} }
+// CHECK: attributes #[[ATTR2]] = { {{.*}} "amdgpu-flat-work-group-size"="1,4" 
{{.*}} }
+// CHECK: attributes #[[ATTR3]] = { {{.*}} 
"amdgpu-flat-work-group-size"="1,42" "amdgpu-max-num-workgroups"="42,1,1"{{.*}} 
}
+// CHECK: attributes #[[ATTR4]] = { {{.*}} 
"amdgpu-flat-work-group-size"="1,22" "amdgpu-max-num-workgroups"="42,1,1"{{.*}} 
}
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 16507a69ea8502..7fd8474c2ec890 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -4791,6 +4791,9 @@ void OpenMPIRBuilder::writeTeamsForKernel(const Triple 
&T, Function &Kernel,
   updateNVPTXMetadata(Kernel, "maxclusterrank", UB, true);
 updateNVPTXMetadata(Kernel, "minctasm", LB, false);
   }
+  if (T.isAMDGPU())
+Kernel.addFnAttr("amdgpu-max-num-workgroups", llvm::utostr(LB) + ",1,1");
+
   Kernel.addFnAttr("omp_target_num_teams", std::to_string(LB));
 }
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels (PR #87695)

2024-04-05 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/87695
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Do not include config files for device linking (PR #87659)

2024-04-08 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/87659
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [Libomp] Place generated OpenMP headers into build resource directory (PR #88007)

2024-04-08 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/88007

>From 0dbfd89b69197df8201b772b88654810e689cced Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 8 Apr 2024 10:01:06 -0500
Subject: [PATCH] [Libomp] Place generated OpenMP headers into build resource
 directory

Summary:
These headers are a part of the compiler's resource directory once
installed. However, they are currently placed in the binary directory
temporarily. This makes it more difficult to use the compiler out of the
build directory and will cause issues when moving to `liboffload`. This
patch changes the logic to write these instead to the copmiler's
resource directory inside of the build tree.

NOTE: This doesn't change the Fortran headers, I don't know enough about
those and it won't use the same directory.
---
 clang/test/Headers/Inputs/include/stdint.h |  8 +++
 openmp/runtime/src/CMakeLists.txt  | 25 ++
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/clang/test/Headers/Inputs/include/stdint.h 
b/clang/test/Headers/Inputs/include/stdint.h
index 5bf26a7b67b066..67b27b8dfc7b92 100644
--- a/clang/test/Headers/Inputs/include/stdint.h
+++ b/clang/test/Headers/Inputs/include/stdint.h
@@ -16,4 +16,12 @@ typedef unsigned __INTPTR_TYPE__ uintptr_t;
 #error Every target should have __INTPTR_TYPE__
 #endif
 
+#ifdef __INTPTR_MAX__
+#define  INTPTR_MAX__INTPTR_MAX__
+#endif
+
+#ifdef __UINTPTR_MAX__
+#define UINTPTR_MAX   __UINTPTR_MAX__
+#endif
+
 #endif /* STDINT_H */
diff --git a/openmp/runtime/src/CMakeLists.txt 
b/openmp/runtime/src/CMakeLists.txt
index f05bcabb441742..000d02c33dc093 100644
--- a/openmp/runtime/src/CMakeLists.txt
+++ b/openmp/runtime/src/CMakeLists.txt
@@ -10,12 +10,19 @@
 
 include(ExtendPath)
 
+# The generated headers will be placed in clang's resource directory if 
present.
+if(${OPENMP_STANDALONE_BUILD})
+  set(LIBOMP_HEADERS_INTDIR ${CMAKE_CURRENT_BINARY_DIR})
+else()
+  set(LIBOMP_HEADERS_INTDIR ${LLVM_BINARY_DIR}/${LIBOMP_HEADERS_INSTALL_PATH})
+endif()
+
 # Configure omp.h, kmp_config.h and omp-tools.h if necessary
-configure_file(${LIBOMP_INC_DIR}/omp.h.var omp.h @ONLY)
-configure_file(${LIBOMP_INC_DIR}/ompx.h.var ompx.h @ONLY)
-configure_file(kmp_config.h.cmake kmp_config.h @ONLY)
+configure_file(${LIBOMP_INC_DIR}/omp.h.var ${LIBOMP_HEADERS_INTDIR}/omp.h 
@ONLY)
+configure_file(${LIBOMP_INC_DIR}/ompx.h.var ${LIBOMP_HEADERS_INTDIR}/ompx.h 
@ONLY)
+configure_file(kmp_config.h.cmake ${LIBOMP_HEADERS_INTDIR}/kmp_config.h @ONLY)
 if(${LIBOMP_OMPT_SUPPORT})
-  configure_file(${LIBOMP_INC_DIR}/omp-tools.h.var omp-tools.h @ONLY)
+  configure_file(${LIBOMP_INC_DIR}/omp-tools.h.var 
${LIBOMP_HEADERS_INTDIR}/omp-tools.h @ONLY)
 endif()
 
 # Generate message catalog files: kmp_i18n_id.inc and kmp_i18n_default.inc
@@ -419,15 +426,15 @@ else()
 endif()
 install(
   FILES
-  ${CMAKE_CURRENT_BINARY_DIR}/omp.h
-  ${CMAKE_CURRENT_BINARY_DIR}/ompx.h
+  ${LIBOMP_HEADERS_INTDIR}/omp.h
+  ${LIBOMP_HEADERS_INTDIR}/ompx.h
   DESTINATION ${LIBOMP_HEADERS_INSTALL_PATH}
 )
 if(${LIBOMP_OMPT_SUPPORT})
-  install(FILES ${CMAKE_CURRENT_BINARY_DIR}/omp-tools.h DESTINATION 
${LIBOMP_HEADERS_INSTALL_PATH})
+  install(FILES ${LIBOMP_HEADERS_INTDIR}/omp-tools.h DESTINATION 
${LIBOMP_HEADERS_INSTALL_PATH})
   # install under legacy name ompt.h
-  install(FILES ${CMAKE_CURRENT_BINARY_DIR}/omp-tools.h DESTINATION 
${LIBOMP_HEADERS_INSTALL_PATH} RENAME ompt.h)
-  set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${CMAKE_CURRENT_BINARY_DIR} PARENT_SCOPE)
+  install(FILES ${LIBOMP_HEADERS_INTDIR}/omp-tools.h DESTINATION 
${LIBOMP_HEADERS_INSTALL_PATH} RENAME ompt.h)
+  set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_HEADERS_INTDIR} PARENT_SCOPE)
 endif()
 if(${BUILD_FORTRAN_MODULES})
   set (destination ${LIBOMP_HEADERS_INSTALL_PATH})

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [Libomp] Place generated OpenMP headers into build resource directory (PR #88007)

2024-04-08 Thread Joseph Huber via cfe-commits


@@ -10,12 +10,19 @@
 
 include(ExtendPath)
 
+# The generated headers will be placed in clang's resource directory if 
present.
+if(${OPENMP_STANDALONE_BUILD})
+  set(LIBOMP_HEADERS_INTDIR ${CMAKE_CURRENT_BINARY_DIR})
+else()
+  set(LIBOMP_HEADERS_INTDIR ${LLVM_BINARY_DIR}/${LIBOMP_HEADERS_INSTALL_PATH})

jhuber6 wrote:

LLVM is always built, I don't think it's possible to turn it off. Everything 
goes through `cmake ../llvm` so it will be set when that CMake runs.

https://github.com/llvm/llvm-project/pull/88007
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


<    1   2   3   4   5   6   7   8   9   10   >