[clang] [llvm] [Offload] Move HIP and CUDA to new driver by default (PR #84420)

2024-07-09 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/84420

>From 778fff60cb81c3a0ffaf0a74264eb7cddd6dfb58 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 7 Mar 2024 15:48:00 -0600
Subject: [PATCH] [Offload] Move HIP and CUDA to new driver by default

Summary:
This patch updates the `--offload-new-driver` flag to be the default for all
current offloading languages. This mostly just required updating a lot
of tests to use the new format. I tried to update them where possible,
but some were directly checking the old format.

This is not intended to be landed immediately, but to allow for greater
testing. One potential issue I've discovered is the lack of SPIR-V
support or handling for `--offload`.
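As a quick illustration (hedged; the architecture and file name below are only examples), the new default can be inspected and the old behavior restored per invocation:

```console
# The new offloading driver is now the default for CUDA and HIP.
$ clang++ -x hip foo.hip --offload-arch=gfx90a -c -ccc-print-bindings
# Opt back into the previous behavior while it remains available.
$ clang++ -x hip foo.hip --offload-arch=gfx90a -c --no-offload-new-driver -ccc-print-bindings
```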
---
 clang/lib/Driver/Driver.cpp   |  6 ++---
 clang/lib/Driver/ToolChains/Clang.cpp | 10 ---
 clang/test/Driver/cl-offload.cu   |  5 ++--
 clang/test/Driver/cuda-arch-translation.cu| 26 +--
 clang/test/Driver/cuda-bindings.cu| 24 -
 clang/test/Driver/cuda-options.cu | 23 
 clang/test/Driver/cuda-output-asm.cu  |  4 ---
 clang/test/Driver/hip-gz-options.hip  |  1 -
 clang/test/Driver/hip-invalid-target-id.hip   |  4 +--
 clang/test/Driver/hip-macros.hip  |  3 ---
 clang/test/Driver/hip-offload-arch.hip|  4 +--
 clang/test/Driver/hip-options.hip |  6 +
 clang/test/Driver/hip-sanitize-options.hip|  2 +-
 clang/test/Driver/hip-save-temps.hip  | 12 -
 .../test/Driver/hip-toolchain-device-only.hip |  4 ---
 clang/test/Driver/hip-toolchain-mllvm.hip |  2 --
 clang/test/Driver/invalid-offload-options.cpp |  2 +-
 .../ClangLinkerWrapper.cpp|  7 +++--
 clang/unittests/Tooling/ToolingTest.cpp   |  6 ++---
 llvm/lib/Object/OffloadBinary.cpp | 13 +++---
 20 files changed, 78 insertions(+), 86 deletions(-)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 021c5b8a33dba..2c91dfa5a6a8c 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4143,9 +4143,9 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
   handleArguments(C, Args, Inputs, Actions);
 
   bool UseNewOffloadingDriver =
-  C.isOffloadingHostKind(Action::OFK_OpenMP) ||
+  C.getActiveOffloadKinds() != Action::OFK_None &&
   Args.hasFlag(options::OPT_offload_new_driver,
-   options::OPT_no_offload_new_driver, false);
+   options::OPT_no_offload_new_driver, true);
 
   // Builder to be used to build offloading actions.
  std::unique_ptr<OffloadingActionBuilder> OffloadBuilder =
@@ -4866,7 +4866,7 @@ Action *Driver::ConstructPhaseAction(
offloadDeviceOnly() ||
(TargetDeviceOffloadKind == Action::OFK_HIP &&
 !Args.hasFlag(options::OPT_offload_new_driver,
-  options::OPT_no_offload_new_driver, false)))
+  options::OPT_no_offload_new_driver, true)))
   ? types::TY_LLVM_IR
   : types::TY_LLVM_BC;
  return C.MakeAction<BackendJobAction>(Input, Output);
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index c43fd3def6db0..11d0f5ef903c4 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -4841,8 +4841,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   bool IsHostOffloadingAction =
   JA.isHostOffloading(Action::OFK_OpenMP) ||
   (JA.isHostOffloading(C.getActiveOffloadKinds()) &&
+   C.getActiveOffloadKinds() != Action::OFK_None &&
Args.hasFlag(options::OPT_offload_new_driver,
-options::OPT_no_offload_new_driver, false));
+options::OPT_no_offload_new_driver, true));
 
   bool IsRDCMode =
   Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false);
@@ -5168,7 +5169,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
 if (IsUsingLTO) {
   if (IsDeviceOffloadAction && !JA.isDeviceOffloading(Action::OFK_OpenMP) 
&&
   !Args.hasFlag(options::OPT_offload_new_driver,
-options::OPT_no_offload_new_driver, false) &&
+options::OPT_no_offload_new_driver, true) &&
   !Triple.isAMDGPU()) {
 D.Diag(diag::err_drv_unsupported_opt_for_target)
 << Args.getLastArg(options::OPT_foffload_lto,
@@ -6623,8 +6624,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   }
 
   // Forward the new driver to change offloading code generation.
-  if (Args.hasFlag(options::OPT_offload_new_driver,
-   options::OPT_no_offload_new_driver, false))
+  if (C.getActiveOffloadKinds() != Action::OFK_None &&
+  Args.hasFlag(options::OPT_offload_new_driver,
+   options::OPT_no_offload_new_driver, true))
 

[clang] [Offload] Move HIP and CUDA to new driver by default (PR #84420)

2024-07-09 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/84420

>From 3b5c3110cc1e781e9e7a8d9a621970fd3d7e9aa0 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 7 Mar 2024 15:48:00 -0600
Subject: [PATCH] [Offload] Move HIP and CUDA to new driver by default

Summary:
This patch updates the `--offload-new-driver` flag to be the default for all
current offloading languages. This mostly just required updating a lot
of tests to use the new format. I tried to update them where possible,
but some were directly checking the old format.

This is not intended to be landed immediately, but to allow for greater
testing. One potential issue I've discovered is the lack of SPIR-V
support or handling for `--offload`.
---
 clang/lib/Driver/Driver.cpp   |  6 ++---
 clang/lib/Driver/ToolChains/Clang.cpp | 10 ---
 clang/test/Driver/cl-offload.cu   |  5 ++--
 clang/test/Driver/cuda-arch-translation.cu| 26 +--
 clang/test/Driver/cuda-bindings.cu| 24 -
 clang/test/Driver/cuda-options.cu | 23 
 clang/test/Driver/cuda-output-asm.cu  |  4 ---
 clang/test/Driver/hip-gz-options.hip  |  1 -
 clang/test/Driver/hip-invalid-target-id.hip   |  4 +--
 clang/test/Driver/hip-macros.hip  |  3 ---
 clang/test/Driver/hip-offload-arch.hip|  4 +--
 clang/test/Driver/hip-options.hip |  6 +
 clang/test/Driver/hip-sanitize-options.hip|  2 +-
 clang/test/Driver/hip-save-temps.hip  | 12 -
 .../test/Driver/hip-toolchain-device-only.hip |  4 ---
 clang/test/Driver/hip-toolchain-mllvm.hip |  2 --
 clang/test/Driver/invalid-offload-options.cpp |  2 +-
 clang/unittests/Tooling/ToolingTest.cpp   |  6 ++---
 18 files changed, 64 insertions(+), 80 deletions(-)

diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 021c5b8a33dba..2c91dfa5a6a8c 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4143,9 +4143,9 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
   handleArguments(C, Args, Inputs, Actions);
 
   bool UseNewOffloadingDriver =
-  C.isOffloadingHostKind(Action::OFK_OpenMP) ||
+  C.getActiveOffloadKinds() != Action::OFK_None &&
   Args.hasFlag(options::OPT_offload_new_driver,
-   options::OPT_no_offload_new_driver, false);
+   options::OPT_no_offload_new_driver, true);
 
   // Builder to be used to build offloading actions.
  std::unique_ptr<OffloadingActionBuilder> OffloadBuilder =
@@ -4866,7 +4866,7 @@ Action *Driver::ConstructPhaseAction(
offloadDeviceOnly() ||
(TargetDeviceOffloadKind == Action::OFK_HIP &&
 !Args.hasFlag(options::OPT_offload_new_driver,
-  options::OPT_no_offload_new_driver, false)))
+  options::OPT_no_offload_new_driver, true)))
   ? types::TY_LLVM_IR
   : types::TY_LLVM_BC;
  return C.MakeAction<BackendJobAction>(Input, Output);
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index c43fd3def6db0..11d0f5ef903c4 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -4841,8 +4841,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   bool IsHostOffloadingAction =
   JA.isHostOffloading(Action::OFK_OpenMP) ||
   (JA.isHostOffloading(C.getActiveOffloadKinds()) &&
+   C.getActiveOffloadKinds() != Action::OFK_None &&
Args.hasFlag(options::OPT_offload_new_driver,
-options::OPT_no_offload_new_driver, false));
+options::OPT_no_offload_new_driver, true));
 
   bool IsRDCMode =
   Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false);
@@ -5168,7 +5169,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
 if (IsUsingLTO) {
   if (IsDeviceOffloadAction && !JA.isDeviceOffloading(Action::OFK_OpenMP) 
&&
   !Args.hasFlag(options::OPT_offload_new_driver,
-options::OPT_no_offload_new_driver, false) &&
+options::OPT_no_offload_new_driver, true) &&
   !Triple.isAMDGPU()) {
 D.Diag(diag::err_drv_unsupported_opt_for_target)
 << Args.getLastArg(options::OPT_foffload_lto,
@@ -6623,8 +6624,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   }
 
   // Forward the new driver to change offloading code generation.
-  if (Args.hasFlag(options::OPT_offload_new_driver,
-   options::OPT_no_offload_new_driver, false))
+  if (C.getActiveOffloadKinds() != Action::OFK_None &&
+  Args.hasFlag(options::OPT_offload_new_driver,
+   options::OPT_no_offload_new_driver, true))
 CmdArgs.push_back("--offload-new-driver");
 
   SanitizeArgs.addArgs(TC, Args, CmdArgs, InputType);
diff --git 

[clang] [Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` (PR #98170)

2024-07-09 Thread Joseph Huber via cfe-commits


@@ -633,6 +633,17 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   else if (Args.hasArg(options::OPT_mcpu_EQ))
 CmdArgs.push_back(Args.MakeArgString(
 "-plugin-opt=mcpu=" + Args.getLastArgValue(options::OPT_mcpu_EQ)));
+

jhuber6 wrote:

So, I wasn't sure if I should also apply this to the HIP / CUDA toolchains just
yet. HIP doesn't pass LTO bitcode to the linker, so we can't do a full link with
the library, and CUDA doesn't invoke the linker at all. I think in the future it
would be nice to provide these to HIP and CUDA by default, however.

https://github.com/llvm/llvm-project/pull/98170


[clang] [Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` (PR #98170)

2024-07-09 Thread Joseph Huber via cfe-commits


@@ -633,6 +633,17 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   else if (Args.hasArg(options::OPT_mcpu_EQ))
 CmdArgs.push_back(Args.MakeArgString(
 "-plugin-opt=mcpu=" + Args.getLastArgValue(options::OPT_mcpu_EQ)));
+
+  // If the user's toolchain has the 'include/amdgcn-amd-amdhsa/' path, we
+  // assume it supports the standard C libraries for the GPU and include them.
+  bool HasLibC = getToolChain().getStdlibIncludePath().has_value();

jhuber6 wrote:

It's not really offload in this context, since it deals with `clang 
--target=amdgcn-amd-amdhsa`. But I could see putting it in `CommonArgs`. 

https://github.com/llvm/llvm-project/pull/98170


[clang] [Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` (PR #98170)

2024-07-09 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/98170

>From 6c6c781a658c4349073a40e0a0ecc10a893a4ca8 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH 1/3] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e., empty). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
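As a rough sketch of the intended workflow (the target, architecture, and file names here are illustrative, not part of this patch), a direct NVPTX compile could then link the C library the usual way:

```console
# Compile directly for NVPTX; with libc.a installed as above, -lc works like any other target.
$ clang --target=nvptx64-nvidia-cuda -march=sm_89 -flto -c main.c -o main.o
$ clang --target=nvptx64-nvidia-cuda -march=sm_89 -flto main.o -lc -lm -o main
```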
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 776 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  88 ++
 11 files changed, 1055 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst 
b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 0..0a312bdbf3066
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+
+Clang nvlink Wrapper
+
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (.a) linking, LTO, and accepting files ending
+in ``.o`` without error.
+
+Usage
+=
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] 
+
+  OPTIONS:
+--archSpecify the 'sm_' name of the target architecture.
+--cuda-path=Set the system CUDA path
+--dry-runPrint generated commands without running.
+--feature Specify the '+ptx' feature to use for LTO.
+-g   Specify that this was a debug compile.
+-help-hidden Display all available options
+-helpDisplay available options (--help-hidden for more)
+-L  Add  to the library search path
+-l  Search for library 
+-mllvm  Arguments passed to LLVM, including Clang 
invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' 
for a list of options.
+-o Path to file to write output
+--plugin-opt=jobs=
+ Number of LTO codegen partitions
+

[clang] [Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` (PR #98170)

2024-07-09 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

So, one thing I've noticed is that passing `-lc` and `-lm` to the `ld.lld` 
invocation greatly increases link times for trivial applications. This is 
because the handling in `ld.lld` will intentionally extract known `libcall` 
functions from LTO static libraries. We then have handling in the LTO 
internalization pass that prevents these calls from being internalized so that
the backend may emit calls to them. This has the result of increasing the 
compile time as it extracts about fifty math functions, as well as bloating the 
resulting binary because they do not get optimized out. The AMDGPU backend 
doesn't emit any of these as far as I'm aware. Right now this simply goes off 
of a list of all libcalls. I wonder if I can make a separate list that's 
per-target, just to show that the AMDGPU target doesn't emit any of these for 
now and should be internalized / not extracted. WDYT @yxsamliu and @arsenm ?

https://github.com/llvm/llvm-project/pull/98170


[clang] [Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` (PR #98170)

2024-07-09 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/98170

Summary:
The previous patches (the other commits in this chain) allow the
offloading toolchain to directly invoke the device linker. Because of
this, we can now just have the toolchain implicitly include `-lc` and
`-lm` like a standard target does. This removes the old handling that
went through the fat binary `-lcgpu`.
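As an illustration of the difference (flags are sketched from the surrounding discussion; `gfx90a` is just an example architecture):

```console
# Old handling: the GPU C library came in through the fat-binary route.
$ clang++ -fopenmp --offload-arch=gfx90a main.cpp -lcgpu -lmgpu
# With this patch: the device link implicitly adds -lc and -lm when the GPU libc is installed.
$ clang++ -fopenmp --offload-arch=gfx90a main.cpp
```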


>From 6c6c781a658c4349073a40e0a0ecc10a893a4ca8 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH 1/3] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e., empty). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 776 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  88 ++
 11 files changed, 1055 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst 
b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 0..0a312bdbf3066
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+
+Clang nvlink Wrapper
+
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (.a) linking, LTO, and accepting files ending
+in ``.o`` without error.
+
+Usage
+=
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] 
+
+  OPTIONS:
+--archSpecify the 'sm_' name of the target architecture.
+--cuda-path=Set the system CUDA path
+--dry-runPrint generated commands without running.
+--feature Specify the '+ptx' feature to use for LTO.
+-g   Specify that this was a debug compile.
+-help-hidden Display all available options
+-helpDisplay available options (--help-hidden for more)
+-L  Add  to the library search path
+-l  Search 

[clang] [LinkerWrapper] Pass all files to the device linker (PR #97573)

2024-07-09 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/97573

>From 6c6c781a658c4349073a40e0a0ecc10a893a4ca8 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH 1/2] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e., empty). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 776 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  88 ++
 11 files changed, 1055 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst 
b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 0..0a312bdbf3066
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+
+Clang nvlink Wrapper
+
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (.a) linking, LTO, and accepting files ending
+in ``.o`` without error.
+
+Usage
+=
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] 
+
+  OPTIONS:
+--archSpecify the 'sm_' name of the target architecture.
+--cuda-path=Set the system CUDA path
+--dry-runPrint generated commands without running.
+--feature Specify the '+ptx' feature to use for LTO.
+-g   Specify that this was a debug compile.
+-help-hidden Display all available options
+-helpDisplay available options (--help-hidden for more)
+-L  Add  to the library search path
+-l  Search for library 
+-mllvm  Arguments passed to LLVM, including Clang 
invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' 
for a list of options.
+-o Path to file to write output
+--plugin-opt=jobs=
+ Number of LTO codegen partitions
+

[clang] [LinkerWrapper] Pass all files to the device linker (PR #97573)

2024-07-08 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/97573

>From 7a64ee668b33c912f83d4f848ab72d421f8a1bec Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH 1/2] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e., empty). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 776 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  88 ++
 11 files changed, 1055 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst 
b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 00..0a312bdbf3066f
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+
+Clang nvlink Wrapper
+
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (.a) linking, LTO, and accepting files ending
+in ``.o`` without error.
+
+Usage
+=
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] 
+
+  OPTIONS:
+--archSpecify the 'sm_' name of the target architecture.
+--cuda-path=Set the system CUDA path
+--dry-runPrint generated commands without running.
+--feature Specify the '+ptx' feature to use for LTO.
+-g   Specify that this was a debug compile.
+-help-hidden Display all available options
+-helpDisplay available options (--help-hidden for more)
+-L  Add  to the library search path
+-l  Search for library 
+-mllvm  Arguments passed to LLVM, including Clang 
invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' 
for a list of options.
+-o Path to file to write output
+--plugin-opt=jobs=
+ Number of LTO codegen partitions
+

[clang] [OpenMP] Correctly code-gen default atomic mem order (PR #97663)

2024-07-03 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/97663

Summary:
The parsing for this was implemented, but we never hooked up the default
value to the result of this clause. This patch adds the support by
making the default follow the ordering specified by the `requires` directive.


>From fa3561bd4d42522f07ec901c15b411f06b844490 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 3 Jul 2024 21:44:07 -0500
Subject: [PATCH] [OpenMP] Correctly code-gen default atomic mem order

Summary:
The parsing for this was implemented, but we never hooked up the default
value to the result of this clause. This patch adds the support by
making the default follow the ordering specified by the `requires` directive.
---
 clang/lib/CodeGen/CGStmtOpenMP.cpp|  2 +-
 .../requires_default_atomic_mem_order.cpp | 46 +++
 2 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/OpenMP/requires_default_atomic_mem_order.cpp

diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp 
b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 76ff8f5b234da6..4d05322951d0a5 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -6555,7 +6555,7 @@ static void emitOMPAtomicExpr(CodeGenFunction &CGF, OpenMPClauseKind Kind,
 }
 
void CodeGenFunction::EmitOMPAtomicDirective(const OMPAtomicDirective &S) {
-  llvm::AtomicOrdering AO = llvm::AtomicOrdering::Monotonic;
+  llvm::AtomicOrdering AO = CGM.getOpenMPRuntime().getDefaultMemoryOrdering();
   // Fail Memory Clause Ordering.
   llvm::AtomicOrdering FailAO = llvm::AtomicOrdering::NotAtomic;
   bool MemOrderingSpecified = false;
diff --git a/clang/test/OpenMP/requires_default_atomic_mem_order.cpp 
b/clang/test/OpenMP/requires_default_atomic_mem_order.cpp
new file mode 100644
index 00..90d2db4eac20c4
--- /dev/null
+++ b/clang/test/OpenMP/requires_default_atomic_mem_order.cpp
@@ -0,0 +1,46 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -emit-llvm -fopenmp -triple=x86_64-unknown-linux-gnu \
+// RUN:   -DORDERING=seq_cst -o - %s \
+// RUN: | FileCheck %s --check-prefix=SEQ_CST
+// RUN: %clang_cc1 -emit-llvm -fopenmp -triple=x86_64-unknown-linux-gnu \
+// RUN:   -DORDERING=acq_rel -o - %s \
+// RUN: | FileCheck %s --check-prefix=ACQ_REL
+// RUN: %clang_cc1 -emit-llvm -fopenmp -triple=x86_64-unknown-linux-gnu \
+// RUN:   -DORDERING=relaxed -o - %s \
+// RUN: | FileCheck %s --check-prefix=RELAXED
+
+#pragma omp requires atomic_default_mem_order(ORDERING)
+
+// SEQ_CST-LABEL: define dso_local void @_Z3fooPi(
+// SEQ_CST-SAME: ptr noundef [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+// SEQ_CST-NEXT:  [[ENTRY:.*:]]
+// SEQ_CST-NEXT:[[X_ADDR:%.*]] = alloca ptr, align 8
+// SEQ_CST-NEXT:store ptr [[X]], ptr [[X_ADDR]], align 8
+// SEQ_CST-NEXT:[[TMP0:%.*]] = load ptr, ptr [[X_ADDR]], align 8
+// SEQ_CST-NEXT:[[TMP1:%.*]] = atomicrmw add ptr [[TMP0]], i32 1 seq_cst, 
align 4
+// SEQ_CST-NEXT:call void @__kmpc_flush(ptr @[[GLOB1:[0-9]+]])
+// SEQ_CST-NEXT:ret void
+//
+// ACQ_REL-LABEL: define dso_local void @_Z3fooPi(
+// ACQ_REL-SAME: ptr noundef [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+// ACQ_REL-NEXT:  [[ENTRY:.*:]]
+// ACQ_REL-NEXT:[[X_ADDR:%.*]] = alloca ptr, align 8
+// ACQ_REL-NEXT:store ptr [[X]], ptr [[X_ADDR]], align 8
+// ACQ_REL-NEXT:[[TMP0:%.*]] = load ptr, ptr [[X_ADDR]], align 8
+// ACQ_REL-NEXT:[[TMP1:%.*]] = atomicrmw add ptr [[TMP0]], i32 1 release, 
align 4
+// ACQ_REL-NEXT:call void @__kmpc_flush(ptr @[[GLOB1:[0-9]+]])
+// ACQ_REL-NEXT:ret void
+//
+// RELAXED-LABEL: define dso_local void @_Z3fooPi(
+// RELAXED-SAME: ptr noundef [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+// RELAXED-NEXT:  [[ENTRY:.*:]]
+// RELAXED-NEXT:[[X_ADDR:%.*]] = alloca ptr, align 8
+// RELAXED-NEXT:store ptr [[X]], ptr [[X_ADDR]], align 8
+// RELAXED-NEXT:[[TMP0:%.*]] = load ptr, ptr [[X_ADDR]], align 8
+// RELAXED-NEXT:[[TMP1:%.*]] = atomicrmw add ptr [[TMP0]], i32 1 
monotonic, align 4
+// RELAXED-NEXT:ret void
+//
+void foo(int *x) {
+  #pragma omp atomic update
+*x = *x + 1;
+}



[libunwind] [libunwind] Remove needless `sys/uio.h` (PR #97495)

2024-07-03 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/97495


[clang] [LinkerWrapper] Pass all files to the device linker (PR #97573)

2024-07-03 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/97573

Summary:
The linker wrapper's job is to extract embedded device code from fat
binaries and create linked images that can then be embedded and
executed. In order to support LTO, we originally reinvented all of the
LTO handling that `ld.lld` normally does. Primarily, this was because
`nvlink` didn't support this at all, and we have special hacks required
for offloading languages interacting with archive libraries.

Now since I wrote https://github.com/llvm/llvm-project/pull/96561 we
should be able to pass all the inputs to the device linker
transparently. This has the advantage of allowing the `clang` Driver to
do its own handling. Primarily, this will be used to implicitly pass
libraries to the device link job to make it more consistent with other
toolchains.

The JIT support is a notable departure; however, there is an option
called `--lto-emit-llvm` that performs exactly the function we want, making
the final link job output LLVM-IR that we can then embed instead.

This patch does not fully delete the LTO handling, primarily because I
think the SPIR-V people might want it. To see only the relevant patches,
ignore the first commit of the nvlink-wrapper.

Depends on https://github.com/llvm/llvm-project/pull/96561.
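For a concrete (hedged) picture of what this enables, the driver can now hand extra inputs straight to the device linker rather than reimplementing that logic in the wrapper, e.g.:

```console
# Illustrative only: forward a search path and a library to the device link job.
$ clang++ -fopenmp --offload-arch=sm_89 main.cpp \
    -Xoffload-linker -L/opt/gpu/lib -Xoffload-linker -lmygpulib
```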


>From 2d3957ac14906d569acf5b3ceb5c7e2f4dfabe54 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH 1/2] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e., empty). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 776 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  88 ++
 11 files changed, 1055 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst 
b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 0..0a312bdbf3066
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+
+Clang nvlink Wrapper
+
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (.a) linking, LTO, and accepting files ending
+in ``.o`` without error.

[clang] [flang] [Flang-new][OpenMP] Add bitcode files for AMD GPU OpenMP (PR #96742)

2024-07-03 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Would it be possible for you to investigate that? It really shouldn't be 
required if we can't help it.

https://github.com/llvm/llvm-project/pull/96742


[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-07-02 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 2d3957ac14906d569acf5b3ceb5c7e2f4dfabe54 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e., empty). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 776 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  88 ++
 11 files changed, 1055 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst 
b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 0..0a312bdbf3066
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+
+Clang nvlink Wrapper
+
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (.a) linking, LTO, and accepting files ending
+in ``.o`` without error.
+
+Usage
+=
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] 
+
+  OPTIONS:
+--archSpecify the 'sm_' name of the target architecture.
+--cuda-path=Set the system CUDA path
+--dry-runPrint generated commands without running.
+--feature Specify the '+ptx' feature to use for LTO.
+-g   Specify that this was a debug compile.
+-help-hidden Display all available options
+-helpDisplay available options (--help-hidden for more)
+-L  Add  to the library search path
+-l  Search for library 
+-mllvm  Arguments passed to LLVM, including Clang 
invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' 
for a list of options.
+-o Path to file to write output
+--plugin-opt=jobs=
+ Number of LTO codegen partitions
+

[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-07-02 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 2bb5bd081a29b9bf1c4e6e0f727e21a1b9258920 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e., empty). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 776 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  88 ++
 11 files changed, 1055 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst 
b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 0..0a312bdbf3066
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+
+Clang nvlink Wrapper
+
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (.a) linking, LTO, and accepting files ending
+in ``.o`` without error.
+
+Usage
+=
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] 
+
+  OPTIONS:
+--archSpecify the 'sm_' name of the target architecture.
+--cuda-path=Set the system CUDA path
+--dry-runPrint generated commands without running.
+--feature Specify the '+ptx' feature to use for LTO.
+-g   Specify that this was a debug compile.
+-help-hidden Display all available options
+-helpDisplay available options (--help-hidden for more)
+-L  Add  to the library search path
+-l  Search for library 
+-mllvm  Arguments passed to LLVM, including Clang 
invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' 
for a list of options.
+-o Path to file to write output
+--plugin-opt=jobs=
+ Number of LTO codegen partitions
+

[clang] [llvm] [mlir] [OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (PR #80343)

2024-07-02 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

This patch causes the `offloading/bug51781.c` test to fail when compiled with 
reductions + debug information.
```console
> clang ../offload/test/offloading/bug51781.c -fopenmp -O1 --offload-arch=sm_89 
> -DADD_REDUCTION --offload-device-only -gline-tables-only
!dbg attachment points at wrong subprogram for function
!19 = distinct !DISubprogram(name: "__omp_offloading_10302_af88b66_main_l44", 
scope: !11, file: !11, line: 44, type: !20, scopeLine: 44, flags: 
DIFlagArtificial | DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | 
DISPFlagDefinition | DISPFlagOptimized, unit: !10)
ptr @__omp_offloading_10302_af88b66_main_l44
  %16 = load i32, ptr %14, align 4, !dbg !50, !tbaa !35
!50 = !DILocation(line: 44, column: 58, scope: !32)
!32 = distinct !DISubprogram(name: 
"__omp_offloading_10302_af88b66_main_l44_omp_outlined", scope: !11, file: !11, 
line: 44, type: !20, scopeLine: 44, flags: DIFlagArtificial | DIFlagPrototyped, 
spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: 
!10)
!32 = distinct !DISubprogram(name: 
"__omp_offloading_10302_af88b66_main_l44_omp_outlined", scope: !11, file: !11, 
line: 44, type: !20, scopeLine: 44, flags: DIFlagArtificial | DIFlagPrototyped, 
spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: 
!10)
!dbg attachment points at wrong subprogram for function
!19 = distinct !DISubprogram(name: "__omp_offloading_10302_af88b66_main_l44", 
scope: !11, file: !11, line: 44, type: !20, scopeLine: 44, flags: 
DIFlagArtificial | DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | 
DISPFlagDefinition | DISPFlagOptimized, unit: !10)
ptr @__omp_offloading_10302_af88b66_main_l44
  %14 = load i32, ptr %12, align 4, !dbg !50, !tbaa !35
!50 = !DILocation(line: 44, column: 58, scope: !32)
!32 = distinct !DISubprogram(name: 
"__omp_offloading_10302_af88b66_main_l44_omp_outlined", scope: !11, file: !11, 
line: 44, type: !20, scopeLine: 44, flags: DIFlagArtificial | DIFlagPrototyped, 
spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: 
!10)
!32 = distinct !DISubprogram(name: 
"__omp_offloading_10302_af88b66_main_l44_omp_outlined", scope: !11, file: !11, 
line: 44, type: !20, scopeLine: 44, flags: DIFlagArtificial | DIFlagPrototyped, 
spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: 
!10)
```

This test directly uses a reduction, and if I revert this patch it no longer 
breaks. I'm fairly confident that somewhere in this code we did not copy debug 
information correctly. Any clue where that might be?

https://github.com/llvm/llvm-project/pull/80343


[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-07-02 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 3b10fce6b3d3f8eeb7bd9a3828d488362bb061dd Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e., empty). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 754 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  79 ++
 11 files changed, 1024 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst 
b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 0..0a312bdbf3066
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+
+Clang nvlink Wrapper
+
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (.a) linking, LTO, and accepting files ending
+in ``.o`` without error.
+
+Usage
+=
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] 
+
+  OPTIONS:
+--arch <arch>        Specify the 'sm_' name of the target architecture.
+--cuda-path=<dir>    Set the system CUDA path
+--dry-run            Print generated commands without running.
+--feature <feature>  Specify the '+ptx' feature to use for LTO.
+-g                   Specify that this was a debug compile.
+-help-hidden         Display all available options
+-help                Display available options (--help-hidden for more)
+-L <dir>             Add <dir> to the library search path
+-l <libname>         Search for library <libname>
+-mllvm <arg>         Arguments passed to LLVM, including Clang invocations,
+                     for which the '-mllvm' prefix is preserved. Use
+                     '-mllvm --help' for a list of options.
+-o <path>            Path to file to write output
+--plugin-opt=jobs=<value>
+                     Number of LTO codegen partitions
+
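A minimal usage sketch (hypothetical file names and architecture; assumes the
wrapper forwards anything it does not recognize straight to `nvlink`):
```
# Illustrative only: link two objects and the static C library for sm_89,
# applying LTO to any bitcode inputs along the way.
clang-nvlink-wrapper --arch sm_89 -L /lib/nvptx64-nvidia-cuda -lc \
    main.o util.o -o image.out
```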

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-07-01 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96015

>From 8bd49caa9fa93fd3d0812e0a4315f8ff4956056a Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH 1/2] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC compiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type.
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
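A small C sketch (illustrative only, not part of the patch) of how the rules
above play out for a simple variadic call:
```
// Illustrative only: for a call like varargs_simple(2, (char)1, 3.0), the
// lowering described above packs the promoted variadic arguments by value
// into a buffer laid out roughly as { int, <4 bytes padding>, double }: the
// char is promoted to int by the C ABI (matching the 4-byte minimum), and
// the double keeps its natural 8-byte alignment.
#include <stdarg.h>
#include <stdio.h>

void varargs_simple(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int i = va_arg(ap, int);       // the promoted char arrives as an int
  double d = va_arg(ap, double); // naturally aligned within the buffer
  va_end(ap);
  printf("%d %f\n", i, d);
}

int main(void) {
  varargs_simple(2, (char)1, 3.0);
  return 0;
}
```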
---
 clang/lib/Basic/Targets/NVPTX.h   |   3 +-
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  11 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  43 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 9 files changed, 916 insertions(+), 31 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index f476d49047c01..e30eaf808ca93 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -116,8 +116,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public 
TargetInfo {
   }
 
   BuiltinVaListKind getBuiltinVaListKind() const override {
-// FIXME: implement
-return TargetInfo::CharPtrBuiltinVaList;
+return TargetInfo::VoidPtrBuiltinVaList;
   }
 
   bool isValidCPUName(StringRef Name) const override {
diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..01a0b07856103 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,12 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo &FI) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
+
+  unsigned ArgumentsCount = 0;
   for (auto &I : FI.arguments())
-I.info = classifyArgumentType(I.type);
+I.info = ArgumentsCount++ < FI.getNumRequiredArgs()
+ ? classifyArgumentType(I.type)
+ : ABIArgInfo::getDirect();
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo &FI) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction &CGF, Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..f2f0768ae31ee
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca 

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-07-01 Thread Joseph Huber via cfe-commits


@@ -116,8 +116,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public 
TargetInfo {
   }
 
   BuiltinVaListKind getBuiltinVaListKind() const override {
-// FIXME: implement
-return TargetInfo::CharPtrBuiltinVaList;
+return TargetInfo::VoidPtrBuiltinVaList;

jhuber6 wrote:

```suggestion
return TargetInfo::CharPtrBuiltinVaList;
```

https://github.com/llvm/llvm-project/pull/96015
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-07-01 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 12d00a54169fef15efccfe9472db25b1261d31d3 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to be named with a non-standard `.cubin` extension, and
rejects link jobs that other linkers would be fine with (i.e. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as is
already done for non-GPU toolchains. However, this doesn't work with the
currently deficient `nvlink` linker, so I consider this a blocking issue for
massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 753 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  79 ++
 11 files changed, 1023 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst 
b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 0..0a312bdbf3066
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+
+Clang nvlink Wrapper
+
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (``.a``) linking, LTO, and accepting files
+ending in ``.o`` without error.
+
+Usage
+=
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] 
+
+  OPTIONS:
+--arch <arch>        Specify the 'sm_' name of the target architecture.
+--cuda-path=<dir>    Set the system CUDA path
+--dry-run            Print generated commands without running.
+--feature <feature>  Specify the '+ptx' feature to use for LTO.
+-g                   Specify that this was a debug compile.
+-help-hidden         Display all available options
+-help                Display available options (--help-hidden for more)
+-L <dir>             Add <dir> to the library search path
+-l <libname>         Search for library <libname>
+-mllvm <arg>         Arguments passed to LLVM, including Clang invocations,
+                     for which the '-mllvm' prefix is preserved. Use
+                     '-mllvm --help' for a list of options.
+-o <path>            Path to file to write output
+--plugin-opt=jobs=<value>
+                     Number of LTO codegen partitions
+

[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Joseph Huber via cfe-commits


@@ -54,7 +54,34 @@ class MockArgList {
   }
 
   template  LIBC_INLINE T next_var() {
-++arg_counter;
+arg_counter++;
+return T(arg_counter);
+  }
+
+  size_t read_count() const { return arg_counter; }
+};
+
+// Used by the GPU implementation to parse how many bytes need to be read from
+// the variadic argument buffer.
+class DummyArgList {
+  size_t arg_counter = 0;
+
+public:
+  LIBC_INLINE DummyArgList() = default;
+  LIBC_INLINE DummyArgList(va_list) { ; }
+  LIBC_INLINE DummyArgList(DummyArgList &other) {
+arg_counter = other.arg_counter;
+  }
+  LIBC_INLINE ~DummyArgList() = default;
+
+  LIBC_INLINE DummyArgList &operator=(DummyArgList &rhs) {
+arg_counter = rhs.arg_counter;
+return *this;
+  }
+
+  template  LIBC_INLINE T next_var() {
+arg_counter =
+((arg_counter + alignof(T) - 1) / alignof(T)) * alignof(T) + sizeof(T);
 return T(arg_counter);

jhuber6 wrote:

Interesting, didn't know about that one. Doesn't seem like GCC supports it, and 
this is one of the files that `gcc` might compile so it's probably easier to 
keep it in a utility for now. Maybe I could do `__has_builtin` for clang.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Lower-than-native alignment is legal on AMDGPU hardware and it's possible to
work around it in the `printf` implementation, so closing.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Joseph Huber via cfe-commits


@@ -942,6 +942,36 @@ struct Amdgpu final : public VariadicABIInfo {
   }
 };
 
+struct NVPTX final : public VariadicABIInfo {
+
+  bool enableForTarget() override { return true; }
+
+  bool vaListPassedInSSARegister() override { return true; }
+
+  Type *vaListType(LLVMContext &Ctx) override {
+return PointerType::getUnqual(Ctx);
+  }
+
+  Type *vaListParameterType(Module &M) override {
+return PointerType::getUnqual(M.getContext());
+  }
+
+  Value *initializeVaList(Module &M, LLVMContext &Ctx, IRBuilder<> &Builder,
+  AllocaInst *, Value *Buffer) override {
+return Builder.CreateAddrSpaceCast(Buffer, vaListParameterType(M));
+  }
+
+  VAArgSlotInfo slotInfo(const DataLayout &DL, Type *Parameter) override {
+// NVPTX expects natural alignment in all cases. The variadic call ABI will
+// handle promoting types to their appropriate size and alignment.
+const unsigned MinAlign = 1;
+Align A = DL.getABITypeAlign(Parameter);
+if (A < MinAlign)
+  A = Align(MinAlign);
+return {A, false};
+  }

jhuber6 wrote:

I think this was left over, since it's just `1` now I can get rid of it.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > You could theoretically break this if you didn't go through the C ABI and 
> > ignored type promotion, but I'm not concerned with that kind of misuse 
> > since it's against the ABI in the first place.
> 
> The IR has its own ABI that may or may not match whatever the platform "C 
> ABI' is. Especially given the lack of a formal platform ABI specification, I 
> would not characterize not using the C ABI as misuse or against the ABI

The ABI in this case is what NVIDIA does; figuring out whether or not an
argument came from a struct or was passed directly would be a nightmare of
metadata nodes, so I *really* don't want to go down that path.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Patch should not land. Need to know what bug this was trying to address to 
> guess at what the right fix would be.

My understanding was that the variadic lowering produced a struct with a
minimum alignment of four. This currently *doesn't* do that, hence my
confusion. The current lowering provides no padding, which I now see is a
deliberate choice, presumably to save on stack space. The issue I had was that
I laid out my `printf` code with the assumption that the buffer was a struct,
so now it won't work when copied to another target via RPC because it doesn't
have the correct alignment.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> The nvptx lowering looks dubious - values smaller than slot size should be 
> passed with the same alignment as the slot and presently aren't. A struct 
> containing i8, i16 or half should be miscompiled on nvptx as written.

I mentioned this in the original patch; it's correct as far as I know. NVPTX
does not require nested structs to have slot alignment, which means that the
minimum alignment is exactly that of the underlying type. The C ABI helps us
here by ensuring that arguments passed directly all get promoted to `i32`. You
could theoretically break this if you didn't go through the C ABI and ignored
type promotion, but I'm not concerned with that kind of misuse since it's
against the ABI in the first place.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Joseph Huber via cfe-commits


@@ -942,6 +942,36 @@ struct Amdgpu final : public VariadicABIInfo {
   }
 };
 
+struct NVPTX final : public VariadicABIInfo {
+
+  bool enableForTarget() override { return true; }
+
+  bool vaListPassedInSSARegister() override { return true; }
+
+  Type *vaListType(LLVMContext &Ctx) override {
+return PointerType::getUnqual(Ctx);
+  }
+
+  Type *vaListParameterType(Module &M) override {
+return PointerType::getUnqual(M.getContext());
+  }
+
+  Value *initializeVaList(Module &M, LLVMContext &Ctx, IRBuilder<> &Builder,
+  AllocaInst *, Value *Buffer) override {
+return Builder.CreateAddrSpaceCast(Buffer, vaListParameterType(M));
+  }
+
+  VAArgSlotInfo slotInfo(const DataLayout &DL, Type *Parameter) override {
+// NVPTX expects natural alignment in all cases. The variadic call ABI will
+// handle promoting types to their appropriate size and alignment.
+const unsigned MinAlign = 1;
+Align A = DL.getABITypeAlign(Parameter);

jhuber6 wrote:

I don't think so, NVPTX uses structs with normal padding, see 
https://godbolt.org/z/1v54YY6d3.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] Revert: [clangd] Replace an include with a forward declaration (PR #97082)

2024-06-28 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.

Seems reasonable as I believe there were extra uses that needed the size.

https://github.com/llvm/llvm-project/pull/97082
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [OpenMP][offload] Fix dynamic schedule tracking (PR #97065)

2024-06-28 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Malloc cannot be helped here if we want to have correctness. Currently it is 
> just broken and not even runnable.

I figured that all this code would go away if we just made all schedules static.

https://github.com/llvm/llvm-project/pull/97065
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [OpenMP][offload] Fix dynamic schedule tracking (PR #97065)

2024-06-28 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 commented:

Could you provide a more descriptive summary?

I thought we discussed that the dynamic support would just use the static 
scheduler, but this seems to implement it? I personally don't want to see more 
things in the OpenMP runtime relying on `malloc` if we can avoid it.

https://github.com/llvm/llvm-project/pull/97065
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [CUDA][NFC] CudaArch to OffloadArch rename (PR #97028)

2024-06-28 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.

This is definitely overdue, thanks. 

https://github.com/llvm/llvm-project/pull/97028
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-06-27 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Re-did it and tested it against `libc` in
https://github.com/llvm/llvm-project/pull/96972, so it will have a CI running
it once that lands. It works for other cases I've tested, but let me know if
something else should be added.

https://github.com/llvm/llvm-project/pull/96561
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-06-27 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 849c8dab14c9332081a8c6331c9ca0c234793393 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to be named with a non-standard `.cubin` extension, and
rejects link jobs that other linkers would be fine with (i.e. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as is
already done for non-GPU toolchains. However, this doesn't work with the
currently deficient `nvlink` linker, so I consider this a blocking issue for
massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 753 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  79 ++
 11 files changed, 1023 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst 
b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 00..0a312bdbf3066f
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+
+Clang nvlink Wrapper
+
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (``.a``) linking, LTO, and accepting files
+ending in ``.o`` without error.
+
+Usage
+=
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] 
+
+  OPTIONS:
+--arch <arch>        Specify the 'sm_' name of the target architecture.
+--cuda-path=<dir>    Set the system CUDA path
+--dry-run            Print generated commands without running.
+--feature <feature>  Specify the '+ptx' feature to use for LTO.
+-g                   Specify that this was a debug compile.
+-help-hidden         Display all available options
+-help                Display available options (--help-hidden for more)
+-L <dir>             Add <dir> to the library search path
+-l <libname>         Search for library <libname>
+-mllvm <arg>         Arguments passed to LLVM, including Clang invocations,
+                     for which the '-mllvm' prefix is preserved. Use
+                     '-mllvm --help' for a list of options.
+-o <path>            Path to file to write output
+--plugin-opt=jobs=<value>
+                     Number of LTO codegen partitions
+

[clang] [AMDGPU][OpenMP] Do not attach -fcuda-is-device flag for AMDGPU OpenMP (PR #96909)

2024-06-27 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.

We don't even pass this in the NVPTX offloading case, so there's no reason to 
do it for AMDGPU.

https://github.com/llvm/llvm-project/pull/96909
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [libc] Remove atomic alignment diagnostics globally (PR #96803)

2024-06-26 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/96803
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [libc] Remove atomic alignment diagnostics globally (PR #96803)

2024-06-26 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96803

>From 66b82f970e8914a920259dd12decd65fbb325356 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 26 Jun 2024 12:58:22 -0500
Subject: [PATCH] [libc] Remove atomic alignment diagnostics globally

Summary:
These warnings mean that the atomic operation will lower to a libcall.
Previously we just disabled them locally, which didn't work with GCC. This
patch does it globally in the compiler options if the compiler is Clang.
---
 clang/cmake/caches/Fuchsia-stage2.cmake | 2 +-
 libc/src/stdlib/rand.cpp| 6 --
 libc/src/stdlib/srand.cpp   | 6 --
 3 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/clang/cmake/caches/Fuchsia-stage2.cmake 
b/clang/cmake/caches/Fuchsia-stage2.cmake
index a573ec5473210..9892b5d58e719 100644
--- a/clang/cmake/caches/Fuchsia-stage2.cmake
+++ b/clang/cmake/caches/Fuchsia-stage2.cmake
@@ -321,7 +321,7 @@ foreach(target 
armv6m-unknown-eabi;armv7m-unknown-eabi;armv8m-unknown-eabi)
   set(RUNTIMES_${target}_CMAKE_BUILD_TYPE RelWithDebInfo CACHE STRING "")
   set(RUNTIMES_${target}_CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY CACHE 
STRING "")
   foreach(lang C;CXX;ASM)
-set(RUNTIMES_${target}_CMAKE_${lang}_FLAGS "--target=${target} -mthumb" 
CACHE STRING "")
+set(RUNTIMES_${target}_CMAKE_${lang}_FLAGS "--target=${target} -mthumb 
-Wno-atomic-alignment" CACHE STRING "")
   endforeach()
   foreach(type SHARED;MODULE;EXE)
 set(RUNTIMES_${target}_CMAKE_${type}_LINKER_FLAGS "-fuse-ld=lld" CACHE 
STRING "")
diff --git a/libc/src/stdlib/rand.cpp b/libc/src/stdlib/rand.cpp
index 8f2ae90336d51..ff3875c2f6959 100644
--- a/libc/src/stdlib/rand.cpp
+++ b/libc/src/stdlib/rand.cpp
@@ -13,10 +13,6 @@
 
 namespace LIBC_NAMESPACE {
 
-// Silence warnings on targets with slow atomics.
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Watomic-alignment"
-
 // An implementation of the xorshift64star pseudo random number generator. This
 // is a good general purpose generator for most non-cryptographics 
applications.
 LLVM_LIBC_FUNCTION(int, rand, (void)) {
@@ -33,6 +29,4 @@ LLVM_LIBC_FUNCTION(int, rand, (void)) {
   }
 }
 
-#pragma GCC diagnostic pop
-
 } // namespace LIBC_NAMESPACE
diff --git a/libc/src/stdlib/srand.cpp b/libc/src/stdlib/srand.cpp
index 681aad8fac4e8..21166c7a6754e 100644
--- a/libc/src/stdlib/srand.cpp
+++ b/libc/src/stdlib/srand.cpp
@@ -12,14 +12,8 @@
 
 namespace LIBC_NAMESPACE {
 
-// Silence warnings on targets with slow atomics.
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Watomic-alignment"
-
 LLVM_LIBC_FUNCTION(void, srand, (unsigned int seed)) {
   rand_next.store(seed, cpp::MemoryOrder::RELAXED);
 }
 
-#pragma GCC diagnostic pop
-
 } // namespace LIBC_NAMESPACE

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-06-26 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 commented:

Looks fine in general. I'm not a huge fan of all the `isGPUProfTarget` things
we have around now, but I understand it's required to set up the visibility. I
wonder if we could factor that out into something more common.

https://github.com/llvm/llvm-project/pull/76587
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-26 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca [[STRUCT_ANON:%.*]], align 4
+// CHECK-NEXT:[[V:%.*]] = alloca <4 x i32>, align 16
+// CHECK-NEXT:store i8 1, ptr [[C]], align 1
+// CHECK-NEXT:store i16 1, ptr [[S]], align 2
+// CHECK-NEXT:store i32 1, ptr [[I]], align 4
+// CHECK-NEXT:store i64 1, ptr [[L]], align 8
+// CHECK-NEXT:store float 1.00e+00, ptr [[F]], align 4
+// CHECK-NEXT:store double 1.00e+00, ptr [[D]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load i8, ptr [[C]], align 1
+// CHECK-NEXT:[[CONV:%.*]] = sext i8 [[TMP0]] to i32

jhuber6 wrote:

So, it seems that Jon's original patch did that on purpose to try to save 
space? But it seems weird since it breaks struct padding / alignment. That's 
from a separate patch I just included here to make it easier. The lack of 
stacked PRs is annoying. See https://github.com/llvm/llvm-project/pull/96370

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [flang] [Flang-new][OpenMP] Add offload related flags for AMDGPU (PR #96742)

2024-06-26 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.

The fact that it's called `-fcuda-is-device` is historical cruft, but I guess 
it's easiest to just work with it. I also hate `-mlink-builtin-bitcode` as a 
concept, but we're not quite ready to move away from its hacks unfortunately. 

https://github.com/llvm/llvm-project/pull/96742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Extend with usual pass options (PR #96704)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.

I think this looks good overall, though I'd like to hear some other clang 
maintainers chime in on the LIT config changes.

https://github.com/llvm/llvm-project/pull/96704
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Extend with usual pass options (PR #96704)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,77 @@
+; Check various clang-linker-wrapper pass options after -offload-opt.

jhuber6 wrote:

I see, probably fine then.

https://github.com/llvm/llvm-project/pull/96704
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Extend with usual pass options (PR #96704)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,77 @@
+; Check various clang-linker-wrapper pass options after -offload-opt.

jhuber6 wrote:

`-disable-O0-optnone` handles the `optnone`; I don't think `noinline` affects
that much.

https://github.com/llvm/llvm-project/pull/96704
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Extend with usual pass options (PR #96704)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,77 @@
+; Check various clang-linker-wrapper pass options after -offload-opt.

jhuber6 wrote:

Hm, is this really the only LLVM-IR file in the Driver directory? I guess it 
makes sense, though you could probably just do what the other linker wrapper 
tests do and use `clang-cc1` to directly get some random IR to use.

https://github.com/llvm/llvm-project/pull/96704
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Extend with usual pass options (PR #96704)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,86 @@
+; Check various clang-linker-wrapper pass options after -offload-opt.
+
+; REQUIRES: llvm-plugins, llvm-examples
+; REQUIRES: x86-registered-target
+; REQUIRES: amdgpu-registered-target
+
+; Setup.
+; RUN: split-file %s %t
+; RUN: opt -o %t/host-x86_64-unknown-linux-gnu.bc \
+; RUN: %t/host-x86_64-unknown-linux-gnu.ll
+; RUN: opt -o %t/openmp-amdgcn-amd-amdhsa.bc \
+; RUN: %t/openmp-amdgcn-amd-amdhsa.ll
+; RUN: clang-offload-packager -o %t/openmp-x86_64-unknown-linux-gnu.out \
+; RUN: --image=file=%t/openmp-amdgcn-amd-amdhsa.bc,triple=amdgcn-amd-amdhsa
+; RUN: %clang -cc1 -S -o %t/host-x86_64-unknown-linux-gnu.s \
+; RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+; RUN: -fembed-offload-object=%t/openmp-x86_64-unknown-linux-gnu.out \
+; RUN: %t/host-x86_64-unknown-linux-gnu.bc
+; RUN: %clang -cc1as -o %t/host-x86_64-unknown-linux-gnu.o \
+; RUN: -triple x86_64-unknown-linux-gnu -filetype obj -target-cpu x86-64 \
+; RUN: %t/host-x86_64-unknown-linux-gnu.s
+
+; Check plugin, -passes, and no remarks.
+; RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+; RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
+; RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
+; RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
+; RUN:   FileCheck -match-full-lines -check-prefixes=OUT %s
+
+; Check plugin, -p, and remarks.
+; RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+; RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
+; RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
+; RUN: --offload-opt=-p="function(goodbye),module(inline)" \
+; RUN: --offload-opt=-pass-remarks=inline \
+; RUN: --offload-opt=-pass-remarks-output=%t/remarks.yml \
+; RUN: --offload-opt=-pass-remarks-filter=inline \
+; RUN: --offload-opt=-pass-remarks-format=yaml 2>&1 | \
+; RUN:   FileCheck -match-full-lines -check-prefixes=OUT,REM %s
+; RUN: FileCheck -input-file=%t/remarks.yml -match-full-lines \
+; RUN: -check-prefixes=YML %s
+
+; Check handling of bad plugin.
+; RUN: not clang-linker-wrapper \
+; RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \
+; RUN:   FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s
+
+;  OUT-NOT: {{.}}
+;  OUT: Bye: f
+; OUT-NEXT: Bye: test
+; REM-NEXT: remark: {{.*}} 'f' inlined into 'test' {{.*}}
+;  OUT-NOT: {{.}}
+
+;  YML-NOT: {{.}}
+;  YML: --- !Passed
+; YML-NEXT: Pass: inline
+; YML-NEXT: Name: Inlined
+; YML-NEXT: Function: test
+; YML-NEXT: Args:
+;  YML:  - Callee: f
+;  YML:  - Caller: test
+;  YML: ...
+;  YML-NOT: {{.}}
+
+; BAD-PLUGIN-NOT: {{.}}
+; BAD-PLUGIN: {{.*}}Could not load library {{.*}}nonexistent.so{{.*}}
+; BAD-PLUGIN-NOT: {{.}}
+
+;--- host-x86_64-unknown-linux-gnu.ll
+target datalayout = 
"e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+;--- openmp-amdgcn-amd-amdhsa.ll
+target datalayout = 
"e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+target triple = "amdgcn-amd-amdhsa"
+
+define void @f() {
+entry:
+  ret void
+}
+
+define amdgpu_kernel void @test() {

jhuber6 wrote:

I mean, we have this kernel-split thing when we could just have some generic 
LLVM-IR and set `-mtriple` in `opt` to just get the triple and default data 
layout instead.

https://github.com/llvm/llvm-project/pull/96704
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Extend with usual pass options (PR #96704)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,10 @@
+// Check that these simple command lines for listing LLVM options are 
supported,

jhuber6 wrote:

Do we have any other tests that just check the output for `--help`? Might be a 
little excessive.

https://github.com/llvm/llvm-project/pull/96704
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Extend with usual pass options (PR #96704)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 commented:

Makes sense overall. However in the future I'm looking to move away from the 
home-baked LTO pipeline in favor of giving it to the linker. That allows me to 
set up libraries as a part of the target toolchain in the driver. I guess for 
that I'll just need to forward `-mllvm` to the internal clang invocation.

https://github.com/llvm/llvm-project/pull/96704
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Extend with usual pass options (PR #96704)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/96704
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Extend with usual pass options (PR #96704)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,86 @@
+; Check various clang-linker-wrapper pass options after -offload-opt.
+
+; REQUIRES: llvm-plugins, llvm-examples
+; REQUIRES: x86-registered-target
+; REQUIRES: amdgpu-registered-target
+
+; Setup.
+; RUN: split-file %s %t
+; RUN: opt -o %t/host-x86_64-unknown-linux-gnu.bc \
+; RUN: %t/host-x86_64-unknown-linux-gnu.ll
+; RUN: opt -o %t/openmp-amdgcn-amd-amdhsa.bc \
+; RUN: %t/openmp-amdgcn-amd-amdhsa.ll
+; RUN: clang-offload-packager -o %t/openmp-x86_64-unknown-linux-gnu.out \
+; RUN: --image=file=%t/openmp-amdgcn-amd-amdhsa.bc,triple=amdgcn-amd-amdhsa
+; RUN: %clang -cc1 -S -o %t/host-x86_64-unknown-linux-gnu.s \
+; RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+; RUN: -fembed-offload-object=%t/openmp-x86_64-unknown-linux-gnu.out \
+; RUN: %t/host-x86_64-unknown-linux-gnu.bc
+; RUN: %clang -cc1as -o %t/host-x86_64-unknown-linux-gnu.o \
+; RUN: -triple x86_64-unknown-linux-gnu -filetype obj -target-cpu x86-64 \
+; RUN: %t/host-x86_64-unknown-linux-gnu.s
+
+; Check plugin, -passes, and no remarks.
+; RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+; RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
+; RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
+; RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
+; RUN:   FileCheck -match-full-lines -check-prefixes=OUT %s
+
+; Check plugin, -p, and remarks.
+; RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+; RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
+; RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
+; RUN: --offload-opt=-p="function(goodbye),module(inline)" \
+; RUN: --offload-opt=-pass-remarks=inline \
+; RUN: --offload-opt=-pass-remarks-output=%t/remarks.yml \
+; RUN: --offload-opt=-pass-remarks-filter=inline \
+; RUN: --offload-opt=-pass-remarks-format=yaml 2>&1 | \
+; RUN:   FileCheck -match-full-lines -check-prefixes=OUT,REM %s
+; RUN: FileCheck -input-file=%t/remarks.yml -match-full-lines \
+; RUN: -check-prefixes=YML %s
+
+; Check handling of bad plugin.
+; RUN: not clang-linker-wrapper \
+; RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \
+; RUN:   FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s
+
+;  OUT-NOT: {{.}}
+;  OUT: Bye: f
+; OUT-NEXT: Bye: test
+; REM-NEXT: remark: {{.*}} 'f' inlined into 'test' {{.*}}
+;  OUT-NOT: {{.}}
+
+;  YML-NOT: {{.}}
+;  YML: --- !Passed
+; YML-NEXT: Pass: inline
+; YML-NEXT: Name: Inlined
+; YML-NEXT: Function: test
+; YML-NEXT: Args:
+;  YML:  - Callee: f
+;  YML:  - Caller: test
+;  YML: ...
+;  YML-NOT: {{.}}
+
+; BAD-PLUGIN-NOT: {{.}}
+; BAD-PLUGIN: {{.*}}Could not load library {{.*}}nonexistent.so{{.*}}
+; BAD-PLUGIN-NOT: {{.}}
+
+;--- host-x86_64-unknown-linux-gnu.ll
+target datalayout = 
"e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+;--- openmp-amdgcn-amd-amdhsa.ll
+target datalayout = 
"e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+target triple = "amdgcn-amd-amdhsa"
+
+define void @f() {
+entry:
+  ret void
+}
+
+define amdgpu_kernel void @test() {

jhuber6 wrote:

Is the kernel really necessary? Otherwise I'd just compile with 
`-mtriple=amdgcn--` or something.

https://github.com/llvm/llvm-project/pull/96704
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -54,7 +54,8 @@ class MockArgList {
   }
 
   template  LIBC_INLINE T next_var() {
-++arg_counter;
+arg_counter =

jhuber6 wrote:

Done

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper][NFC] Simplify StringErrors (PR #96650)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/96650
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -54,7 +54,8 @@ class MockArgList {
   }
 
   template  LIBC_INLINE T next_var() {
-++arg_counter;
+arg_counter =

jhuber6 wrote:

I didn't see any tests that actively depended on this value, and figured that 
it does a similar job stating how many bytes were read, but I can make a new 
one if needed. Figured this was an easier diff since it's only one line.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,73 @@
+//===--- GPU helper functions for printf using RPC 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "src/__support/RPC/rpc_client.h"
+#include "src/__support/arg_list.h"
+#include "src/stdio/gpu/file.h"
+#include "src/string/string_utils.h"
+
+#include 
+
+namespace LIBC_NAMESPACE {
+namespace file {

jhuber6 wrote:

Figured it's included in `gpu/`, but I can make it more explicitly GPU related, 
though I think even a namespace is probably unnecessary.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -54,7 +54,8 @@ class MockArgList {
   }
 
   template  LIBC_INLINE T next_var() {
-++arg_counter;
+arg_counter =

jhuber6 wrote:

I now use the `MockArgList` to determine how big the "struct" needs to be
to contain the arguments.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][docs] Add preliminary documentation for SPIR-V support in the HIPAMD ToolChain (PR #96657)

2024-06-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> I looked at this and it doesn't look super appealing (turned into something 
> of a rabbit hole), it'd duplicate a lot of the existing toolchain, and would 
> also try to squat in an already overcrowded space (there's already HIPSPV).

We already have a SPIR-V toolchain for HIP? Seems like it was added over two 
years ago. What does the other handling do that this toolchain doesn't?

https://github.com/llvm/llvm-project/pull/96657
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Also, I just merged the prerequisite patches into this; to see the relevant
changes just look at the most recent commit. The lack of stacked PRs in GitHub
really irks me.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Do we want some sort of optimization for constant printf? 99% of the time, we 
> could parse the string at compile-time. (This sort of optimization is common 
> for embedded targets.)

I was going to make a follow-up patch that simply skipped sending back the size
if there were no arguments to parse. If we enable these builtins as available
on the GPU (which I may do very soon) we will also get `printf -> puts`
optimizations. There are no passes that optimize things like `printf("%d", 10)`
to `puts("10")` as far as I know.

> If the format string isn't constant, is parsing the string on the GPU really 
> slower than asking the host for the size? printf format strings aren't that 
> complicated, especially if you aren't actually doing the formatting.

Well, this approach basically trades speed for resource usage. I had an old
implementation that did the parsing on the GPU side
(https://reviews.llvm.org/D158774), and that used an unfortunate number of
registers given that the arguments are really just an array in this sense.
Plus, since I wrote that, a lot more has been added to the format parsing,
since I think future C releases are supposed to support 128-bit integers or
something.

I think my old version used something like 54 SGPRs and 40 VGPRs, while this
version uses:
```
printf.c:4:0: Function Name: foo

  
printf.c:4:0: SGPRs: 36
printf.c:4:0: VGPRs: 19
```
 
> Does this support `%n`?

No, I specifically disabled it in the `printf` config. The fact that it writes
back through a pointer made it too annoying to implement, and I think in
general people consider `%n` a security issue, so it's probably not a huge
loss.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][docs] Add preliminary documentation for SPIR-V support in the HIPAMD ToolChain (PR #96657)

2024-06-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > I'll need to play with this with my driver code. I'm guessing it's because 
> > it needs to generate an entirely separate toolchain? The OpenMP path 
> > basically does that by inferring the toolchain from the string value, so we 
> > can support `--offload-arch=sm_89,gfx90a` for example.
> 
> Not quite, it's more because we'd have to nest two triples 
> (`spirv64-amd-amdhsa` && `amdgcn-amd-amdhsa`) within the same toolchain, 
> since we're using the same HIPAMD ToolChain. It's fixable, just slightly 
> faffy to do without spamming toolchains / within the same toolchain.

Honestly I'm wondering if it would be cleaner to just make a 
`HIPSPIRVToolChain` since we may want special handling in the future.

https://github.com/llvm/llvm-project/pull/96657
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][docs] Add preliminary documentation for SPIR-V support in the HIPAMD ToolChain (PR #96657)

2024-06-25 Thread Joseph Huber via cfe-commits


@@ -284,3 +284,48 @@ Example Usage
   Base* basePtr = &obj;
   basePtr->virtualFunction(); // Allowed since obj is constructed in 
device code
}
+
+SPIR-V Support on HIPAMD ToolChain
+==
+
+The HIPAMD ToolChain supports targeting
+`AMDGCN Flavoured SPIR-V 
`_.
+The support for SPIR-V in the ROCm and HIPAMD ToolChain is under active
+development.
+
+Compilation Process
+---
+
+When compiling HIP programs with the intent of utilizing SPIR-V, the process
+diverges from the traditional compilation flow:
+
+Using ``--offload-arch=amdgcnspirv``
+
+
+- **Target Triple**: The ``--offload-arch=amdgcnspirv`` flag instructs the
+  compiler to use the target triple ``spirv64-amd-amdhsa``. This approach
+  generates generic AMDGCN SPIR-V which retains architecture-specific elements
+  without hardcoding them, thus allowing for optimal target-specific code to be
+  generated at run time, when the concrete target is known.
+
+- **LLVM IR Translation**: The program is compiled to LLVM Intermediate
+  Representation (IR), which is subsequently translated into SPIR-V. In the
+  future, this translation step will be replaced by direct SPIR-V emission via
+  the SPIR-V Back-end.
+
+- **Clang Offload Bundler**: The resulting SPIR-V is embedded in the Clang
+  offload bundler with the bundle ID ``hipv4-hip-spirv64-amd-amdhsa-generic``.
+
+Mixed with Normal ``--offload-arch``
+
+
+**Mixing ``amdgcnspirv`` and concrete ``gfx###`` targets via ``--offload-arch``

jhuber6 wrote:

I'll need to play with this with my driver code. I'm guessing it's because it 
needs to generate an entirely separate toolchain? The OpenMP path basically 
does that by inferring the toolchain from the string value, so we can support 
`--offload-arch=sm_89,gfx90a` for example.

https://github.com/llvm/llvm-project/pull/96657
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][docs] Add preliminary documentation for SPIR-V support in the HIPAMD ToolChain (PR #96657)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/96657
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][docs] Add preliminary documentation for SPIR-V support in the HIPAMD ToolChain (PR #96657)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/96657
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper][NFC] Simplify StringErrors (PR #96650)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/96650

Summary:
The `createStringError` helper has an overload that supplies the
inconvertible error code for you. It's much easier to read this way.
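
A minimal sketch (not from this patch) of the two forms side by side; the
shorter overload fills in `inconvertibleErrorCode()` itself:
```
// Illustrative only: both functions produce an equivalent llvm::StringError.
#include "llvm/Support/Error.h"

llvm::Error openFailedVerbose(llvm::StringRef Path) {
  return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                 "Failed to open %s", Path.str().c_str());
}

llvm::Error openFailedTerse(llvm::StringRef Path) {
  return llvm::createStringError("Failed to open %s", Path.str().c_str());
}
```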


>From 6860d0101f8babac086156087854e8f94e4f233e Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 25 Jun 2024 10:00:37 -0500
Subject: [PATCH] [LinkerWrapper][NFC] Simplify StringErrors

Summary:
The `createStringError` helper has an overload that supplies the
inconvertible error code for you. It's much easier to read this way.
---
 .../ClangLinkerWrapper.cpp| 53 ---
 1 file changed, 21 insertions(+), 32 deletions(-)

diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index cdfe8cfbd9379..9027076119cf9 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -224,9 +224,8 @@ Error executeCommands(StringRef ExecutablePath, 
ArrayRef Args) {
 
   if (!DryRun)
 if (sys::ExecuteAndWait(ExecutablePath, Args))
-  return createStringError(inconvertibleErrorCode(),
-   "'" + sys::path::filename(ExecutablePath) + "'" 
+
-   " failed");
+  return createStringError(
+  "'%s' failed", sys::path::filename(ExecutablePath).str().c_str());
   return Error::success();
 }
 
@@ -259,7 +258,6 @@ Error relocateOffloadSection(const ArgList , StringRef 
Output) {
   Args.getLastArgValue(OPT_host_triple_EQ, sys::getDefaultTargetTriple()));
   if (Triple.isOSWindows())
 return createStringError(
-inconvertibleErrorCode(),
 "Relocatable linking is not supported on COFF targets");
 
   Expected ObjcopyPath =
@@ -272,8 +270,7 @@ Error relocateOffloadSection(const ArgList , StringRef 
Output) {
   auto BufferOrErr = DryRun ? MemoryBuffer::getMemBuffer("")
 : MemoryBuffer::getFileOrSTDIN(Output);
   if (!BufferOrErr)
-return createStringError(inconvertibleErrorCode(), "Failed to open %s",
- Output.str().c_str());
+return createStringError("Failed to open %s", Output.str().c_str());
   std::string Suffix = "_" + getHash((*BufferOrErr)->getBuffer());
 
   SmallVector ObjcopyArgs = {
@@ -492,8 +489,7 @@ Expected clang(ArrayRef InputFiles, 
const ArgList ) {
 
 file_magic Magic;
 if (auto EC = identify_magic(Arg->getValue(), Magic))
-  return createStringError(inconvertibleErrorCode(),
-   "Failed to open %s", Arg->getValue());
+  return createStringError("Failed to open %s", Arg->getValue());
 if (Magic != file_magic::archive &&
 Magic != file_magic::elf_shared_object)
   continue;
@@ -568,9 +564,8 @@ Expected linkDevice(ArrayRef 
InputFiles,
   case Triple::systemz:
 return generic::clang(InputFiles, Args);
   default:
-return createStringError(inconvertibleErrorCode(),
- Triple.getArchName() +
- " linking is not supported");
+return createStringError(Triple.getArchName() +
+ " linking is not supported");
   }
 }
 
@@ -881,15 +876,13 @@ Error linkBitcodeFiles(SmallVectorImpl 
,
 return Err;
 
   if (LTOError)
-return createStringError(inconvertibleErrorCode(),
- "Errors encountered inside the LTO pipeline.");
+return createStringError("Errors encountered inside the LTO pipeline.");
 
   // If we are embedding bitcode we only need the intermediate output.
   bool SingleOutput = Files.size() == 1;
   if (Args.hasArg(OPT_embed_bitcode)) {
 if (BitcodeOutput.size() != 1 || !SingleOutput)
-  return createStringError(inconvertibleErrorCode(),
-   "Cannot embed bitcode with multiple files.");
+  return createStringError("Cannot embed bitcode with multiple files.");
 OutputFiles.push_back(Args.MakeArgString(BitcodeOutput.front()));
 return Error::success();
   }
@@ -936,7 +929,7 @@ Expected compileModule(Module , OffloadKind 
Kind) {
   std::string Msg;
   const Target *T = TargetRegistry::lookupTarget(M.getTargetTriple(), Msg);
   if (!T)
-return createStringError(inconvertibleErrorCode(), Msg);
+return createStringError(Msg);
 
   auto Options =
   codegen::InitTargetOptionsFromCodeGenFlags(Triple(M.getTargetTriple()));
@@ -966,8 +959,7 @@ Expected compileModule(Module , OffloadKind 
Kind) {
   CodeGenPasses.add(new TargetLibraryInfoWrapperPass(TLII));
   if (TM->addPassesToEmitFile(CodeGenPasses, *OS, nullptr,
   CodeGenFileType::ObjectFile))
-return createStringError(inconvertibleErrorCode(),
- "Failed to execute host backend");
+return createStringError("Failed to execute host backend");
   

[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 859f6a7fce9503275ad7eb39512dc5833a11bb07 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to be named with a non-standard `.cubin` extension,
and rejects link jobs that other linkers would be fine with (i.e.,
empty ones). I have spent a great deal of time hacking around this in
the GPU `libc` implementation, where I deliberately avoid LTO and
static linking and have about 100 lines of hacky CMake dedicated to
storing these files in a format that the clang-linker-wrapper accepts
to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning
on creating a more standard C/C++ toolchain for GPUs to use. This will
install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp| 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 9 files changed, 865 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation , const 
JobAction ,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-SmallString<256> Filename(Output.getFilename());
-llvm::sys::path::replace_extension(Filename, "cubin");
-OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
 C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation , const 
JobAction ,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation , const 
JobAction ,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto  : Inputs) {
-if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR 
||
-II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) 
{
-  C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-  << getToolChain().getTripleString();
-  continue;
-}
-
-// The 'nvlink' application performs RDC-mode 

[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-06-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

@MaskRay So, I think my symbol resolution is (unsurprisingly) subtly broken. Is
there a canonical way to handle this? I first thought we could simply perform
symbol resolution as normal for every file but keep track of which symbols were
"lazy". However, I couldn't figure out how to then tell whether a lazy symbol
should be extracted, because there's no information on which files use which
symbols. Maybe I just scan all the files and see if they reference a symbol
that's marked defined and lazy?
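For context, the conventional archive rule can be sketched roughly like this: a lazy member is extracted only when it defines a symbol that is still undefined, and extraction can introduce new undefined references, so the scan repeats. The names below are entirely hypothetical and are not the linker-wrapper's actual implementation:
```
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// One archive member that has not been extracted yet ("lazy").
struct LazyMember {
  std::set<std::string> Defined;
  std::set<std::string> Undefined;
};

// Extract a lazy member only if it defines a currently undefined symbol;
// its own undefined references then feed back into the worklist.
void resolveLazy(std::set<std::string> &Undefined,
                 std::set<std::string> &Defined,
                 const std::vector<LazyMember> &Members) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (const LazyMember &M : Members) {
      bool Needed = std::any_of(
          M.Defined.begin(), M.Defined.end(),
          [&](const std::string &S) { return Undefined.count(S); });
      if (!Needed)
        continue;
      for (const std::string &S : M.Defined) {
        Defined.insert(S);
        Undefined.erase(S);
      }
      for (const std::string &S : M.Undefined)
        if (!Defined.count(S))
          Undefined.insert(S);
      Changed = true;
    }
  }
}
```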

https://github.com/llvm/llvm-project/pull/96561


[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 6c70e542bbb355160b833ede6f86be0366953b88 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to be named with a non-standard `.cubin` extension,
and rejects link jobs that other linkers would be fine with (i.e.,
empty ones). I have spent a great deal of time hacking around this in
the GPU `libc` implementation, where I deliberately avoid LTO and
static linking and have about 100 lines of hacky CMake dedicated to
storing these files in a format that the clang-linker-wrapper accepts
to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning
on creating a more standard C/C++ toolchain for GPUs to use. This will
install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp| 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 9 files changed, 865 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation , const 
JobAction ,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-SmallString<256> Filename(Output.getFilename());
-llvm::sys::path::replace_extension(Filename, "cubin");
-OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
 C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation , const 
JobAction ,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation , const 
JobAction ,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto  : Inputs) {
-if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR 
||
-II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) 
{
-  C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-  << getToolChain().getTripleString();
-  continue;
-}
-
-// The 'nvlink' application performs RDC-mode 

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > Hm, that's what I'm doing in the `printf` implementation and it doesn't 
> > work without that patch. When I look at the varargs struct it didn't have 
> > any padding, which explained why `alignTo(ptr + size, align)` was wrong. 
> > So, I was trying to do the following, `printf("%d%ld", 1, 1l)`. With this 
> > patch I get the following,
> 
> For what IR? Is the small struct getting expanded into individual scalar 
> pieces?

The implementation intentionally states that the alignment is always `4` in
this case, which results in there being no padding between the four-byte and
eight-byte values when they are put into a `void *` buffer for varargs. You
should be able to see it in the changes to the tests. Unfortunately,
@JonChesterfield is on vacation so I can't ask him about this; it seems to be a
deliberate choice, but I don't see how it's correct.
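To make that concrete, here is an illustrative sketch (not code from the patch) of how a reader would walk the packed buffer for `printf("%d%ld", 1, 1l)` under each convention; `alignTo` is a local round-up helper and `readArgs` is a hypothetical consumer:
```
#include <cstdint>
#include <cstring>

// Round Value up to the next multiple of Align (a power of two).
static uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// With natural alignment the i64 starts at offset 8 and the reader can use
// alignTo(offset, alignof(T)). With the minimum alignment pinned to 4 there
// is no padding, the i64 starts at offset 4, and the same reader computes
// the wrong offset.
void readArgs(const char *Buffer, bool NaturalAlignment) {
  uint64_t Offset = 0;

  int32_t IntArg;
  std::memcpy(&IntArg, Buffer + Offset, sizeof(IntArg));
  Offset += sizeof(IntArg); // now 4

  if (NaturalAlignment)
    Offset = alignTo(Offset, alignof(int64_t)); // 8
  // else: Offset stays 4, matching the packed layout described above.

  int64_t LongArg;
  std::memcpy(&LongArg, Buffer + Offset, sizeof(LongArg));
  (void)IntArg;
  (void)LongArg;
}
```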

https://github.com/llvm/llvm-project/pull/96370


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > > Incrementing by align is just a bug, of course the size is the real 
> > > value. Whether we want to continue wasting space is another 
> > > not-correctness discussion
> > 
> > 
> > Struct padding is pretty universal, AMDGPU seems the odd one out here. I 
> > wouldn't mind it so much if it didn't require me to know which vendor I was 
> > dealing with in the RPC implementation, but I suppose I could store that 
> > information somewhere if we want to use a compressed option and we know it 
> > works.
> 
> It's not about struct padding, but the base alignment. Any pointer increment 
> should be alignTo(ptr + size, align), not += align. The += align won't even 
> work for large structs

Hm, that's what I'm doing in the `printf` implementation and it doesn't work
without that patch. When I looked at the varargs struct it didn't have any
padding, which explained why `alignTo(ptr + size, align)` was wrong. So, I was
trying the following: `printf("%d%ld", 1, 1l)`. With this patch I get the
following,
```
0xbebebebe0001
0x0001
```
Without this patch, I get this. As you can see, there is no struct padding, so
the 8-byte value is right next to the 4-byte one.
```
0x00010001
0xbebebebe
```

https://github.com/llvm/llvm-project/pull/96370


[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> @Artem-B asked me to review nvptx patches while he's OOO, but this one is 
> pretty far outside my depth. Are you OK waiting until he's back? I don't know 
> exactly when that will be, but based on his IMs to me, he should be back 
> early July.

No problem, I knew it would probably take a while to get reviewed given the
size. I believe he said he'd be back early July as well, so maybe next week?
It'll probably require his input, along with that of some of the other
interested parties in clang, to see how they feel about reviving one of these
old tools.

(However, if you know anything about the NVPTX varargs API, I think
https://github.com/llvm/llvm-project/pull/96015 is mostly just waiting for
someone to say that it's a mostly correct lowering.)

https://github.com/llvm/llvm-project/pull/96561


[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 5edeeb9816fa5909f27a781f6e7213dd02ccdfa0 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to be named with a non-standard `.cubin` extension,
and rejects link jobs that other linkers would be fine with (i.e.,
empty ones). I have spent a great deal of time hacking around this in
the GPU `libc` implementation, where I deliberately avoid LTO and
static linking and have about 100 lines of hacky CMake dedicated to
storing these files in a format that the clang-linker-wrapper accepts
to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning
on creating a more standard C/C++ toolchain for GPUs to use. This will
install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp| 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 8 files changed, 864 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation , const 
JobAction ,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-SmallString<256> Filename(Output.getFilename());
-llvm::sys::path::replace_extension(Filename, "cubin");
-OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
 C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation , const 
JobAction ,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation , const 
JobAction ,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto  : Inputs) {
-if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR 
||
-II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) 
{
-  C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-  << getToolChain().getTripleString();
-  continue;
-}
-
-// The 'nvlink' application performs RDC-mode linking when given a '.o'
-// file and device linking 

[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 8a52becd358abb2c96ca150db501d58c40b5250b Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to be named with a non-standard `.cubin` extension,
and rejects link jobs that other linkers would be fine with (i.e.,
empty ones). I have spent a great deal of time hacking around this in
the GPU `libc` implementation, where I deliberately avoid LTO and
static linking and have about 100 lines of hacky CMake dedicated to
storing these files in a format that the clang-linker-wrapper accepts
to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning
on creating a more standard C/C++ toolchain for GPUs to use. This will
install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp| 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 8 files changed, 864 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation , const 
JobAction ,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-SmallString<256> Filename(Output.getFilename());
-llvm::sys::path::replace_extension(Filename, "cubin");
-OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
 C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation , const 
JobAction ,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation , const 
JobAction ,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto  : Inputs) {
-if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR 
||
-II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) 
{
-  C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-  << getToolChain().getTripleString();
-  continue;
-}
-
-// The 'nvlink' application performs RDC-mode linking when given a '.o'
-// file and device linking 

[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/96561


[clang] [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/96561

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to be named with a non-standard `.cubin` extension,
and rejects link jobs that other linkers would be fine with (i.e.,
empty ones). I have spent a great deal of time hacking around this in
the GPU `libc` implementation, where I deliberately avoid LTO and
static linking and have about 100 lines of hacky CMake dedicated to
storing these files in a format that the clang-linker-wrapper accepts
to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning
on creating a more standard C/C++ toolchain for GPUs to use. This will
install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.


>From d48deace957dfd2f1abaf232c1462a7725f7f1ee Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrapper' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to be named with a non-standard `.cubin` extension,
and rejects link jobs that other linkers would be fine with (i.e.,
empty ones). I have spent a great deal of time hacking around this in
the GPU `libc` implementation, where I deliberately avoid LTO and
static linking and have about 100 lines of hacky CMake dedicated to
storing these files in a format that the clang-linker-wrapper accepts
to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning
on creating a more standard C/C++ toolchain for GPUs to use. This will
install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  64 ++
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp| 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 8 files changed, 863 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp 
b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation , const 
JobAction ,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we 

[clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Add GPU profiling flags to driver (PR #94268)

2024-06-24 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> @jhuber6 The clang format errors are mostly due to my local version of 
> `clang-format` disagreeing with the buildbot's version. Its a bit annoying, 
> but it shouldn't be too much of a problem given I plan on squashing and 
> merging once this gets approved.
> 
> I added new flags for GPU PGO specifically because I didn't want to modify 
> the PGO flags' existing behavior. PGO has a significant runtime cost, so I 
> figured it would be best for the end user experience to only enable PGO on 
> the GPU when it was specifically requested.

Is this something that specifically requires its own flag? Or could we just do
`-Xarch_device -fprofile-generate`?

https://github.com/llvm/llvm-project/pull/94268


[clang] [compiler-rt] [llvm] [openmp] [PGO][Offload] Add GPU profiling flags to driver (PR #94268)

2024-06-23 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 commented:

There seem to be a lot of accidental `clang-format` changes. Why do we need new
flags for this instead of just using the old ones and changing the behavior
when the target is a known GPU, i.e. SPIR-V, CUDA, or HSA?

https://github.com/llvm/llvm-project/pull/94268


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Incrementing by align is just a bug, of course the size is the real value. 
> Whether we want to continue wasting space is another not-correctness 
> discussion

Struct padding is pretty universal; AMDGPU seems to be the odd one out here. I
wouldn't mind it so much if it didn't require me to know which vendor I was
dealing with in the RPC implementation, but I suppose I could store that
information somewhere if we want to use a compressed option and we know it
works.

https://github.com/llvm/llvm-project/pull/96370


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > Here, because the minimum alignment is 4, we will only increment the
> > buffer by 4,
> 
> It should be incrementing by the size? 4 byte aligned access of 8 byte type 
> should work fine

I guess that's an AMD thing, so I'm going to assume that @JonChesterfield wrote
this intentionally to save on stack space? The issue I'm having with my
`printf` implementation is that we then want to copy this struct, and because
it doesn't follow natural alignment, the consumer printing it has no standard
way of knowing where the values are stored. I suppose I could change the code
to just do `ptr += sizeof(T)` instead of doing the alignment, but I feel like
some architectures require strict alignment for these accesses and it wouldn't
work in the general case.

https://github.com/llvm/llvm-project/pull/96370


[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-21 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.

Out of curiosity, how badly does this fail when you use `--offload-new-driver`
with HIP? I swear I'll get that passing the internal test suite eventually;
there's a single case for emitting IR that comgr uses that I can't seem to fix.

https://github.com/llvm/llvm-project/pull/95061


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-21 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/96370

Summary:
The variadics lowering for AMDGPU puts all the arguments into a void
pointer struct. The current logic dictates that the minimum alignment is
four regardless of what the underlying type is. This is incorrect in
the following case.

```c
void foo(int, ...);

void bar() {
  int x;
  void *p;
  foo(0, x, p);
}
```
Here, because the minimum alignment is 4, we will only increment the
buffer by 4, resulting in an incorrect alignment when we then try to
access the void pointer. We need to set a minimum of 4, but increase it
to 8 in cases like this.
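A minimal sketch of the intended rule, under the assumptions of this illustration rather than the pass's actual code: the slot offset keeps a floor of 4 but is raised to the argument's natural alignment when that is larger. The helper name below is hypothetical.
```
#include <algorithm>
#include <cstdint>

// Illustrative only: where the next variadic argument starts in the packed
// buffer, assuming power-of-two alignments. For the example above, the
// 'void *' after the 'int' gets an alignment of 8, so 4 bytes of padding
// are inserted at offset 4 instead of the pointer landing there unaligned.
static uint64_t nextArgOffset(uint64_t Offset, uint64_t NaturalAlign) {
  uint64_t Align = std::max<uint64_t>(4, NaturalAlign);
  return (Offset + Align - 1) & ~(Align - 1);
}
```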


>From 5ee5bccb5dd4bd1d78dc04ead3c334d88b86f4fd Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 21 Jun 2024 19:17:42 -0500
Subject: [PATCH] [LLVM] Fix incorrect alignment on AMDGPU variadics

Summary:
The variadics lowering for AMDGPU puts all the arguments into a void
pointer struct. The current logic dictates that the minimum alignment is
four regardless of what the underlying type is. This is incorrect in
the following case.

```c
void foo(int, ...);

void bar() {
  int x;
  void *p;
  foo(0, x, p);
}
```
Here, because the minimum alignment is 4, we will only increment the
buffer by 4, resulting in an incorrect alignment when we then try to
access the void pointer. We need to set a minimum of 4, but increase it
to 8 in cases like this.
---
 clang/lib/CodeGen/Targets/AMDGPU.cpp  |  11 +-
 clang/test/CodeGen/amdgpu-variadic-call.c |  32 +-
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |   6 +-
 .../CodeGen/AMDGPU/expand-variadic-call.ll| 574 +-
 4 files changed, 316 insertions(+), 307 deletions(-)

diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp 
b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 4d3275e17c386..a169a7d920456 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -121,7 +121,7 @@ void AMDGPUABIInfo::computeInfo(CGFunctionInfo ) const {
 RValue AMDGPUABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
 QualType Ty, AggValueSlot Slot) const {
   const bool IsIndirect = false;
-  const bool AllowHigherAlign = false;
+  const bool AllowHigherAlign = true;
   return emitVoidPtrVAArg(CGF, VAListAddr, Ty, IsIndirect,
   getContext().getTypeInfoInChars(Ty),
   CharUnits::fromQuantity(4), AllowHigherAlign, Slot);
@@ -212,13 +212,8 @@ ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType 
Ty, bool Variadic,
 
   Ty = useFirstFieldIfTransparentUnion(Ty);
 
-  if (Variadic) {
-return ABIArgInfo::getDirect(/*T=*/nullptr,
- /*Offset=*/0,
- /*Padding=*/nullptr,
- /*CanBeFlattened=*/false,
- /*Align=*/0);
-  }
+  if (Variadic)
+return ABIArgInfo::getDirect();
 
   if (isAggregateTypeForABI(Ty)) {
 // Records with non-trivial destructors/copy-constructors should not be
diff --git a/clang/test/CodeGen/amdgpu-variadic-call.c 
b/clang/test/CodeGen/amdgpu-variadic-call.c
index 17eda215211a2..0529d6b3171c8 100644
--- a/clang/test/CodeGen/amdgpu-variadic-call.c
+++ b/clang/test/CodeGen/amdgpu-variadic-call.c
@@ -1,4 +1,3 @@
-// REQUIRES: amdgpu-registered-target
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature
 // RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s 
-o - | FileCheck %s
 
@@ -179,11 +178,9 @@ typedef struct
 // CHECK-LABEL: define {{[^@]+}}@one_pair_f64
 // CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], double 
[[V0_COERCE0:%.*]], double [[V0_COERCE1:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:[[DOTFCA_0_INSERT:%.*]] = insertvalue 
[[STRUCT_PAIR_F64:%.*]] poison, double [[V0_COERCE0]], 0
-// CHECK-NEXT:[[DOTFCA_1_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64]] 
[[DOTFCA_0_INSERT]], double [[V0_COERCE1]], 1
-// CHECK-NEXT:tail call void (...) @sink_0([[STRUCT_PAIR_F64]] 
[[DOTFCA_1_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], 
[[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef 
[[F1]], i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (...) @sink_0(double [[V0_COERCE0]], double 
[[V0_COERCE1]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], double 
[[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef 
[[F1]], i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]]) 
#[[ATTR2]]
 // CHECK-NEXT:ret void
 //
 void one_pair_f64(int f0, double f1, pair_f64 v0)
@@ -220,10 +217,9 @@ typedef union
 // CHECK-SAME: (i32 noundef 

[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-21 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/96369

Summary:
This patch implements the `printf` family of functions on the GPU using
the new variadic support. This patch adapts the old handling in the
`rpc_fprintf` placeholder, but adds an extra RPC call to get the size of
the buffer to copy. This prevents the GPU from needing to parse the
string. While it's theoretically possible for the pass to know the size
of the struct, it's prohibitively difficult to do while maintaining ABI
compatibility with NVIDIA's varargs.

Depends on https://github.com/llvm/llvm-project/pull/96015.


>From 42a7a45c845de377b9b714af39a449fdc49eb768 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 21 Jun 2024 19:10:40 -0500
Subject: [PATCH] [libc] Implement (v|f)printf on the GPU

Summary:
This patch implements the `printf` family of functions on the GPU using
the new variadic support. This patch adapts the old handling in the
`rpc_fprintf` placeholder, but adds an extra RPC call to get the size of
the buffer to copy. This prevents the GPU from needing to parse the
string. While it's theoretically possible for the pass to know the size
of the struct, it's prohibitively difficult to do while maintaining ABI
compatibility with NVIDIA's varargs.

Depends on https://github.com/llvm/llvm-project/pull/96015.
---
 .../ClangLinkerWrapper.cpp|  1 +
 libc/config/gpu/entrypoints.txt   | 19 ++---
 libc/src/__support/arg_list.h |  3 +-
 libc/src/gpu/rpc_fprintf.cpp  |  5 +-
 libc/src/stdio/CMakeLists.txt | 24 +-
 libc/src/stdio/generic/CMakeLists.txt | 25 +++
 libc/src/stdio/{ => generic}/fprintf.cpp  |  0
 libc/src/stdio/{ => generic}/vfprintf.cpp |  0
 libc/src/stdio/gpu/CMakeLists.txt | 48 
 libc/src/stdio/gpu/fprintf.cpp| 32 
 libc/src/stdio/gpu/printf.cpp | 30 
 libc/src/stdio/gpu/vfprintf.cpp   | 29 
 libc/src/stdio/gpu/vfprintf_utils.h   | 73 +++
 libc/src/stdio/gpu/vprintf.cpp| 28 +++
 .../integration/src/stdio/gpu/CMakeLists.txt  |  2 +-
 .../test/integration/src/stdio/gpu/printf.cpp | 43 ---
 libc/utils/gpu/server/rpc_server.cpp  | 24 +-
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  8 +-
 18 files changed, 326 insertions(+), 68 deletions(-)
 rename libc/src/stdio/{ => generic}/fprintf.cpp (100%)
 rename libc/src/stdio/{ => generic}/vfprintf.cpp (100%)
 create mode 100644 libc/src/stdio/gpu/fprintf.cpp
 create mode 100644 libc/src/stdio/gpu/printf.cpp
 create mode 100644 libc/src/stdio/gpu/vfprintf.cpp
 create mode 100644 libc/src/stdio/gpu/vfprintf_utils.h
 create mode 100644 libc/src/stdio/gpu/vprintf.cpp

diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index cdfe8cfbd9379..03fd23ae39c29 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -1671,6 +1671,7 @@ int main(int Argc, char **Argv) {
 NewArgv.push_back(Arg->getValue());
   for (const opt::Arg *Arg : Args.filtered(OPT_offload_opt_eq_minus))
 NewArgv.push_back(Args.MakeArgString(StringRef("-") + Arg->getValue()));
+  llvm::errs() << "asdfasdf\n";
   cl::ParseCommandLineOptions(NewArgv.size(), [0]);
 
   Verbose = Args.hasArg(OPT_verbose);
diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index 2217a696fc5d1..de1ca6bfd151f 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -1,13 +1,3 @@
-if(LIBC_TARGET_ARCHITECTURE_IS_AMDGPU)
-  set(extra_entrypoints
-  # stdio.h entrypoints
-  libc.src.stdio.sprintf
-  libc.src.stdio.snprintf
-  libc.src.stdio.vsprintf
-  libc.src.stdio.vsnprintf
-  )
-endif()
-
 set(TARGET_LIBC_ENTRYPOINTS
 # assert.h entrypoints
 libc.src.assert.__assert_fail
@@ -185,7 +175,14 @@ set(TARGET_LIBC_ENTRYPOINTS
 libc.src.errno.errno
 
 # stdio.h entrypoints
-${extra_entrypoints}
+libc.src.stdio.printf
+libc.src.stdio.vprintf
+libc.src.stdio.fprintf
+libc.src.stdio.vfprintf
+libc.src.stdio.sprintf
+libc.src.stdio.snprintf
+libc.src.stdio.vsprintf
+libc.src.stdio.vsnprintf
 libc.src.stdio.feof
 libc.src.stdio.ferror
 libc.src.stdio.fseek
diff --git a/libc/src/__support/arg_list.h b/libc/src/__support/arg_list.h
index 0965e12afd562..3a4e5ad0fab3c 100644
--- a/libc/src/__support/arg_list.h
+++ b/libc/src/__support/arg_list.h
@@ -54,7 +54,8 @@ class MockArgList {
   }
 
   template  LIBC_INLINE T next_var() {
-++arg_counter;
+arg_counter =
+((arg_counter + alignof(T) - 1) / alignof(T)) * alignof(T) + sizeof(T);
 return T(arg_counter);
   }
 
diff --git a/libc/src/gpu/rpc_fprintf.cpp b/libc/src/gpu/rpc_fprintf.cpp
index 

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-21 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96015

>From 8bd49caa9fa93fd3d0812e0a4315f8ff4956056a Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC compiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type.
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
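As an illustration of the rules above (a conceptual sketch, not output of the pass), a call like `varargs_simple(0, 1, 1l, ptr)` would pass its variadic arguments in one by-value aggregate laid out roughly like this; the struct name and the `align_up` helper in the comment are hypothetical:
```
// Hypothetical layout of the variadic area for varargs_simple(0, 1, 1l, ptr):
// each member keeps its natural alignment, with a 4-byte minimum.
struct conceptual_va_buffer {
  int   i;   // offset 0
  // 4 bytes of padding so the long keeps its natural 8-byte alignment
  long  l;   // offset 8
  void *p;   // offset 16
};
// The callee's va_arg then walks the buffer with roughly:
//   ptr = align_up(ptr, max(4, alignof(T))); value = *(T *)ptr; ptr += sizeof(T);
```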
---
 clang/lib/Basic/Targets/NVPTX.h   |   3 +-
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  11 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  43 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 9 files changed, 916 insertions(+), 31 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index f476d49047c01..e30eaf808ca93 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -116,8 +116,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public 
TargetInfo {
   }
 
   BuiltinVaListKind getBuiltinVaListKind() const override {
-// FIXME: implement
-return TargetInfo::CharPtrBuiltinVaList;
+return TargetInfo::VoidPtrBuiltinVaList;
   }
 
   bool isValidCPUName(StringRef Name) const override {
diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..01a0b07856103 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,12 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
+
+  unsigned ArgumentsCount = 0;
   for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+I.info = ArgumentsCount++ < FI.getNumRequiredArgs()
+ ? classifyArgumentType(I.type)
+ : ABIArgInfo::getDirect();
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..f2f0768ae31ee
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca 

[clang] [compiler-rt] [libcxx] [libunwind] [llvm] [openmp] [cmake] switch to CMake's native `check_{compiler,linker}_flag` (PR #96171)

2024-06-20 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Here's a radical question: do we really want to use CMake's support for this? I
remember a recent discussion about the increasingly large amount of time spent
in the CMake configuration step, and most of that time is spent during these
flag checks, which pretty much all compile and link some file with no
parallelism. I've also had issues working with these flags when trying to
cross-compile things for the GPU, namely because the compilation flag checks
insist on invoking the linker, so I need to do something like
`set(CMAKE_REQUIRED_FLAGS "-c -flto")` to prevent them from invoking non-LLVM
binaries for NVIDIA compilation.

https://github.com/llvm/llvm-project/pull/96171


[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-19 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96015

>From 0cae8db24812b2ab5539cc581fbc461af072b5fd Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC compiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type.
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
---
 clang/lib/Basic/Targets/NVPTX.h   |   3 +-
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  11 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  43 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 9 files changed, 916 insertions(+), 31 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index f476d49047c01..e30eaf808ca93 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -116,8 +116,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public 
TargetInfo {
   }
 
   BuiltinVaListKind getBuiltinVaListKind() const override {
-// FIXME: implement
-return TargetInfo::CharPtrBuiltinVaList;
+return TargetInfo::VoidPtrBuiltinVaList;
   }
 
   bool isValidCPUName(StringRef Name) const override {
diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..01a0b07856103 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,12 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
+
+  unsigned ArgumentsCount = 0;
   for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+I.info = ArgumentsCount++ < FI.getNumRequiredArgs()
+ ? classifyArgumentType(I.type)
+ : ABIArgInfo::getDirect();
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..f2f0768ae31ee
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca 

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-19 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96015

>From a05b24a06429c1ad6c4988f232442d53010e79a9 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC compiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type.
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
---
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  11 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  43 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 8 files changed, 915 insertions(+), 29 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..01a0b07856103 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,12 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
+
+  unsigned ArgumentsCount = 0;
   for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+I.info = ArgumentsCount++ < FI.getNumRequiredArgs()
+ ? classifyArgumentType(I.type)
+ : ABIArgInfo::getDirect();
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..f2f0768ae31ee
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca [[STRUCT_ANON:%.*]], align 4
+// CHECK-NEXT:[[V:%.*]] = alloca <4 x i32>, align 16
+// CHECK-NEXT:store i8 1, ptr [[C]], align 1
+// CHECK-NEXT:store i16 1, ptr [[S]], align 2
+// CHECK-NEXT:store i32 1, ptr [[I]], align 4
+// CHECK-NEXT:store i64 1, ptr [[L]], align 8
+// CHECK-NEXT:store float 1.00e+00, ptr [[F]], align 4
+// CHECK-NEXT:store double 1.00e+00, ptr [[D]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load i8, ptr [[C]], align 1
+// CHECK-NEXT:[[CONV:%.*]] = sext i8 [[TMP0]] to i32
+// CHECK-NEXT:[[TMP1:%.*]] = load i16, ptr [[S]], align 

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-19 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> With the possible exception of some alignment handling this looks about as 
> I'd expect it to. Ideally we'd get some feedback from nvptx-associated people 
> but fixing libc is a good sign

Yep, I believe @Artem-B is on vacation, so hopefully @AlexMaclean can chime in. 
This should be ABI compatible with NVIDIA as far as I'm aware.

https://github.com/llvm/llvm-project/pull/96015
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-19 Thread Joseph Huber via cfe-commits


@@ -938,6 +938,37 @@ struct Amdgpu final : public VariadicABIInfo {
   }
 };
 
+struct NVPTX final : public VariadicABIInfo {
+
+  bool enableForTarget() override { return true; }
+
+  bool vaListPassedInSSARegister() override { return true; }
+
+  Type *vaListType(LLVMContext ) override {
+return PointerType::getUnqual(Ctx);
+  }
+
+  Type *vaListParameterType(Module ) override {
+return PointerType::getUnqual(M.getContext());
+  }
+
+  Value *initializeVaList(Module , LLVMContext , IRBuilder<> ,
+  AllocaInst *, Value *Buffer) override {
+return Builder.CreateAddrSpaceCast(Buffer, vaListParameterType(M));
+  }
+
+  VAArgSlotInfo slotInfo(const DataLayout , Type *Parameter) override {
+// NVPTX doesn't apply minimum alignment to types present in structs. Types
+// with alignment less than four should be promoted by the compiler and will
+// get the proper minimum alignment in those cases.
+const unsigned MinAlign = 1;

jhuber6 wrote:

So, the standard varargs handling will automatically promote things like shorts 
to ints and floats to doubles. What the comment means is that `clang` already 
handled the size and alignment in those cases, so we use a minimum alignment of 1 to respect the alignment of anything clang didn't already modify.
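
For reference, a minimal standalone sketch of those promotions in plain C++ (ordinary language semantics, not clang internals): anything narrower than `int`, and any `float`, is widened before it ever reaches a `va_arg` consumer, so a 1-byte minimum alignment in the lowering never under-aligns them.

```cpp
#include <cstdarg>
#include <cstdio>

// The callee only ever sees int and double for the promoted arguments.
void sink(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int a = va_arg(ap, int);       // char/short arrive promoted to int
  double b = va_arg(ap, double); // float arrives promoted to double
  va_end(ap);
  std::printf("%d %f\n", a, b);
}

int main() {
  char c = 1;
  float f = 2.0f;
  sink(2, c, f); // c is passed as int, f as double by the usual promotions
  return 0;
}
```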

https://github.com/llvm/llvm-project/pull/96015
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-19 Thread Joseph Huber via cfe-commits


@@ -17,6 +17,8 @@
 #define MODULE_PASS(NAME, CREATE_PASS)
 #endif
 MODULE_PASS("generic-to-nvvm", GenericToNVVMPass())
+MODULE_PASS("expand-variadics",

jhuber6 wrote:

I couldn't remember whether adding it to `addIRPasses` applies to every pipeline configuration; I recall that different configurations register different sets of passes. I'll try to figure it out.

https://github.com/llvm/llvm-project/pull/96015
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-18 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96015

>From bf6f8852621f4a5ac58e6d062d7c78e5eb639c1a Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC compiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type.
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
---
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  11 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXPassRegistry.def   |   2 +
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  44 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 9 files changed, 918 insertions(+), 29 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..01a0b07856103 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,12 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
+
+  unsigned ArgumentsCount = 0;
   for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+I.info = ArgumentsCount++ < FI.getNumRequiredArgs()
+ ? classifyArgumentType(I.type)
+ : ABIArgInfo::getDirect();
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..f2f0768ae31ee
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca [[STRUCT_ANON:%.*]], align 4
+// CHECK-NEXT:[[V:%.*]] = alloca <4 x i32>, align 16
+// CHECK-NEXT:store i8 1, ptr [[C]], align 1
+// CHECK-NEXT:store i16 1, ptr [[S]], align 2
+// CHECK-NEXT:store i32 1, ptr [[I]], align 4
+// CHECK-NEXT:store i64 1, ptr [[L]], align 8
+// CHECK-NEXT:store float 1.00e+00, ptr [[F]], align 4
+// CHECK-NEXT:store double 1.00e+00, ptr [[D]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load i8, ptr [[C]], align 1
+// CHECK-NEXT:[[CONV:%.*]] = sext i8 [[TMP0]] to i32
+// 

[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-18 Thread Joseph Huber via cfe-commits


@@ -203,8 +203,15 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
-  for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+
+  unsigned ArgumentsCount = 0;
+  for (auto  : FI.arguments()) {
+if (FI.isVariadic() && ArgumentsCount > 0)

jhuber6 wrote:

You're right, this needs to account for all fixed arguments, not just the first 
(guaranteed) one. NVIDIA seems to handle it where the fixed arguments are 
passed using the regular ABI (can be indirect or direct) while the variadic 
arguments are always direct. Is there an easy way to check whether an argument is part of the variadic set? Maybe if the argument index is greater than the number of fixed arguments?
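
For illustration, a small standalone model of that check, comparing a running argument index against the number of fixed (required) parameters; the names below are illustrative, not clang code:

```cpp
#include <cstdio>
#include <vector>

enum class ArgABI { RegularABI, Direct };

// An argument belongs to the variadic tail exactly when its index is not
// less than the number of fixed (required) parameters of the callee.
std::vector<ArgABI> classifyAll(unsigned NumRequiredArgs, unsigned NumArgs) {
  std::vector<ArgABI> Infos;
  for (unsigned Index = 0; Index < NumArgs; ++Index)
    Infos.push_back(Index < NumRequiredArgs ? ArgABI::RegularABI
                                            : ArgABI::Direct);
  return Infos;
}

int main() {
  // A printf-like callee: one fixed parameter, three arguments at the call.
  for (ArgABI A : classifyAll(/*NumRequiredArgs=*/1, /*NumArgs=*/3))
    std::printf("%s\n", A == ArgABI::RegularABI ? "regular ABI" : "direct");
  return 0;
}
```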

https://github.com/llvm/llvm-project/pull/96015
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [NVPTX] Implement variadic functions using IR lowering (PR #96015)

2024-06-18 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/96015

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC compiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type.
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
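
For illustration, a hand-written C++ model of the lowering described above; the struct name and layout are assumptions for exposition, not the pass's actual output:

```cpp
#include <cstdarg>
#include <cstdio>
#include <cstring>

// Original variadic callee.
static int sum_variadic(int fixed, ...) {
  va_list ap;
  va_start(ap, fixed);
  int i = va_arg(ap, int);
  double d = va_arg(ap, double);
  va_end(ap);
  return fixed + i + static_cast<int>(d);
}

// What the IR lowering conceptually produces: the variadic tail is packed
// into an aggregate at natural alignment (at least a 4-byte slot per plain
// scalar) and passed by pointer; the callee walks that buffer.
struct VarargPack {
  int i;    // 4-byte slot
  double d; // aligned to 8 within the aggregate by its own type
};

static int sum_lowered(int fixed, const void *buffer) {
  VarargPack p;
  std::memcpy(&p, buffer, sizeof(p));
  return fixed + p.i + static_cast<int>(p.d);
}

int main() {
  VarargPack p = {2, 3.0};
  std::printf("%d %d\n", sum_variadic(1, 2, 3.0), sum_lowered(1, &p));
  return 0;
}
```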


>From 01d101dff102e4465ec284818f234152cd09c8da Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Jun 2024 15:32:31 -0500
Subject: [PATCH] [NVPTX] Implement variadic functions using IR lowering

Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.

We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.

The behavior of this function was determined by iteratively checking
what the NVCC compiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type.
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
---
 clang/lib/CodeGen/Targets/NVPTX.cpp   |  16 +-
 clang/test/CodeGen/variadic-nvptx.c   |  77 
 libc/config/gpu/entrypoints.txt   |  15 +-
 libc/test/src/__support/CMakeLists.txt|  21 +-
 llvm/lib/Target/NVPTX/NVPTXPassRegistry.def   |   2 +
 llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp  |   2 +
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |  44 +-
 llvm/test/CodeGen/NVPTX/variadics-backend.ll  | 427 ++
 llvm/test/CodeGen/NVPTX/variadics-lowering.ll | 348 ++
 9 files changed, 922 insertions(+), 30 deletions(-)
 create mode 100644 clang/test/CodeGen/variadic-nvptx.c
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-backend.ll
 create mode 100644 llvm/test/CodeGen/NVPTX/variadics-lowering.ll

diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp 
b/clang/lib/CodeGen/Targets/NVPTX.cpp
index 423485c9ca16e..1a5205eb4dabc 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -203,8 +203,15 @@ ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) 
const {
 void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
   if (!getCXXABI().classifyReturnType(FI))
 FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
-  for (auto  : FI.arguments())
-I.info = classifyArgumentType(I.type);
+
+  unsigned ArgumentsCount = 0;
+  for (auto  : FI.arguments()) {
+if (FI.isVariadic() && ArgumentsCount > 0)
+  I.info = ABIArgInfo::getDirect();
+else
+  I.info = classifyArgumentType(I.type);
+++ArgumentsCount;
+  }
 
   // Always honor user-specified calling convention.
   if (FI.getCallingConvention() != llvm::CallingConv::C)
@@ -215,7 +222,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo ) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),
+  /*AllowHigherAlign=*/true, Slot);
 }
 
 void NVPTXTargetCodeGenInfo::setTargetAttributes(
diff --git a/clang/test/CodeGen/variadic-nvptx.c 
b/clang/test/CodeGen/variadic-nvptx.c
new file mode 100644
index 0..b47a5d7a2670d
--- /dev/null
+++ b/clang/test/CodeGen/variadic-nvptx.c
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-18 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-18 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 approved this pull request.

LG overall, though the growing number of "is GPU target and some vendor" checks in the Driver is concerning.

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-18 Thread Joseph Huber via cfe-commits


@@ -907,7 +907,8 @@ void CodeGenModule::Release() {
   if (Context.getTargetInfo().getTriple().isWasm())
 EmitMainVoidAlias();
 
-  if (getTriple().isAMDGPU()) {
+  if (getTriple().isAMDGPU() ||
+  (getTriple().isSPIRV() && getTriple().getVendor() == llvm::Triple::AMD)) 
{

jhuber6 wrote:

I'm wondering if we should add `isAMD` to `llvm::Triple` or something.
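
Something along these lines, sketched against the existing `llvm::Triple` accessors used in the diff above; this is a hypothetical helper, not current API:

```cpp
#include "llvm/TargetParser/Triple.h"

// Hypothetical convenience wrapper over the quoted check; llvm::Triple does
// not currently provide this.
static bool isAMDOffloadTarget(const llvm::Triple &T) {
  return T.isAMDGPU() ||
         (T.isSPIRV() && T.getVendor() == llvm::Triple::AMD);
}
```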

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

If you really need this, perhaps you can check whether the Triple will invoke the fallback toolchain; that would show up as a missing vendor in the Triple.

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > I thought that clang accepted `-rpath `? I see that format when I try 
> > CPU offloading.
> 
> Yeah, but when running `--target=x86_64` an underlying gcc command is issued 
> and complains about `-rpath `

Oh, I see. When using `-fopenmp-targets=x86_64` it goes through the default GCC 
toolchain because you gave it no information. I'm wondering if we should bother 
supporting that since it's supposed to be 
`-fopenmp-targets=x86_64-unknown-linux-gnu` or similar. The GCC fallback isn't 
really guaranteed to work.

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

I remember intentionally using the clang argument format instead of 
`-Wl,-rpath,` because the `-Wl` format would try to forward it to things 
like `nvlink` which don't support it.

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits




jhuber6 wrote:

The tests use an option that causes nothing to actually run, so it only uses 
the filename.

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits




jhuber6 wrote:

What is this?

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 commented:

I thought that clang accepted `-rpath `? I see that format when I try CPU 
offloading.

https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Forward -rpath flag to the correct format in CPU offloading (PR #95763)

2024-06-18 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/95763
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-10 Thread Joseph Huber via cfe-commits


@@ -128,12 +128,13 @@ enum class CudaArch {
   GFX12_GENERIC,
   GFX1200,
   GFX1201,
+  AMDGCNSPIRV,
   Generic, // A processor model named 'generic' if the target backend defines a
// public one.
   LAST,
 
   CudaDefault = CudaArch::SM_52,
-  HIPDefault = CudaArch::GFX906,
+  HIPDefault = CudaArch::AMDGCNSPIRV,

jhuber6 wrote:

Yeah, makes sense. But doesn't the SPIR-V toolchain require extra tools?

https://github.com/llvm/llvm-project/pull/95061
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

