[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-27 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Re-did it and tested it against `libc` in 
https://github.com/llvm/llvm-project/pull/96972 so it will have a CI running it 
once that lands. It works for the other cases I've tested, but let me know if 
something else should be added.

https://github.com/llvm/llvm-project/pull/96561
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-27 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 849c8dab14c9332081a8c6331c9ca0c234793393 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst |  64 ++
 clang/docs/index.rst  |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp| 753 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  79 ++
 11 files changed, 1023 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 00..0a312bdbf3066f
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+====================
+Clang nvlink Wrapper
+====================
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+============
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (.a) linking, LTO, and accepting files ending
+in ``.o`` without error.
+
+Usage
+=====
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] 
+
+  OPTIONS:
+    --arch <value>       Specify the 'sm_' name of the target architecture.
+    --cuda-path=<dir>    Set the system CUDA path
+    --dry-run            Print generated commands without running.
+    --feature <value>    Specify the '+ptx' feature to use for LTO.
+    -g                   Specify that this was a debug compile.
+    -help-hidden         Display all available options
+    -help                Display available options (--help-hidden for more)
+    -L <dir>             Add <dir> to the library search path
+    -l <libname>         Search for library <libname>
+    -mllvm <arg>         Arguments passed to LLVM, including Clang invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' for a list of options.
+    -o <path>            Path to file to write output
+    --plugin-opt=jobs=<value>
+                         Number of LTO codegen partitions
+

[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 859f6a7fce9503275ad7eb39512dc5833a11bb07 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp| 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 9 files changed, 865 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode 

[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

@MaskRay So, I think my symbol resolution is (unsurprisingly) subtly broken. Is 
there a canonical way to handle this? I first thought that we could simply 
perform the symbol resolutions as normal for every file, but keep track of 
which symbols were "lazy". However, I couldn't figure out how to then tell if a 
lazy symbol should be extracted or not because there's no information on which 
files use which symbols. Maybe I just scan all the files and see if they 
reference a symbol that's marked defined and lazy?
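
For context, the conventional archive semantics used by Unix-style linkers can be sketched as below. This is a minimal illustration under stated assumptions, not the code in this PR: `InputFile`, `resolveLazy`, and the field names are hypothetical. Regular inputs are always linked; archive members start out lazy and are extracted only when they define a symbol that is still undefined in the link so far, and the scan repeats until nothing new is pulled in. Tracking one global set of undefined symbols is enough, so no per-file "who uses what" map is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one link input; this is not an LLVM type.
struct InputFile {
  std::string Name;
  bool IsLazy;                     // true for archive members
  std::set<std::string> Defined;   // symbols this file defines
  std::set<std::string> Undefined; // symbols this file references
};

// Returns the names of the files that end up in the link, in link order.
std::vector<std::string> resolveLazy(const std::vector<InputFile> &Files) {
  std::set<std::string> Defined, Undefined;
  std::vector<bool> Linked(Files.size(), false);
  std::vector<std::string> Order;

  // Adds file I to the link and updates the global symbol sets.
  auto Link = [&](std::size_t I) {
    Linked[I] = true;
    Order.push_back(Files[I].Name);
    for (const std::string &S : Files[I].Defined) {
      Defined.insert(S);
      Undefined.erase(S);
    }
    for (const std::string &S : Files[I].Undefined)
      if (!Defined.count(S))
        Undefined.insert(S);
  };

  // Non-lazy inputs are linked unconditionally.
  for (std::size_t I = 0; I < Files.size(); ++I)
    if (!Files[I].IsLazy)
      Link(I);

  // Extract lazy members until a fixed point is reached.
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Files.size(); ++I) {
      if (Linked[I])
        continue;
      bool Needed = false;
      for (const std::string &S : Files[I].Defined)
        if (Undefined.count(S)) {
          Needed = true;
          break;
        }
      if (Needed) {
        Link(I);
        Changed = true;
      }
    }
  }
  return Order;
}
```

With a `main.o` that references `foo`, and an archive whose members define `foo` (referencing `bar`), `bar`, and an unrelated `baz`, this sketch links the first three files and leaves the `baz` member out. Real linkers differ in details such as command-line ordering and weak symbols, which the sketch ignores.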

https://github.com/llvm/llvm-project/pull/96561


[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-25 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 6c70e542bbb355160b833ede6f86be0366953b88 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/test/lit.cfg.py |   1 +
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp| 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 9 files changed, 865 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode 

[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> @Artem-B asked me to review nvptx patches while he's OOO, but this one is 
> pretty far outside my depth. Are you OK waiting until he's back? I don't know 
> exactly when that will be, but based on his IMs to me, he should be back 
> early July.

No problem, I knew that it would probably take a while to get reviewed given the 
size. I believe he said he'd be back early July as well, so maybe next week? 
It'd probably require his input, along with some of the other interested 
parties in clang to see how they feel about reviving one of these old tools.

(However if you know anything about the NVPTX varargs API I think 
https://github.com/llvm/llvm-project/pull/96015 is mostly just waiting for 
someone to say that it's a mostly correct lowering)

https://github.com/llvm/llvm-project/pull/96561


[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 5edeeb9816fa5909f27a781f6e7213dd02ccdfa0 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp| 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 8 files changed, 864 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking 

[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Justin Lebar via cfe-commits

jlebar wrote:

@Artem-B asked me to review nvptx patches while he's OOO, but this one is 
pretty far outside my depth.  Are you OK waiting until he's back?  I don't know 
exactly when that will be, but based on his IMs to me, he should be back early 
July.

https://github.com/llvm/llvm-project/pull/96561


[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/96561

>From 8a52becd358abb2c96ca150db501d58c40b5250b Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp  |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h|   3 +
 clang/test/Driver/cuda-cross-compiling.c  |   8 +-
 clang/test/Driver/nvlink-wrapper.c|  65 ++
 clang/tools/CMakeLists.txt|   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp| 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 8 files changed, 864 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking 

[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/96561


[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-24 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)


Changes

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard `.cubin`, and rejects link
jobs that other linkers would be fine with (i.e. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.
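
The `.cubin` naming requirement mentioned above can be illustrated with a small sketch. It is only an assumption about how a wrapper might stage inputs, not the implementation in this patch; `cubinNameFor` and `stageForNvlink` are hypothetical names, and the sketch uses `std::filesystem` rather than LLVM's path utilities.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: returns the name a wrapper could hand to 'nvlink'
// for an input called `Input`. Paths that already end in '.cubin' are
// returned unchanged; any other extension is replaced.
fs::path cubinNameFor(const fs::path &Input) {
  if (Input.extension() == ".cubin")
    return Input;
  fs::path Renamed = Input;
  Renamed.replace_extension(".cubin");
  return Renamed;
}

// Hypothetical helper: copies `Input` into `TmpDir` under a '.cubin' name
// and returns the staged path, leaving the original file untouched.
fs::path stageForNvlink(const fs::path &Input, const fs::path &TmpDir) {
  fs::path Staged = TmpDir / cubinNameFor(Input.filename());
  fs::copy_file(Input, Staged, fs::copy_options::overwrite_existing);
  return Staged;
}
```

Staging a copy (or a symlink) keeps the user-visible `.o` files intact, so the rest of the build can use standard object-file names.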

The main reason I want to reintroduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
install/lib/nvptx64-nvidia-cuda/libc.a
install/lib/nvptx64-nvidia-cuda/libc++.a
install/lib/nvptx64-nvidia-cuda/libomp.a
install/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc` like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.
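
As a rough illustration of what `-lc` resolution involves, the sketch below looks up `lib<name>.a` in a list of search directories, in order. The names `libraryFileName` and `findLibrary` are hypothetical and this is not the wrapper's actual code; it assumes only static archives are considered.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: maps the argument of '-l' to an archive file name,
// e.g. "c" becomes "libc.a".
std::string libraryFileName(const std::string &Name) {
  return "lib" + Name + ".a";
}

// Hypothetical helper: returns the first 'lib<Name>.a' found in the given
// search directories (the '-L' paths, in order), or std::nullopt.
std::optional<fs::path> findLibrary(const std::string &Name,
                                    const std::vector<fs::path> &SearchPaths) {
  const std::string FileName = libraryFileName(Name);
  for (const fs::path &Dir : SearchPaths) {
    fs::path Candidate = Dir / FileName;
    std::error_code EC; // non-throwing overload; errors count as "not found"
    if (fs::is_regular_file(Candidate, EC))
      return Candidate;
  }
  return std::nullopt;
}
```

With a search path such as `install/lib/nvptx64-nvidia-cuda`, a call like `findLibrary("c", Paths)` would return the path to `libc.a` if it is installed there.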

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.


---

Patch is 37.19 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/96561.diff


8 Files Affected:

- (modified) clang/lib/Driver/ToolChains/Cuda.cpp (+8-53) 
- (modified) clang/lib/Driver/ToolChains/Cuda.h (+3) 
- (modified) clang/test/Driver/cuda-cross-compiling.c (+4-4) 
- (added) clang/test/Driver/nvlink-wrapper.c (+64) 
- (modified) clang/tools/CMakeLists.txt (+1) 
- (added) clang/tools/clang-nvlink-wrapper/CMakeLists.txt (+44) 
- (added) clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp (+671) 
- (added) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+68) 


``diff
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking when given a '.cubin' file. We always want to
-    // perform device linking, so just rename any '.o' files.
-    // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-    if (II.isFilename()) {
-      auto InputFile = getToolChain().getInputFilename(II);
-      if (llvm::sys::path::extension(InputFile) != 

[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-24 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-driver

Author: Joseph Huber (jhuber6)


Changes

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to use the non-standard `.cubin` extension, and rejects
link jobs that other linkers would accept (e.g. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.
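
To make the `.cubin` naming constraint concrete, here is a minimal sketch of the kind of renaming a wrapper has to do before handing files to `nvlink`. It is not code from this patch: `toCubinName` and `prepareInputs` are hypothetical helpers, and the sketch uses `std::filesystem` rather than LLVM's path utilities.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: map an object file name to the `.cubin` name that
// `nvlink` expects. Files with any other extension are left untouched.
std::string toCubinName(const std::string &Input) {
  fs::path P(Input);
  if (P.extension() == ".o")
    P.replace_extension(".cubin");
  return P.string();
}

// Hypothetical helper: rename every input of a link job. An empty job yields
// an empty list, so a caller could skip running the linker altogether rather
// than pass it a job it would reject.
std::vector<std::string> prepareInputs(const std::vector<std::string> &Inputs) {
  std::vector<std::string> Result;
  Result.reserve(Inputs.size());
  for (const std::string &In : Inputs)
    Result.push_back(toCubinName(In));
  return Result;
}
```

A real wrapper would also have to copy or symlink the file under the new name; this sketch only computes the name.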

The main reason I want to reintroduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
install/lib/nvptx64-nvidia-cuda/libc.a
install/lib/nvptx64-nvidia-cuda/libc++.a
install/lib/nvptx64-nvidia-cuda/libomp.a
install/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.


---

Patch is 37.19 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/96561.diff


8 Files Affected:

- (modified) clang/lib/Driver/ToolChains/Cuda.cpp (+8-53) 
- (modified) clang/lib/Driver/ToolChains/Cuda.h (+3) 
- (modified) clang/test/Driver/cuda-cross-compiling.c (+4-4) 
- (added) clang/test/Driver/nvlink-wrapper.c (+64) 
- (modified) clang/tools/CMakeLists.txt (+1) 
- (added) clang/tools/clang-nvlink-wrapper/CMakeLists.txt (+44) 
- (added) clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp (+671) 
- (added) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+68) 


``diff
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking when given a '.cubin' file. We always want to
-    // perform device linking, so just rename any '.o' files.
-    // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-    if (II.isFilename()) {
-      auto InputFile = getToolChain().getInputFilename(II);
-      if 

[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)

2024-06-24 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/96561

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to use the non-standard `.cubin` extension, and rejects
link jobs that other linkers would accept (e.g. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
install/lib/nvptx64-nvidia-cuda/libc.a
install/lib/nvptx64-nvidia-cuda/libc++.a
install/lib/nvptx64-nvidia-cuda/libomp.a
install/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.


>From d48deace957dfd2f1abaf232c1462a7725f7f1ee Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around
 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back
during the transition to the new driver. This patch adds back a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO or static linking,
requires all files to use the non-standard `.cubin` extension, and rejects
link jobs that other linkers would accept (e.g. empty ones). I have spent a
great deal of time hacking around this in the GPU `libc` implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.
```
install/lib/nvptx64-nvidia-cuda/libc.a
install/lib/nvptx64-nvidia-cuda/libc++.a
install/lib/nvptx64-nvidia-cuda/libomp.a
install/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```
Linking in these libraries will then simply require passing `-lc`, as
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient `nvlink` linker, so I consider this a blocking
issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
`ld.lld`, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp          |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h            |   3 +
 clang/test/Driver/cuda-cross-compiling.c      |   8 +-
 clang/test/Driver/nvlink-wrapper.c            |  64 ++
 clang/tools/CMakeLists.txt                    |   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp                    | 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 8 files changed, 863 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we