[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
jhuber6 wrote:

Re-did it and tested it against `libc` in https://github.com/llvm/llvm-project/pull/96972, so it will have a CI running it once that lands. It works for the other cases I've tested, but let me know if something else should be added.

https://github.com/llvm/llvm-project/pull/96561
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/96561

From 849c8dab14c9332081a8c6331c9ca0c234793393 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back during the transition to the new driver. This patch adds back in a new, upgraded version that does LTO + archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO or static linking, requires all files to use the non-standard `.cubin` extension, and rejects link jobs that other linkers would be fine with (i.e. empty). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following.

```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```

Linking in these libraries will then simply require passing `-lc`, as is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software.
---
 clang/docs/ClangNVLinkWrapper.rst             |  64 ++
 clang/docs/index.rst                          |   1 +
 clang/lib/Driver/ToolChains/Cuda.cpp          |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h            |   3 +
 clang/test/Driver/cuda-cross-compiling.c      |   8 +-
 clang/test/Driver/nvlink-wrapper.c            |  65 ++
 clang/test/lit.cfg.py                         |   1 +
 clang/tools/CMakeLists.txt                    |   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 +
 .../ClangNVLinkWrapper.cpp                    | 753 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  79 ++
 11 files changed, 1023 insertions(+), 57 deletions(-)
 create mode 100644 clang/docs/ClangNVLinkWrapper.rst
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/docs/ClangNVLinkWrapper.rst b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 00..0a312bdbf3066f
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+====================
+Clang nvlink Wrapper
+====================
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+============
+
+This tools works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include, static archive (.a) linking, LTO, and accepting files ending
+in ``.o`` without error.
+
+Usage
+=====
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options]
+
+  OPTIONS:
+    --arch          Specify the 'sm_' name of the target architecture.
+    --cuda-path=    Set the system CUDA path
+    --dry-run       Print generated commands without running.
+    --feature       Specify the '+ptx' freature to use for LTO.
+    -g              Specify that this was a debug compile.
+    -help-hidden    Display all available options
+    -help           Display available options (--help-hidden for more)
+    -L              Add to the library search path
+    -l              Search for library
+    -mllvm          Arguments passed to LLVM, including Clang invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' for a list of options.
+    -o              Path to file to write output
+    --plugin-opt=jobs=
+                    Number of LTO codegen partitions
+
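For readers following the `-L` and `-l` options in the usage text above, here is a minimal sketch of how a wrapper could resolve `-lc` to an archive such as `libc.a`. It is not taken from the patch: `findLibrary`, `ExistsFn`, and `isRegularFile` are hypothetical names, and the `lib<name>.a` naming rule is an assumption based on how conventional Unix linkers behave and on the archive paths listed in the summary.

```cpp
#include <filesystem>
#include <functional>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical predicate type so the lookup can be exercised without touching
// the real file system.
using ExistsFn = std::function<bool(const fs::path &)>;

// Default predicate: true if the path names an existing regular file.
inline bool isRegularFile(const fs::path &p) {
  std::error_code ec;
  return fs::is_regular_file(p, ec);
}

// Resolve a `-l<name>` request against the `-L` directories, in the order
// they were given, and return the first `lib<name>.a` that exists.
// Assumption: only static archives are considered.
std::optional<fs::path> findLibrary(const std::string &name,
                                    const std::vector<fs::path> &searchDirs,
                                    const ExistsFn &exists = isRegularFile) {
  for (const fs::path &dir : searchDirs) {
    fs::path candidate = dir / ("lib" + name + ".a");
    if (exists(candidate))
      return candidate;
  }
  return std::nullopt;
}
```

With search directories `/opt/a` and `/opt/b`, a call like `findLibrary("c", dirs)` would return the first existing `libc.a`, or `std::nullopt` if neither directory contains one. Earlier `-L` entries take precedence, which matches the usual linker convention.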
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/96561

From 859f6a7fce9503275ad7eb39512dc5833a11bb07 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back during the transition to the new driver. This patch adds back in a new, upgraded version that does LTO + archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO or static linking, requires all files to use the non-standard `.cubin` extension, and rejects link jobs that other linkers would be fine with (i.e. empty). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following.

```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```

Linking in these libraries will then simply require passing `-lc`, as is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp          |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h            |   3 +
 clang/test/Driver/cuda-cross-compiling.c      |   8 +-
 clang/test/Driver/nvlink-wrapper.c            |  65 ++
 clang/test/lit.cfg.py                         |   1 +
 clang/tools/CMakeLists.txt                    |   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp                    | 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 9 files changed, 865 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
jhuber6 wrote:

@MaskRay So, I think my symbol resolution is (unsurprisingly) subtly broken. Is there a canonical way to handle this? I first thought that we could simply perform the symbol resolutions as normal for every file, but keep track of which symbols were "lazy". However, I couldn't figure out how to then tell if a lazy symbol should be extracted or not, because there's no information on which files use which symbols. Maybe I just scan all the files and see if they reference a symbol that's marked defined and lazy?

https://github.com/llvm/llvm-project/pull/96561
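To make the question above concrete, here is a minimal sketch of one conventional way to decide which lazy (archive) members get extracted: load every regular object first, then repeatedly pull in any lazy member that defines a symbol that is still undefined, until nothing changes. This follows the traditional archive-linking rule and is not claimed to be what the patch does; `InputFile` and `resolveInputs` are hypothetical names, and symbols are modeled as plain strings.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a link input. `lazy` is true for archive members,
// which are only extracted on demand.
struct InputFile {
  std::string name;
  bool lazy;
  std::vector<std::string> defined;   // symbols this file defines
  std::vector<std::string> undefined; // symbols this file references
};

// Returns the names of the files that end up in the link, in load order.
std::vector<std::string> resolveInputs(const std::vector<InputFile> &inputs) {
  std::set<std::string> defined, undefined;
  std::vector<bool> loaded(inputs.size(), false);
  std::vector<std::string> order;

  // Loading a file records its definitions and any references that are not
  // yet satisfied.
  auto load = [&](std::size_t i) {
    loaded[i] = true;
    order.push_back(inputs[i].name);
    for (const std::string &s : inputs[i].defined) {
      defined.insert(s);
      undefined.erase(s);
    }
    for (const std::string &s : inputs[i].undefined)
      if (!defined.count(s))
        undefined.insert(s);
  };

  // Regular objects are always part of the link.
  for (std::size_t i = 0; i < inputs.size(); ++i)
    if (!inputs[i].lazy)
      load(i);

  // Extract lazy members until a fixed point is reached.
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
      if (loaded[i])
        continue;
      bool needed = false;
      for (const std::string &s : inputs[i].defined)
        if (undefined.count(s)) {
          needed = true;
          break;
        }
      if (needed) {
        load(i);
        changed = true;
      }
    }
  }
  return order;
}
```

The fixed-point loop is what handles chains such as `main.o` needing `foo`, whose archive member in turn needs `bar`: the second member only becomes "needed" after the first has been loaded. A lazy member whose symbols nobody references is never extracted.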
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/96561

From 6c70e542bbb355160b833ede6f86be0366953b88 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink'
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
jhuber6 wrote:

> @Artem-B asked me to review nvptx patches while he's OOO, but this one is
> pretty far outside my depth. Are you OK waiting until he's back? I don't know
> exactly when that will be, but based on his IMs to me, he should be back
> early July.

No problem, I knew that it would probably take a while to get reviewed given the size. I believe he said he'd be back early July as well, so maybe next week? It'd probably require his input, along with some of the other interested parties in clang, to see how they feel about reviving one of these old tools.

(However, if you know anything about the NVPTX varargs API, I think https://github.com/llvm/llvm-project/pull/96015 is mostly just waiting for someone to say that it's a mostly correct lowering.)

https://github.com/llvm/llvm-project/pull/96561
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/96561

From 5edeeb9816fa5909f27a781f6e7213dd02ccdfa0 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink'
---
 clang/lib/Driver/ToolChains/Cuda.cpp          |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h            |   3 +
 clang/test/Driver/cuda-cross-compiling.c      |   8 +-
 clang/test/Driver/nvlink-wrapper.c            |  65 ++
 clang/tools/CMakeLists.txt                    |   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp                    | 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 8 files changed, 864 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
jlebar wrote: @Artem-B asked me to review nvptx patches while he's OOO, but this one is pretty far outside my depth. Are you OK waiting until he's back? I don't know exactly when that will be, but based on his IMs to me, he should be back early July. https://github.com/llvm/llvm-project/pull/96561
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/96561

From 8a52becd358abb2c96ca150db501d58c40b5250b Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink'
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/96561
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
llvmbot wrote:

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back during the transition to the new driver. This patch adds back in a new, upgraded version that does LTO + archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO or static linking, requires all files to use the non-standard `.cubin` extension, and rejects link jobs that other linkers would be fine with (i.e. empty). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following.

```
install/lib/nvptx64-nvidia-cuda/libc.a
install/lib/nvptx64-nvidia-cuda/libc++.a
install/lib/nvptx64-nvidia-cuda/libomp.a
install/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```

Linking in these libraries will then simply require passing `-lc`, as is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software.

---

Patch is 37.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96561.diff

8 Files Affected:

- (modified) clang/lib/Driver/ToolChains/Cuda.cpp (+8-53)
- (modified) clang/lib/Driver/ToolChains/Cuda.h (+3)
- (modified) clang/test/Driver/cuda-cross-compiling.c (+4-4)
- (added) clang/test/Driver/nvlink-wrapper.c (+64)
- (modified) clang/tools/CMakeLists.txt (+1)
- (added) clang/tools/clang-nvlink-wrapper/CMakeLists.txt (+44)
- (added) clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp (+671)
- (added) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+68)

```diff
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking when given a '.cubin' file. We always want to
-    // perform device linking, so just rename any '.o' files.
-    // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-    if (II.isFilename()) {
-      auto InputFile = getToolChain().getInputFilename(II);
-      if (llvm::sys::path::extension(InputFile) !=
```
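The driver code removed in the hunk above exists because, per its comment, 'nvlink' treats `.o` and `.cubin` inputs differently, so `.o` files had to be renamed. As a rough illustration of that renaming step only, here is a minimal sketch that uses `std::filesystem` as a stand-in for `llvm::sys::path::replace_extension`; `toCubinName` is a hypothetical helper and does not appear in the patch.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: compute the name 'nvlink' would need for device
// linking, i.e. the same path with a '.cubin' extension. Inputs that already
// end in '.cubin' are returned unchanged.
std::string toCubinName(const std::string &input) {
  std::filesystem::path p(input);
  if (p.extension() == ".cubin")
    return input;
  p.replace_extension(".cubin");
  return p.generic_string();
}
```

For example, `toCubinName("kernel.o")` yields `kernel.cubin`. Moving this concern into a wrapper means the driver can hand plain `.o` files to the link step, which is what the removal of this block from `Cuda.cpp` reflects.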
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
llvmbot wrote: @llvm/pr-subscribers-clang-driver Author: Joseph Huber (jhuber6) Changes Summary: The `clang-nvlink-wrapper` is a utility that I removed awhile back during the transition to the new driver. This patch adds back in a new, upgraded version that does LTO + archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM. While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO, or static linking, requires all files to be named a non-standard `.cubin`, and rejects link jobs that other linkers would be fine with (i.e empty). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation. The main reason I want to re-intorudce this tool is because I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following. ``` install/lib/nvptx64-nvidia-cuda/libc.a install/lib/nvptx64-nvidia-cuda/libc++.a install/lib/nvptx64-nvidia-cuda/libomp.a install/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a ``` Linking in these libraries will then simply require passing `-lc` like is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue to massively improving the state of building GPU libraries. In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software. 
---

Patch is 37.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96561.diff

8 Files Affected:

- (modified) clang/lib/Driver/ToolChains/Cuda.cpp (+8-53)
- (modified) clang/lib/Driver/ToolChains/Cuda.h (+3)
- (modified) clang/test/Driver/cuda-cross-compiling.c (+4-4)
- (added) clang/test/Driver/nvlink-wrapper.c (+64)
- (modified) clang/tools/CMakeLists.txt (+1)
- (added) clang/tools/clang-nvlink-wrapper/CMakeLists.txt (+44)
- (added) clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp (+671)
- (added) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+68)

```diff
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking when given a '.cubin' file. We always want to
-    // perform device linking, so just rename any '.o' files.
-    // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-    if (II.isFilename()) {
-      auto InputFile = getToolChain().getInputFilename(II);
-      if
```
[clang] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink' (PR #96561)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/96561

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back during the transition to the new driver. This patch adds back a new, upgraded version that does LTO + archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO or static linking, requires all files to be named with a non-standard `.cubin` extension, and rejects link jobs that other linkers would be fine with (i.e. empty ones). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is that I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following.

```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```

Linking in these libraries will then simply require passing `-lc`, as is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software.
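For readers less familiar with what "archive linking" buys you when passing `-lc`, the sketch below models the conventional static-library rule: a member of `libc.a` is pulled into the link only if it defines a symbol that is still undefined, and pulling it in may add new undefined symbols. This is a generic illustration, not the wrapper's implementation. The `Member` struct and `selectMembers` function are hypothetical, and symbols are reduced to plain strings.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one archive member: the symbols it defines and the
// symbols it references but does not define.
struct Member {
  std::string Name;
  std::set<std::string> Defined;
  std::set<std::string> Undefined;
};

// Returns the members a conventional static link would extract, in extraction
// order, given the symbols the main inputs leave undefined. The archive is
// rescanned until a full pass extracts nothing new.
std::vector<std::string> selectMembers(const std::vector<Member> &Archive,
                                       std::set<std::string> Undefined) {
  std::vector<std::string> Extracted;
  std::vector<bool> Taken(Archive.size(), false);
  std::set<std::string> Defined;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Archive.size(); ++I) {
      if (Taken[I])
        continue;
      // A member is needed if it defines any currently undefined symbol.
      bool Needed = false;
      for (const std::string &Sym : Archive[I].Defined)
        if (Undefined.count(Sym)) {
          Needed = true;
          break;
        }
      if (!Needed)
        continue;
      Taken[I] = true;
      Changed = true;
      Extracted.push_back(Archive[I].Name);
      // Its definitions resolve symbols; its references may add new ones.
      for (const std::string &Sym : Archive[I].Defined) {
        Defined.insert(Sym);
        Undefined.erase(Sym);
      }
      for (const std::string &Sym : Archive[I].Undefined)
        if (!Defined.count(Sym))
          Undefined.insert(Sym);
    }
  }
  return Extracted;
}
```

With a toy archive holding `printf.o` (defines `printf`, needs `write`), `write.o` and `malloc.o`, a program that only calls `printf` extracts the first two members and leaves `malloc.o` out. That is the behavior a plain `nvlink` invocation is described as lacking.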
>From d48deace957dfd2f1abaf232c1462a7725f7f1ee Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 24 Jun 2024 15:14:52 -0500
Subject: [PATCH] [Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink'

Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back during the transition to the new driver. This patch adds back a new, upgraded version that does LTO + archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO or static linking, requires all files to be named with a non-standard `.cubin` extension, and rejects link jobs that other linkers would be fine with (i.e. empty ones). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-introduce this tool is that I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following.

```
/lib/nvptx64-nvidia-cuda/libc.a
/lib/nvptx64-nvidia-cuda/libc++.a
/lib/nvptx64-nvidia-cuda/libomp.a
/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```

Linking in these libraries will then simply require passing `-lc`, as is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue for massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software.
---
 clang/lib/Driver/ToolChains/Cuda.cpp          |  61 +-
 clang/lib/Driver/ToolChains/Cuda.h            |   3 +
 clang/test/Driver/cuda-cross-compiling.c      |   8 +-
 clang/test/Driver/nvlink-wrapper.c            |  64 ++
 clang/tools/CMakeLists.txt                    |   1 +
 .../tools/clang-nvlink-wrapper/CMakeLists.txt |  44 ++
 .../ClangNVLinkWrapper.cpp                    | 671 ++
 .../tools/clang-nvlink-wrapper/NVLinkOpts.td  |  68 ++
 8 files changed, 863 insertions(+), 57 deletions(-)
 create mode 100644 clang/test/Driver/nvlink-wrapper.c
 create mode 100644 clang/tools/clang-nvlink-wrapper/CMakeLists.txt
 create mode 100644 clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
 create mode 100644 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
-  // If we