https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/204186

Summary:
One persistent problem with the linker wrapper flow is that it is more
difficult to reuse as a script than the previous flow, because it does a
lot of work internally. We have since moved a lot of that work into
dedicated LLVM tools, so it is now possible to simply invoke those tools
instead. This PR changes the verbose mode handling to defer each step to
the corresponding tool rather than performing it internally. Users can
then enable verbose printing and copy/paste the printed commands to
re-run the steps.

From d3b56be783c250ded93aaa51104424826fd0f203 Mon Sep 17 00:00:00 2001
From: Joseph Huber <[email protected]>
Date: Tue, 16 Jun 2026 09:45:45 -0500
Subject: [PATCH] [ClangLinkerWrapper] Use discrete steps in verbose mode

Summary:
One persistent problem with the linker wrapper flow is that it is more
difficult to reuse as a script than the previous flow, because it does a
lot of work internally. We have since moved a lot of that work into
dedicated LLVM tools, so it is now possible to simply invoke those tools
instead. This PR changes the verbose mode handling to defer each step to
the corresponding tool rather than performing it internally. Users can
then enable verbose printing and copy/paste the printed commands to
re-run the steps.
---
 .../linker-wrapper-verbose.c                  |  94 ++++++++++
 .../ClangLinkerWrapper.cpp                    | 170 +++++++++++++++++-
 2 files changed, 261 insertions(+), 3 deletions(-)
 create mode 100644 clang/test/OffloadTools/clang-linker-wrapper/linker-wrapper-verbose.c

diff --git a/clang/test/OffloadTools/clang-linker-wrapper/linker-wrapper-verbose.c b/clang/test/OffloadTools/clang-linker-wrapper/linker-wrapper-verbose.c
new file mode 100644
index 0000000000000..6df4c527e3fde
--- /dev/null
+++ b/clang/test/OffloadTools/clang-linker-wrapper/linker-wrapper-verbose.c
@@ -0,0 +1,94 @@
+// REQUIRES: x86-registered-target
+
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
+
+//
+// For OpenMP everything goes through the LLVM offloading binary type.
+//
+// RUN: llvm-offload-binary -o %t.out \
+// RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
+// RUN:   --image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --wrapper-verbose --dry-run \
+// RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=OPENMP
+
+// OPENMP: llvm-offload-binary{{.*}} {{.*}}.o --image=kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 --image=kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a
+// OPENMP: clang{{.*}} --target=nvptx64-nvidia-cuda -march=sm_70
+// OPENMP: clang{{.*}} --target=amdgcn-amd-amdhsa -mcpu=gfx90a
+// OPENMP: llvm-offload-binary{{.*}} -o {{.*}}.offload --image=file={{.*}}.img,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
+// OPENMP: llvm-offload-binary{{.*}} -o {{.*}}.offload --image=file={{.*}}.img,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a
+// OPENMP: llvm-offload-wrapper{{.*}} --kind=openmp --triple=x86_64-unknown-linux-gnu -o [[BC:.*]].bc {{.*}}.offload {{.*}}.offload
+// OPENMP: clang{{.*}} --no-default-config --target=x86_64-unknown-linux-gnu -c -fPIC -o {{.*}}.openmp.image.wrapper{{.*}}.o [[BC]].bc
+
+//
+// The '--relocatable' flag is forwarded to the wrapper tool for OpenMP.
+//
+// RUN: llvm-offload-binary -o %t.out \
+// RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --wrapper-verbose --dry-run \
+// RUN:   --linker-path=/usr/bin/ld -r %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=RELOCATABLE
+
+// RELOCATABLE: llvm-offload-wrapper{{.*}} --kind=openmp --triple=x86_64-unknown-linux-gnu -o {{.*}}.bc --relocatable {{.*}}.offload
+
+//
+// For CUDA the device images are combined with 'fatbinary'.
+//
+// RUN: llvm-offload-binary -o %t.out \
+// RUN:   --image=file=%t.elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_70 \
+// RUN:   --image=file=%t.elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_52
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --wrapper-verbose --dry-run \
+// RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CUDA
+
+// CUDA: llvm-offload-binary{{.*}} {{.*}}.o --image=kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_70 --image=kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_52
+// CUDA: clang{{.*}} --target=nvptx64-nvidia-cuda -march=sm_70
+// CUDA: clang{{.*}} --target=nvptx64-nvidia-cuda -march=sm_52
+// CUDA: fatbinary{{.*}}--create [[FB:.*]].fatbin {{.*}}--image3=kind=elf,sm=70{{.*}}--image3=kind=elf,sm=52
+// CUDA: llvm-offload-wrapper{{.*}} --kind=cuda --triple=x86_64-unknown-linux-gnu -o [[BC:.*]].bc [[FB]].fatbin
+// CUDA: clang{{.*}} --no-default-config --target=x86_64-unknown-linux-gnu -c -fPIC -o {{.*}}.cuda.image.wrapper{{.*}}.o [[BC]].bc
+
+//
+// For HIP the device images are combined with 'clang-offload-bundler'.
+//
+// RUN: llvm-offload-binary -o %t.out \
+// RUN:   --image=file=%t.elf.o,kind=hip,triple=amdgcn-amd-amdhsa,arch=gfx90a \
+// RUN:   --image=file=%t.elf.o,kind=hip,triple=amdgcn-amd-amdhsa,arch=gfx908
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --wrapper-verbose --dry-run \
+// RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=HIP
+
+// HIP: llvm-offload-binary{{.*}} {{.*}}.o --image=kind=hip,triple=amdgcn-amd-amdhsa,arch=gfx90a --image=kind=hip,triple=amdgcn-amd-amdhsa,arch=gfx908
+// HIP: clang{{.*}} --target=amdgcn-amd-amdhsa -mcpu=gfx90a
+// HIP: clang{{.*}} --target=amdgcn-amd-amdhsa -mcpu=gfx908
+// HIP: clang-offload-bundler{{.*}}-targets=host-x86_64-unknown-linux-gnu,hip-amdgcn-amd-amdhsa--gfx90a,hip-amdgcn-amd-amdhsa--gfx908{{.*}}-output=[[FB:.*]].hipfb
+// HIP: llvm-offload-wrapper{{.*}} --kind=hip --triple=x86_64-unknown-linux-gnu -o [[BC:.*]].bc [[FB]].hipfb
+// HIP: clang{{.*}} --no-default-config --target=x86_64-unknown-linux-gnu -c -fPIC -o {{.*}}.hip.image.wrapper{{.*}}.o [[BC]].bc
+
+//
+// For SYCL the device image is linked with 'clang --sycl-link' and wrapped
+// directly with 'llvm-offload-wrapper --kind=sycl'.
+//
+// RUN: llvm-offload-binary -o %t.out \
+// RUN:   --image=file=%t.elf.o,kind=sycl,triple=spirv64-unknown-unknown,arch=generic
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --wrapper-verbose --dry-run \
+// RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=SYCL
+
+// SYCL: llvm-offload-binary{{.*}} {{.*}}.o --image=kind=sycl,triple=spirv64-unknown-unknown,arch=generic
+// SYCL: clang{{.*}} --target=spirv64-unknown-unknown {{.*}} --sycl-link {{.*}}-triple=spirv64-unknown-unknown{{.*}}-arch=
+// SYCL: llvm-offload-wrapper{{.*}} --kind=sycl --triple=x86_64-unknown-linux-gnu -o [[BC:.*]].bc {{.*}}.img
+// SYCL: clang{{.*}} --no-default-config --target=x86_64-unknown-linux-gnu -c -fPIC -o {{.*}}.sycl.image.wrapper{{.*}}.o [[BC]].bc
+
+//
+// Images pulled from a static archive are referenced by the archive path in the
+// extraction replay, not by the embedded member name.
+//
+// RUN: llvm-offload-binary -o %t.out \
+// RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
+// RUN: rm -f %t.a && llvm-ar rcs %t.a %t.o
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --wrapper-verbose --dry-run \
+// RUN:   --should-extract=sm_70 --linker-path=/usr/bin/ld %t.a -o a.out 2>&1 | FileCheck %s --check-prefix=ARCHIVE
+
+// ARCHIVE: llvm-offload-binary{{.*}} {{.*}}.a --image=kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index cfdd11e1d298d..dc869d0e72259 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -721,6 +721,67 @@ Expected<StringRef> compileModule(Module &M, OffloadKind Kind) {
   return *TempFileOrErr;
 }
 
+/// Performs the wrapping stage with individual tool invocations for verbose
+/// printing.
+Expected<StringRef>
+wrapDeviceImagesVerbose(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
+                        const ArgList &Args, OffloadKind Kind) {
+  Expected<std::string> WrapperPath = findProgram(
+      "llvm-offload-wrapper", {getExecutableDir("llvm-offload-wrapper")});
+  if (!WrapperPath)
+    return WrapperPath.takeError();
+
+  llvm::Triple Triple(
+      Args.getLastArgValue(OPT_host_triple_EQ, sys::getDefaultTargetTriple()));
+
+  // Generate the runtime registration bitcode from the bundled images.
+  auto BitcodeOrErr = createOutputFile(
+      ExecutableName + "." + getOffloadKindName(Kind) + ".image.wrapper", "bc");
+  if (!BitcodeOrErr)
+    return BitcodeOrErr.takeError();
+
+  SmallVector<StringRef> WrapperArgs = {
+      *WrapperPath,
+      Args.MakeArgString("--kind=" + getOffloadKindName(Kind)),
+      Args.MakeArgString("--triple=" + Triple.getTriple()),
+      "-o",
+      *BitcodeOrErr,
+  };
+  if (Kind == OFK_OpenMP && Args.hasArg(OPT_relocatable))
+    WrapperArgs.push_back("--relocatable");
+  for (const auto &Buffer : Buffers)
+    WrapperArgs.push_back(Buffer->getBufferIdentifier());
+
+  if (Error Err = executeCommands(*WrapperPath, WrapperArgs))
+    return std::move(Err);
+
+  // Compile the generated registration bitcode into a host object.
+  Expected<std::string> ClangPath =
+      findProgram("clang", {getExecutableDir("clang")});
+  if (!ClangPath)
+    return ClangPath.takeError();
+
+  auto ObjectOrErr = createOutputFile(
+      ExecutableName + "." + getOffloadKindName(Kind) + ".image.wrapper", "o");
+  if (!ObjectOrErr)
+    return ObjectOrErr.takeError();
+
+  SmallVector<StringRef> ClangArgs = {
+      *ClangPath,
+      "--no-default-config",
+      Args.MakeArgString("--target=" + Triple.getTriple()),
+      "-c",
+      "-fPIC",
+      "-o",
+      *ObjectOrErr,
+      *BitcodeOrErr,
+  };
+  if (Error Err = executeCommands(*ClangPath, ClangArgs))
+    return std::move(Err);
+
+  return *ObjectOrErr;
+}
+
 /// Creates the object file containing the device image and runtime
 /// registration code from the device images stored in \p Images.
 Expected<StringRef>
@@ -728,6 +789,10 @@ wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
                  const ArgList &Args, OffloadKind Kind) {
   llvm::TimeTraceScope TimeScope("Wrap bundled images");
 
+  // We use the discrete tools if we are in verbose mode.
+  if (Verbose && !Args.hasArg(OPT_print_wrapped_module))
+    return wrapDeviceImagesVerbose(Buffers, Args, Kind);
+
   SmallVector<ArrayRef<char>, 4> BuffersToWrap;
   for (const auto &Buffer : Buffers)
     BuffersToWrap.emplace_back(
@@ -790,8 +855,48 @@ wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
   return *FileOrErr;
 }
 
+/// Perform the OpenMP bundling with 'llvm-offload-binary' in verbose mode.
 Expected<SmallVector<std::unique_ptr<MemoryBuffer>>>
-bundleOpenMP(ArrayRef<OffloadingImage> Images) {
+bundleOpenMPVerbose(ArrayRef<OffloadingImage> Images, const ArgList &Args) {
+  Expected<std::string> OffloadBinaryPath = findProgram(
+      "llvm-offload-binary", {getExecutableDir("llvm-offload-binary")});
+  if (!OffloadBinaryPath)
+    return OffloadBinaryPath.takeError();
+
+  BumpPtrAllocator Alloc;
+  StringSaver Saver(Alloc);
+  SmallVector<std::unique_ptr<MemoryBuffer>> Buffers;
+  for (const OffloadingImage &Image : Images) {
+    StringRef ImageFile = Image.Image->getBufferIdentifier();
+    auto BinaryOrErr =
+        createOutputFile(sys::path::stem(ImageFile) + "." +
+                             getOffloadKindName(Image.TheOffloadKind),
+                         "offload");
+    if (!BinaryOrErr)
+      return BinaryOrErr.takeError();
+
+    std::string ImageArg = ("--image=file=" + ImageFile +
+                            ",kind=" + getOffloadKindName(Image.TheOffloadKind))
+                               .str();
+    for (const auto &[Key, Value] : Image.StringData)
+      ImageArg += ("," + Key + "=" + Value).str();
+
+    SmallVector<StringRef> CmdArgs = {*OffloadBinaryPath, "-o", *BinaryOrErr,
+                                      Saver.save(ImageArg)};
+    if (Error Err = executeCommands(*OffloadBinaryPath, CmdArgs))
+      return std::move(Err);
+
+    auto BufferOrErr = MemoryBuffer::getFileOrSTDIN(*BinaryOrErr);
+    if (std::error_code EC = BufferOrErr.getError())
+      return createFileError(*BinaryOrErr, EC);
+    Buffers.emplace_back(std::move(*BufferOrErr));
+  }
+
+  return std::move(Buffers);
+}
+
+Expected<SmallVector<std::unique_ptr<MemoryBuffer>>>
+bundleOpenMP(ArrayRef<OffloadingImage> Images, const ArgList &Args) {
   SmallVector<std::unique_ptr<MemoryBuffer>> Buffers;
   for (const OffloadingImage &Image : Images)
     Buffers.emplace_back(
@@ -807,7 +912,8 @@ bundleSYCL(ArrayRef<OffloadingImage> Images) {
     // clang-sycl-linker packs outputs into one binary blob. Therefore, it is
     // passed to Offload Wrapper as is.
     StringRef S(Image.Image->getBufferStart(), Image.Image->getBufferSize());
-    Buffers.emplace_back(MemoryBuffer::getMemBufferCopy(S));
+    Buffers.emplace_back(
+        MemoryBuffer::getMemBufferCopy(S, Image.Image->getBufferIdentifier()));
   }
 
   return std::move(Buffers);
@@ -866,7 +972,8 @@ bundleLinkedOutput(ArrayRef<OffloadingImage> Images, const ArgList &Args,
   llvm::TimeTraceScope TimeScope("Bundle linked output");
   switch (Kind) {
   case OFK_OpenMP:
-    return bundleOpenMP(Images);
+    return Verbose ? bundleOpenMPVerbose(Images, Args)
+                   : bundleOpenMP(Images, Args);
   case OFK_SYCL:
     return bundleSYCL(Images);
   case OFK_Cuda:
@@ -1162,6 +1269,54 @@ std::optional<std::string> searchLibrary(StringRef Input, StringRef Root,
   return searchLibraryBaseName(Input, Root, SearchPaths);
 }
 
+/// In verbose mode we need to replay the extracted files so the user can
+/// reproduce the generated files. This only prints the steps that would result
+/// in the same output files given the input.
+Error emitExtractCommands(
+    ArrayRef<SmallVector<OffloadFile>> InputsForTarget,
+    const DenseMap<StringRef, StringRef> &SourceForImage) {
+  Expected<std::string> OffloadBinaryPath = findProgram(
+      "llvm-offload-binary", {getExecutableDir("llvm-offload-binary")});
+  if (!OffloadBinaryPath)
+    return OffloadBinaryPath.takeError();
+
+  BumpPtrAllocator Alloc;
+  StringSaver Saver(Alloc);
+  MapVector<StringRef, SmallVector<StringRef>> Commands;
+  DenseSet<StringRef> Seen;
+  for (const auto &Input : InputsForTarget) {
+    for (const OffloadFile &File : Input) {
+      const OffloadBinary &Binary = *File.getBinary();
+      StringRef Identifier = Binary.getMemoryBufferRef().getBufferIdentifier();
+      StringRef Source = SourceForImage.lookup(Identifier);
+      if (Source.empty())
+        Source = Identifier;
+
+      StringRef TripleStr = Binary.getTriple();
+      StringRef Arch = Binary.getArch();
+      StringRef Kind = getOffloadKindName(Binary.getOffloadKind());
+
+      std::string ImageArg =
+          ("--image=kind=" + Kind + ",triple=" + TripleStr).str();
+      if (!Arch.empty())
+        ImageArg += (",arch=" + Arch).str();
+
+      // Some images are shared and only need to be extracted once.
+      StringRef SavedImage = Saver.save(ImageArg);
+      if (!Seen.insert(Saver.save(Source + "\x01" + SavedImage)).second)
+        continue;
+      Commands[Source].push_back(SavedImage);
+    }
+  }
+
+  for (const auto &[Source, Images] : Commands) {
+    SmallVector<StringRef> CmdArgs = {*OffloadBinaryPath, Source};
+    llvm::append_range(CmdArgs, Images);
+    printCommands(CmdArgs);
+  }
+  return Error::success();
+}
+
 /// Search the input files and libraries for embedded device offloading code
 /// and add it to the list of files to be linked. Files coming from static
 /// libraries are only added to the input if they are used by an existing
@@ -1186,6 +1341,7 @@ getDeviceInput(const ArgList &Args) {
   bool WholeArchive = Args.hasArg(OPT_wholearchive_flag);
   SmallVector<OffloadFile> ObjectFilesToExtract;
   SmallVector<OffloadFile> ArchiveFilesToExtract;
+  DenseMap<StringRef, StringRef> SourceForImage;
   for (const opt::Arg *Arg : Args.filtered(
            OPT_INPUT, OPT_library, OPT_whole_archive, OPT_no_whole_archive)) {
     if (Arg->getOption().matches(OPT_whole_archive) ||
@@ -1220,6 +1376,10 @@ getDeviceInput(const ArgList &Args) {
         return std::move(Err);
 
       for (auto &Binary : Binaries) {
+        if (Verbose)
+          SourceForImage.try_emplace(
+              Binary.getBinary()->getMemoryBufferRef().getBufferIdentifier(),
+              Saver.save(StringRef(*Filename)));
         if (identify_magic(Buffer.getBuffer()) == file_magic::archive &&
             !WholeArchive)
           ArchiveFilesToExtract.emplace_back(std::move(Binary));
@@ -1282,6 +1442,10 @@ getDeviceInput(const ArgList &Args) {
   for (auto &[ID, Input] : InputFiles)
     InputsForTarget.emplace_back(std::move(Input));
 
+  if (Verbose)
+    if (Error Err = emitExtractCommands(InputsForTarget, SourceForImage))
+      return std::move(Err);
+
   return std::move(InputsForTarget);
 }
 
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
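
For readers who want to reason about the extraction replay in emitExtractCommands without building LLVM, the grouping and de-duplication can be approximated in plain C++. This is a minimal sketch under stated assumptions, not the code from the patch: ImageInfo, ExtractCommand, buildExtractCommands, and render are hypothetical names, standard library containers stand in for MapVector, DenseSet, and StringSaver, and a std::pair key replaces the "\x01"-joined string the patch uses for its Seen set.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the metadata of one embedded offloading image.
struct ImageInfo {
  std::string Source; // Object file or archive the image was found in.
  std::string Kind;   // For example "openmp", "cuda", or "hip".
  std::string Triple; // Device target triple.
  std::string Arch;   // Device architecture; may be empty.
};

// One replayable extraction command: a source file plus its image selectors.
struct ExtractCommand {
  std::string Source;
  std::vector<std::string> ImageArgs;
};

// Groups image selectors by source file in first-seen order and keeps each
// (source, selector) pair only once, since shared images need one extraction.
std::vector<ExtractCommand>
buildExtractCommands(const std::vector<ImageInfo> &Images) {
  std::vector<ExtractCommand> Commands;
  std::set<std::pair<std::string, std::string>> Seen;
  for (const ImageInfo &Image : Images) {
    assert(!Image.Kind.empty() && !Image.Triple.empty() && "incomplete image");
    std::string Arg = "--image=kind=" + Image.Kind + ",triple=" + Image.Triple;
    if (!Image.Arch.empty())
      Arg += ",arch=" + Image.Arch;

    // Skip selectors that were already recorded for this source.
    if (!Seen.emplace(Image.Source, Arg).second)
      continue;

    // A linear search preserves insertion order without a second index.
    std::size_t Index = 0;
    while (Index < Commands.size() && Commands[Index].Source != Image.Source)
      ++Index;
    if (Index == Commands.size())
      Commands.push_back({Image.Source, {}});
    Commands[Index].ImageArgs.push_back(Arg);
  }
  return Commands;
}

// Renders a command as one line in the shape the new test checks for.
std::string render(const ExtractCommand &Cmd) {
  std::string Line = "llvm-offload-binary " + Cmd.Source;
  for (const std::string &Arg : Cmd.ImageArgs)
    Line += " " + Arg;
  return Line;
}
```

Feeding this sketch two OpenMP images found in the same object yields a single command with two --image= selectors, which matches the shape of the first OPENMP check line in the new test. Keying the Seen set on the (source, selector) pair rather than on the selector alone means the same image kind found in two different inputs still produces one command per input.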
