from:"Joseph Huber via cfe\-commits"

[lldb] [pstl] [llvm] [mlir] [libc] [compiler-rt] [libcxx] [openmp] [clang-tools-extra] [clang] [lld] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)

2024-01-24 Thread Joseph Huber via cfe-commits



@@ -0,0 +1,5 @@
+/// Some target-specific options are ignored for GPU, so %clang exits with code 0.
+// DEFINE: %{check} = %clang -### -c -mcmodel=medium

jhuber6 wrote:

Probably depends on the option we're testing. We could do both.

https://github.com/llvm/llvm-project/pull/79222
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [LinkerWrapper] Do not link device code under a relocatable link (PR #79314)

2024-01-24 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/79314

Summary:
A relocatable link through `clang -r` can go through the
clang-linker-wrapper if offloading is enabled. This will have the effect
of linking the device code and creating the wrapper module. It will then
be merged into the final file. This is useful behavior on its own, but
is likely not what is expected for a `-r` job.

This patch makes the linker wrapper ignore the device code when doing a
relocatable link. This has the effect of the linker merging the
`.llvm.offloading` sections in the output object. These will then be
parsed as normal when the executable is finally created.

Even though this doesn't actually perform a relocatable link on the
device code itself, it has a similar effect of combining multiple files
into a single one.
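
The behavior described above can be illustrated with a small standalone sketch. This is not the linker wrapper's code: `DeviceImage`, `isRelocatableLink`, and `selectDeviceInputs` are hypothetical names, and the argument list is modeled as plain strings rather than an LLVM `ArgList`. It only shows the idea that a `-r` job hands no device inputs to the wrapper.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an embedded device image; not an LLVM type.
struct DeviceImage {
  std::string Triple;
  std::string Arch;
};

// Returns true if the simplified argument list requests a relocatable link.
// The spellings mirror the options added in this patch.
bool isRelocatableLink(const std::vector<std::string> &Args) {
  for (const std::string &Arg : Args)
    if (Arg == "-r" || Arg == "--relocatable" || Arg == "-relocatable")
      return true;
  return false;
}

// Selects the device images that the wrapper should link itself.
std::vector<DeviceImage>
selectDeviceInputs(const std::vector<std::string> &Args,
                   const std::vector<DeviceImage> &Embedded) {
  // Under a relocatable link, leave the device code alone. The host linker
  // merges the embedded sections and the final link processes them.
  if (isRelocatableLink(Args))
    return {};
  return Embedded;
}
```

With this model, passing `-r` yields an empty list, while a normal link returns every embedded image unchanged.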


>From 0f8d9bb329b6d50493286e117ea0fe45e0a49247 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 24 Jan 2024 09:41:15 -0600
Subject: [PATCH] [LinkerWrapper] Do not link device code under a relocatable
 link

Summary:
A relocatable link through `clang -r` can go through the
clang-linker-wrapper if offloading is enabled. This will have the effect
of linking the device code and creating the wrapper module. It will then
be merged into the final file. This is useful behavior on its own, but
is likely not what is expected for a `-r` job.

This patch makes the linker wrapper ignore the device code when doing a
relocatable link. This has the effect of the linker merging the
`.llvm.offloading` sections in the output object. These will then be
parsed as normal when the executable is finally created.

Even though this doesn't actually perform a relocatable link on the
device code itself, it has a similar effect of combining multiple files
into a single one.
---
 clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp | 6 ++
 clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td   | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 5485a4b74bf8a8f..b682cc293d54b21 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -1356,6 +1356,12 @@ Expected<SmallVector<SmallVector<OffloadFile>>>
 getDeviceInput(const ArgList &Args) {
   llvm::TimeTraceScope TimeScope("ExtractDeviceCode");
 
+  // If the user is requesting a relocatable link we ignore the device code. The
+  // actual linker will merge the embedded device code sections so they can be
+  // linked when the executable is finally created.
+  if (Args.hasArg(OPT_relocatable))
+    return SmallVector<SmallVector<OffloadFile>>{};
+
   StringRef Root = Args.getLastArgValue(OPT_sysroot_EQ);
   SmallVector<StringRef> LibraryPaths;
   for (const opt::Arg *Arg : Args.filtered(OPT_library_path, OPT_libpath))
diff --git a/clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td 
b/clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td
index b6d3297987fffe5..c59cb0fb3e7cbfc 100644
--- a/clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td
+++ b/clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td
@@ -127,6 +127,9 @@ def version : Flag<["--", "-"], "version">, Flags<[HelpHidden]>, Alias<v>;
 def whole_archive : Flag<["--", "-"], "whole-archive">, Flags<[HelpHidden]>;
 def no_whole_archive : Flag<["--", "-"], "no-whole-archive">, Flags<[HelpHidden]>;
 
+def relocatable : Flag<["--", "-"], "relocatable">, Flags<[HelpHidden]>;
+def r : Flag<["-"], "r">, Alias<relocatable>, Flags<[HelpHidden]>;
+
 // link.exe-style linker options.
 def out : Joined<["/", "-", "/?", "-?"], "out:">, Flags<[HelpHidden]>;
 def libpath : Joined<["/", "-", "/?", "-?"], "libpath:">, Flags<[HelpHidden]>;


[libc] [clang] [openmp] [lld] [clang-tools-extra] [lldb] [libcxx] [compiler-rt] [mlir] [llvm] [pstl] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)

2024-01-23 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/79222

[llvm] [lldb] [lld] [compiler-rt] [clang] [mlir] [libc] [libcxx] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)

2024-01-23 Thread Joseph Huber via cfe-commits



@@ -0,0 +1,7 @@
+/// Some target-specific options are ignored for GPU, so %clang exits with code 0.
+// DEFINE: %{gpu_opts} = --cuda-gpu-arch=sm_60 --cuda-path=%S/Inputs/CUDA/usr/local/cuda --no-cuda-version-check
+// DEFINE: %{check} = %clang -### -c %{gpu_opts} -mcmodel=medium %s
+// RUN: %{check} -fbasic-block-sections=all
+
+// REDEFINE: %{gpu_opts} = -x hip --rocm-path=%S/Inputs/rocm -nogpulib

jhuber6 wrote:

Should probably include `-nogpuinc` as well. Best way to avoid spurious 
failures due to lack of a local CUDA / ROCm installation. Maybe in the future 
LLVM based offloading won't depend on so much external stuff.

https://github.com/llvm/llvm-project/pull/79222

[clang] [libc] [lldb] [llvm] [mlir] [compiler-rt] [lld] [libcxx] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)

2024-01-23 Thread Joseph Huber via cfe-commits



@@ -0,0 +1,7 @@
+/// Some target-specific options are ignored for GPU, so %clang exits with code 0.
+// DEFINE: %{gpu_opts} = --cuda-gpu-arch=sm_60 --cuda-path=%S/Inputs/CUDA/usr/local/cuda --no-cuda-version-check
+// DEFINE: %{check} = %clang -### -c %{gpu_opts} -mcmodel=medium %s
+// RUN: %{check} -fbasic-block-sections=all

jhuber6 wrote:

Offloading compilation for these single-source languages pretty much just
combines one "host" compilation job with N "device" compilation jobs. Passing
`--offload-device-only` or `--offload-host-only` simply runs one part of that.
There are probably some flags that behave differently depending on which side
you're compiling for, so it might be useful to test that behavior separately
if needed.
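
As a rough illustration of that "one host job plus N device jobs" model, here is a minimal sketch. It is not the clang driver's logic: `OffloadMode` and `planCompileJobs` are made-up names, and jobs are represented as plain strings.

```cpp
#include <string>
#include <vector>

// Hypothetical model of which side of the compilation was requested.
enum class OffloadMode { HostAndDevice, HostOnly, DeviceOnly };

// Lists the compile jobs for one single-source translation unit: one host
// job plus one device job per requested offload architecture.
std::vector<std::string>
planCompileJobs(const std::vector<std::string> &DeviceArchs, OffloadMode Mode) {
  std::vector<std::string> Jobs;
  // Roughly what --offload-device-only would drop.
  if (Mode != OffloadMode::DeviceOnly)
    Jobs.push_back("host");
  // Roughly what --offload-host-only would drop.
  if (Mode != OffloadMode::HostOnly)
    for (const std::string &Arch : DeviceArchs)
      Jobs.push_back("device:" + Arch);
  return Jobs;
}
```

A test that cares about a flag's behavior on only one side could then be thought of as checking just the jobs that remain in the corresponding mode.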

https://github.com/llvm/llvm-project/pull/79222

[llvm] [clang] [Offload] Fix the offloading wrapper when merged multiple times. (PR #79231)

2024-01-23 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/79231

Summary:
The offloading wrapper is an object file that contains the code necessary to
register offloading entries for the given runtime. Currently, we
expect only one of these to be present when we make the final
executable. However, in the case of relocatable linking with `-r` we
can end up with several of these being generated before finally
creating the executable.

This patch simply changes the definitions of these globals to be
mergeable. This allows multiple wrappers to participate in a single link
job. For ELF, we just make the dummy variable internal and used so it
sets up the section as expected. For COFF, we make the entries weak_odr
so they merge to a single symbol.
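
To make the linkage choices concrete, the sketch below prints IR-like declarations for the entry-section globals in each object format, mirroring the updated check lines in `linker-wrapper-image.c`. It is a string-building illustration only: `ObjectFormat` and `describeEntryGlobals` are hypothetical and do not correspond to the helper in `llvm/lib/Frontend/Offloading/Utility.cpp`.

```cpp
#include <string>
#include <vector>

enum class ObjectFormat { ELF, COFF };

// Builds textual, IR-like declarations for the globals that bracket an
// offloading entry section. Illustrative only; this is not LLVM's API.
std::vector<std::string> describeEntryGlobals(ObjectFormat Format,
                                              const std::string &Section) {
  const std::string Type = "[0 x %struct.__tgt_offload_entry]";
  std::vector<std::string> Lines;
  if (Format == ObjectFormat::ELF) {
    // The start/stop symbols are only declared here. The dummy exists to
    // create the section, so internal linkage keeps copies from clashing.
    Lines.push_back("@__start_" + Section + " = external hidden constant " +
                    Type);
    Lines.push_back("@__stop_" + Section + " = external hidden constant " +
                    Type);
    Lines.push_back("@__dummy." + Section + " = internal constant " + Type +
                    " zeroinitializer, section \"" + Section + "\"");
  } else {
    // On COFF the wrapper defines the bounds itself, so repeated
    // definitions are marked weak_odr to merge into a single symbol.
    Lines.push_back("@__start_" + Section + " = weak_odr hidden constant " +
                    Type + " zeroinitializer, section \"" + Section + "$OA\"");
    Lines.push_back("@__stop_" + Section + " = weak_odr hidden constant " +
                    Type + " zeroinitializer, section \"" + Section + "$OZ\"");
  }
  return Lines;
}
```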


>From 1c60daabebbca2189100217d271ff6fada2746e8 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 23 Jan 2024 17:53:40 -0600
Subject: [PATCH] [Offload] Fix the offloading wrapper when merged multiple
 times.

Summary:
The offloading wrapper is an object file that contains the code necessary to
register offloading entries for the given runtime. Currently, we
expect only one of these to be present when we make the final
executable. However, in the case of relocatable linking with `-r` we
can end up with several of these being generated before finally
creating the executable.

This patch simply changes the definitions of these globals to be
mergeable. This allows multiple wrappers to participate in a single link
job. For ELF, we just make the dummy variable internal and used so it
sets up the section as expected. For COFF, we make the entries weak_odr
so they merge to a single symbol.
---
 clang/test/Driver/linker-wrapper-image.c | 18 +-
 llvm/lib/Frontend/Offloading/Utility.cpp | 19 +++
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index fa2e59e36b3828a..b5d8ae217a9723d 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -14,10 +14,10 @@
 
 //  OPENMP-ELF: @__start_omp_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry]
 // OPENMP-ELF-NEXT: @__stop_omp_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry]
-// OPENMP-ELF-NEXT: @__dummy.omp_offloading_entries = hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries"
+// OPENMP-ELF-NEXT: @__dummy.omp_offloading_entries = internal constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries"
 
-//  OPENMP-COFF: @__start_omp_offloading_entries = hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries$OA"
-// OPENMP-COFF-NEXT: @__stop_omp_offloading_entries = hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries$OZ"
+//  OPENMP-COFF: @__start_omp_offloading_entries = weak_odr hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries$OA"
+// OPENMP-COFF-NEXT: @__stop_omp_offloading_entries = weak_odr hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries$OZ"
 
 //  OPENMP: @.omp_offloading.device_image = internal unnamed_addr constant 
[[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}", section ".llvm.offloading", align 
8
 // OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr 
constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr getelementptr 
inbounds ([[[BEGIN:[0-9]+]] x i8], ptr @.omp_offloading.device_image, i64 1, 
i64 0), ptr getelementptr inbounds ([[[END:[0-9]+]] x i8], ptr 
@.omp_offloading.device_image, i64 1, i64 0), ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]
@@ -47,10 +47,10 @@
 
 //  CUDA-ELF: @__start_cuda_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry]
 // CUDA-ELF-NEXT: @__stop_cuda_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry]
-// CUDA-ELF-NEXT: @__dummy.cuda_offloading_entries = hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries"
+// CUDA-ELF-NEXT: @__dummy.cuda_offloading_entries = internal constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries"
 
-//  CUDA-COFF: @__start_cuda_offloading_entries = hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries$OA"
-// CUDA-COFF-NEXT: @__stop_cuda_offloading_entries = hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries$OZ"
+//  CUDA-COFF: @__start_cuda_offloading_entries = weak_odr hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries$OA"
+// CUDA-COFF-NEXT:

[clang] [Clang][Driver] Fix `--save-temps` for OpenCL AoT compilation (PR #78333)

2024-01-23 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/78333

[mlir] [llvm] [clang] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-22 Thread Joseph Huber via cfe-commits



@@ -99,6 +99,7 @@ class ROCDLDialectLLVMIRTranslationInterface
   if (!llvmFunc->hasFnAttribute("amdgpu-flat-work-group-size")) {
 llvmFunc->addFnAttr("amdgpu-flat-work-group-size", "1,256");
   }
+  llvmFunc->addFnAttr("amdgpu-implicitarg-num-bytes", "256");

jhuber6 wrote:

Wasn't this in the other review as well?

https://github.com/llvm/llvm-project/pull/79039

[mlir] [llvm] [clang] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-22 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/79039

[clang] [llvm] [mlir] [AMDGPU] Update llvm-objdump lit tests for COV5 (PR #79039)

2024-01-22 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/79039

[llvm] [clang] [mlir] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #79038)

2024-01-22 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.

Seems straightforward enough

https://github.com/llvm/llvm-project/pull/79038

[clang] [Clang][Driver] Fix `--save-temps` for OpenCL AoT compilation (PR #78333)

2024-01-22 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/78333

[clang] [Clang][Driver] Fix `--save-temps` for OpenCL AoT compilation (PR #78333)

2024-01-22 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 commented:

You should add a test that checks the output of `-ccc-print-phases` and 
`-ccc-print-bindings`.

https://github.com/llvm/llvm-project/pull/78333

[llvm] [clang] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-22 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> FYI. There is a failure in linker-wrapper.c in 
> https://buildkite.com/llvm-project/github-pull-requests/builds/30337#018d1aaa-8225-4630-a5f0-527d1c7c129d
> 
> ```
> # note: command had no output on stdout or stderr
>   | # error: command failed with exit status: 1
>   | # executed command: 'c:\ws\src\build\bin\filecheck.exe' 
> 'C:\ws\src\clang\test\Driver\linker-wrapper.c' --check-prefix=AMD-TARGET-ID
>   | # .---command stderr
>   | # \| C:\ws\src\clang\test\Driver\linker-wrapper.c:172:19: error: 
> AMD-TARGET-ID: expected string not found in input
>   | # \| // AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img 
> --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+ -O2 -Wl,--no-undefined 
> {{.*}}.o {{.*}}.o
>   | # \|   ^
>   | # \| :1:1: note: scanning from here
>   | # \| c:\ws\src\build\bin\clang-linker-wrapper.exe: error: invalid argument
>   | # \| ^
>   | # \|
>   | # \| Input file: 
>   | # \| Check file: C:\ws\src\clang\test\Driver\linker-wrapper.c
>   | # \|
>   | # \| -dump-input=help explains the following input dump.
>   | # \|
>   | # \| Input was:
>   | # \| <<
>   | # \|1: c:\ws\src\build\bin\clang-linker-wrapper.exe: error: 
> invalid argument
>   | # \| check:172 
> X~ error: 
> no match found
>   | # \|2: invalid argument
>   | # \| check:172 ~
>   | # \| >>
>   | # `-
>   | # error: command failed with exit status: 1
> ```

Is that not fixed? I pushed something to address that yesterday afternoon. 

https://github.com/llvm/llvm-project/pull/78359

[clang] ec0ac85 - [Clang][Obvious] Correctly disable Windows on linker-wrapper test

2024-01-20 Thread Joseph Huber via cfe-commits


Author: Joseph Huber
Date: 2024-01-20T12:53:03-06:00
New Revision: ec0ac85e58f0a80cc52a132336b132ffe7b50b59

URL: 
https://github.com/llvm/llvm-project/commit/ec0ac85e58f0a80cc52a132336b132ffe7b50b59
DIFF: 
https://github.com/llvm/llvm-project/commit/ec0ac85e58f0a80cc52a132336b132ffe7b50b59.diff

LOG: [Clang][Obvious] Correctly disable Windows on linker-wrapper test

Added: 


Modified: 
clang/test/Driver/linker-wrapper.c

Removed: 




diff  --git a/clang/test/Driver/linker-wrapper.c 
b/clang/test/Driver/linker-wrapper.c
index a6de616b05e9fb..632df63c797e4c 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -2,7 +2,7 @@
 // REQUIRES: nvptx-registered-target
 // REQUIRES: amdgpu-registered-target
 
-// UNSUPPORTED: system-linux
+// REQUIRES: system-linux
 
 // An externally visible variable so static libraries extract.
 __attribute__((visibility("protected"), used)) int x;




[clang] cb2f340 - [CUDA] Disable registering surfaces and textures with the new driver

2024-01-18 Thread Joseph Huber via cfe-commits


Author: Joseph Huber
Date: 2024-01-18T10:56:33-06:00
New Revision: cb2f340850db007aebf5012858697ba5afc1ce4e

URL: 
https://github.com/llvm/llvm-project/commit/cb2f340850db007aebf5012858697ba5afc1ce4e
DIFF: 
https://github.com/llvm/llvm-project/commit/cb2f340850db007aebf5012858697ba5afc1ce4e.diff

LOG: [CUDA] Disable registering surfaces and textures with the new driver

Summary:
These runtime calls don't seem to be supported anymore, disable them for
now.

Added: 


Modified: 
clang/test/Driver/linker-wrapper-image.c
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

Removed: 




diff  --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index 147d315f8e399e..fa2e59e36b3828 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -121,11 +121,9 @@
 // CUDA-NEXT:   br label %if.end
 
 //  CUDA: sw.surface:
-// CUDA-NEXT:   call void @__cudaRegisterSurface(ptr %0, ptr %addr, ptr %name, 
ptr %name, i32 %textype, i32 %extern)
 // CUDA-NEXT:   br label %if.end
 
 //  CUDA: sw.texture:
-// CUDA-NEXT:   call void @__cudaRegisterTexture(ptr %0, ptr %addr, ptr %name, 
ptr %name, i32 %textype, i32 %normalized, i32 %extern)
 // CUDA-NEXT:   br label %if.end
 
 //  CUDA: if.end:

diff  --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index bfb54f58330bda..5485a4b74bf8a8 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -916,7 +916,8 @@ wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
   case OFK_Cuda:
 if (Error Err = offloading::wrapCudaBinary(
 M, BuffersToWrap.front(),
-offloading::getOffloadEntryArray(M, "cuda_offloading_entries")))
+offloading::getOffloadEntryArray(M, "cuda_offloading_entries"),
+/*Suffix=*/"", /*EmitSurfacesAndTextures=*/false))
   return std::move(Err);
 break;
   case OFK_HIP:




[clang] 2b804f8 - [LinkerWrapper][Obvious] Fix move on temporary object

2024-01-18 Thread Joseph Huber via cfe-commits


Author: Joseph Huber
Date: 2024-01-18T10:42:13-06:00
New Revision: 2b804f875579995b1588f1a079e265929163d0e4

URL: 
https://github.com/llvm/llvm-project/commit/2b804f875579995b1588f1a079e265929163d0e4
DIFF: 
https://github.com/llvm/llvm-project/commit/2b804f875579995b1588f1a079e265929163d0e4.diff

LOG: [LinkerWrapper][Obvious] Fix move on temporary object

Summary:
This causes warnings because it is already a temporary and does not need
to be moved.
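
For readers unfamiliar with this warning, the following self-contained sketch shows the same pattern outside of LLVM. `Blob` and `distribute` are hypothetical; the point is only that a function returning by value already produces a temporary, so wrapping the call in `std::move` is redundant and compilers may warn about it (for example via Clang's `-Wpessimizing-move` or `-Wredundant-move` families).

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical owning buffer with an explicit copy() helper.
struct Blob {
  std::string Data;
  Blob copy() const { return Blob{Data}; }
};

// Hands one Blob to NumJobs consumers: every consumer except the last gets
// a copy, and the last one takes ownership of the original.
std::vector<Blob> distribute(Blob Binary, std::size_t NumJobs) {
  std::vector<Blob> Jobs;
  for (std::size_t I = 0; I < NumJobs; ++I) {
    if (I == NumJobs - 1)
      Jobs.emplace_back(std::move(Binary)); // Named object: the move is needed.
    else
      Jobs.emplace_back(Binary.copy()); // Already a temporary: no std::move.
  }
  return Jobs;
}
```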

Added: 


Modified: 
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

Removed: 




diff  --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 03c83e2f92b3220..bfb54f58330bdad 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -1439,7 +1439,7 @@ getDeviceInput(const ArgList &Args) {
   if (Index == CompatibleTargets.size() - 1)
 InputFiles[ID].emplace_back(std::move(Binary));
   else
-InputFiles[ID].emplace_back(std::move(Binary.copy()));
+InputFiles[ID].emplace_back(Binary.copy());
 }
 
 // If we extracted any files we need to check all the symbols again.




[clang] [llvm] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-18 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/78359

[llvm] [clang] [LinkerWrapper] Support device binaries in multiple link jobs (PR #72442)

2024-01-18 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

Replaced by https://github.com/llvm/llvm-project/pull/78359

https://github.com/llvm/llvm-project/pull/72442

[clang] [llvm] [LinkerWrapper] Support device binaries in multiple link jobs (PR #72442)

2024-01-18 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/72442

[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-18 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/76571

[clang] [llvm] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-17 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/78359

>From 2a460f6ff9e7bca938adca5487609df41616e8c1 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 15 Jan 2024 15:42:06 -0600
Subject: [PATCH] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when
 linking

Summary:
The linker wrapper's job is to sort various embedded inputs into a list
of files that participate in a single link job. So far, this has been
completely 1-to-1, that is, each input file participates in exactly one
link job. However, support for AMD's target-id requires that one input
file may participate in multiple link jobs. For example, if given a
`gfx90a` static library and a `gfx90a:xnack+` object file input, we
should link the `gfx90a` target into the `gfx90a:xnack+` job. These are
considered separate CPUs that can, more or less, be linked with each other.

This patch adds the necessary logic to make this happen. It primarily
reworks the logic to copy relevant input files into a separate list. So,
it moves construction of the final list of link jobs into the extraction
phase. We also need to copy a file in the case that it is needed more
than once, as the entire workflow expects ownership of said file.
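
A simplified sketch of the kind of compatibility test this implies is shown below. It is an assumption-laden illustration, not the check added to `OffloadBinary.cpp`: the helper names are invented, only the `xnack` and `sramecc` features are considered, and identical strings are treated as compatible for simplicity.

```cpp
#include <string>

// Returns the processor name in front of the first ':' of a target-id.
std::string baseProcessor(const std::string &Arch) {
  return Arch.substr(0, Arch.find(':'));
}

// Returns true if the target-id carries the given feature setting, for
// example "xnack+" in "gfx90a:xnack+".
bool hasFeature(const std::string &Arch, const std::string &Feature) {
  return Arch.find(":" + Feature) != std::string::npos;
}

// Two target-ids are treated as linkable here when they name the same
// processor and never request opposite settings of the same feature. An
// unspecified feature is assumed to work with either setting.
bool areArchsLinkCompatible(const std::string &LHS, const std::string &RHS) {
  if (baseProcessor(LHS) != baseProcessor(RHS))
    return false;
  static const char *const Features[] = {"xnack", "sramecc"};
  for (const char *Name : Features) {
    const std::string On = std::string(Name) + "+";
    const std::string Off = std::string(Name) + "-";
    if ((hasFeature(LHS, On) && hasFeature(RHS, Off)) ||
        (hasFeature(LHS, Off) && hasFeature(RHS, On)))
      return false;
  }
  return true;
}
```

Under these assumptions a `gfx90a` library image would join both the `gfx90a:xnack+` and `gfx90a:xnack-` link jobs, while the two `xnack` variants would never be linked with each other.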

disable Windows

Fix compatibility check
---
 clang/lib/Driver/ToolChains/Clang.cpp |  2 +-
 clang/test/Driver/amdgpu-openmp-toolchain.c   |  2 +-
 clang/test/Driver/linker-wrapper.c| 21 +
 .../ClangLinkerWrapper.cpp| 82 +++
 llvm/include/llvm/Object/OffloadBinary.h  | 25 ++
 llvm/lib/Object/OffloadBinary.cpp | 32 
 6 files changed, 126 insertions(+), 38 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 997ec2d491d02c..6e4fbe6816810f 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -8700,7 +8700,7 @@ void OffloadPackager::ConstructJob(Compilation &C, const JobAction &JA,
 SmallVector<std::string> Parts{
 "file=" + File.str(),
 "triple=" + TC->getTripleString(),
-"arch=" + getProcessorFromTargetID(TC->getTriple(), Arch).str(),
+"arch=" + Arch.str(),
 "kind=" + Kind.str(),
 };
 
diff --git a/clang/test/Driver/amdgpu-openmp-toolchain.c 
b/clang/test/Driver/amdgpu-openmp-toolchain.c
index f38486ad07..daa41b216089b2 100644
--- a/clang/test/Driver/amdgpu-openmp-toolchain.c
+++ b/clang/test/Driver/amdgpu-openmp-toolchain.c
@@ -65,7 +65,7 @@
 
 // RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a:sramecc-:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID
-// CHECK-TARGET-ID: 
clang-offload-packager{{.*}}arch=gfx90a,kind=openmp,feature=-sramecc,feature=+xnack
+// CHECK-TARGET-ID: 
clang-offload-packager{{.*}}arch=gfx90a:sramecc-:xnack+,kind=openmp,feature=-sramecc,feature=+xnack
 
 // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a,gfx90a:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR
diff --git a/clang/test/Driver/linker-wrapper.c 
b/clang/test/Driver/linker-wrapper.c
index e51c5ea381d31a..a6de616b05e9fb 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -2,6 +2,11 @@
 // REQUIRES: nvptx-registered-target
 // REQUIRES: amdgpu-registered-target
 
+// UNSUPPORTED: system-linux
+
+// An externally visible variable so static libraries extract.
+__attribute__((visibility("protected"), used)) int x;
+
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
 // RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.nvptx.bc
 // RUN: %clang -cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm-bc -o %t.amdgpu.bc
@@ -150,3 +155,19 @@
 // RUN:   --linker-path=/usr/bin/lld-link -- %t.o -libpath:./ -out:a.exe 2>&1 
| FileCheck %s --check-prefix=COFF
 
 // COFF: "/usr/bin/lld-link" {{.*}}.o -libpath:./ -out:a.exe 
{{.*}}openmp.image.wrapper{{.*}}
+
+// RUN: clang-offload-packager -o %t-lib.out \
+// RUN:   
--image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o 
-fembed-offload-object=%t-lib.out
+// RUN: llvm-ar rcs %t.a %t.o
+// RUN: clang-offload-packager -o %t-on.out \
+// RUN:   
--image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a:xnack+
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-on.o 
-fembed-offload-object=%t-on.out
+// RUN: clang-offload-packager -o %t-off.out \
+// RUN:   
--image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a:xnack-
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-off.o 
-fembed-offload-object=%t-off.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN:   --linker-path=/usr/bin/ld -- %t-on.o %t-off.o %t.a -o a.out 2>&1 | 
FileCheck %s

[clang] [llvm] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-17 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/78359

>From d7c8a6e0cb2289af939a90e82afbc6e35b08010c Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 15 Jan 2024 15:42:06 -0600
Subject: [PATCH 1/3] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when
 linking

Summary:
The linker wrapper's job is to sort various embedded inputs into a list
of files that participate in a single link job. So far, this has been
completely 1-to-1, that is, each input file participates in exactly one
link job. However, support for AMD's target-id requires that one input
file may participate in multiple link jobs. For example, if given a
`gfx90a` static library and a `gfx90a:xnack+` object file input, we
should link the `gfx90a` target into the `gfx90a:xnack+` job. These are
considered separate CPUs that can, more or less, be linked with each other.

This patch adds the necessary logic to make this happen. It primarily
reworks the logic to copy relevant input files into a separate list. So,
it moves construction of the final list of link jobs into the extraction
phase. We also need to copy the files in the case that it is needed more
than once, as the entire workflow expects ownership of said file.
---
 clang/lib/Driver/ToolChains/Clang.cpp |  2 +-
 clang/test/Driver/amdgpu-openmp-toolchain.c   |  2 +-
 clang/test/Driver/linker-wrapper.c| 19 +
 .../ClangLinkerWrapper.cpp| 82 +++
 llvm/include/llvm/Object/OffloadBinary.h  | 25 ++
 llvm/lib/Object/OffloadBinary.cpp | 28 +++
 6 files changed, 120 insertions(+), 38 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 9edae3fec91a87..25e022ca2f6328 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -8688,7 +8688,7 @@ void OffloadPackager::ConstructJob(Compilation &C, const JobAction &JA,
 SmallVector<std::string> Parts{
 "file=" + File.str(),
 "triple=" + TC->getTripleString(),
-"arch=" + getProcessorFromTargetID(TC->getTriple(), Arch).str(),
+"arch=" + Arch.str(),
 "kind=" + Kind.str(),
 };
 
diff --git a/clang/test/Driver/amdgpu-openmp-toolchain.c 
b/clang/test/Driver/amdgpu-openmp-toolchain.c
index f38486ad07..daa41b216089b2 100644
--- a/clang/test/Driver/amdgpu-openmp-toolchain.c
+++ b/clang/test/Driver/amdgpu-openmp-toolchain.c
@@ -65,7 +65,7 @@
 
 // RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a:sramecc-:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID
-// CHECK-TARGET-ID: 
clang-offload-packager{{.*}}arch=gfx90a,kind=openmp,feature=-sramecc,feature=+xnack
+// CHECK-TARGET-ID: 
clang-offload-packager{{.*}}arch=gfx90a:sramecc-:xnack+,kind=openmp,feature=-sramecc,feature=+xnack
 
 // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a,gfx90a:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR
diff --git a/clang/test/Driver/linker-wrapper.c 
b/clang/test/Driver/linker-wrapper.c
index e51c5ea381d31a..2057f6a594bdf7 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -2,6 +2,9 @@
 // REQUIRES: nvptx-registered-target
 // REQUIRES: amdgpu-registered-target
 
+// An externally visible variable so static libraries extract.
+__attribute__((visibility("protected"), used)) int x;
+
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
 // RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.nvptx.bc
 // RUN: %clang -cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm-bc -o %t.amdgpu.bc
@@ -150,3 +153,19 @@
 // RUN:   --linker-path=/usr/bin/lld-link -- %t.o -libpath:./ -out:a.exe 2>&1 
| FileCheck %s --check-prefix=COFF
 
 // COFF: "/usr/bin/lld-link" {{.*}}.o -libpath:./ -out:a.exe 
{{.*}}openmp.image.wrapper{{.*}}
+
+// RUN: clang-offload-packager -o %t-lib.out \
+// RUN:   
--image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o 
-fembed-offload-object=%t-lib.out
+// RUN: llvm-ar rcs %t.a %t.o
+// RUN: clang-offload-packager -o %t-on.out \
+// RUN:   
--image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a:xnack+
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-on.o 
-fembed-offload-object=%t-on.out
+// RUN: clang-offload-packager -o %t-off.out \
+// RUN:   
--image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a:xnack-
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-off.o 
-fembed-offload-object=%t-off.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN:   --linker-path=/usr/bin/ld -- %t-on.o %t-off.o %t.a -o a.out 2>&1 | 
FileCheck %s --check-prefix=AMD-TARGET-ID
+
+// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img

[llvm] [clang] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-17 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/78359

>From d7c8a6e0cb2289af939a90e82afbc6e35b08010c Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 15 Jan 2024 15:42:06 -0600
Subject: [PATCH 1/2] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when
 linking

Summary:
The linker wrapper's job is to sort various embedded inputs into a list
of files that participate in a single link job. So far, this has been
completely 1-to-1, that is, each input file participates in exactly one
link job. However, support for AMD's target-id requires that one input
file may participate in multiple link jobs. For example, if given a
`gfx90a` static library and a `gfx90a:xnack+` object file input, we
should link the `gfx90a` target into the `gfx90a:xnack+` job. These are
considered separate CPUs that can, more or less, be linked with each other.

This patch adds the necessary logic to make this happen. It primarily
reworks the logic to copy relevant input files into a separate list, so
it moves construction of the final list of link jobs into the extraction
phase. We also need to copy a file in the case that it is needed more
than once, as the entire workflow expects ownership of said file.
---
 clang/lib/Driver/ToolChains/Clang.cpp |  2 +-
 clang/test/Driver/amdgpu-openmp-toolchain.c   |  2 +-
 clang/test/Driver/linker-wrapper.c| 19 +
 .../ClangLinkerWrapper.cpp| 82 +++
 llvm/include/llvm/Object/OffloadBinary.h  | 25 ++
 llvm/lib/Object/OffloadBinary.cpp | 28 +++
 6 files changed, 120 insertions(+), 38 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 9edae3fec91a87..25e022ca2f6328 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -8688,7 +8688,7 @@ void OffloadPackager::ConstructJob(Compilation , const 
JobAction ,
 SmallVector Parts{
 "file=" + File.str(),
 "triple=" + TC->getTripleString(),
-"arch=" + getProcessorFromTargetID(TC->getTriple(), Arch).str(),
+"arch=" + Arch.str(),
 "kind=" + Kind.str(),
 };
 
diff --git a/clang/test/Driver/amdgpu-openmp-toolchain.c 
b/clang/test/Driver/amdgpu-openmp-toolchain.c
index f38486ad07..daa41b216089b2 100644
--- a/clang/test/Driver/amdgpu-openmp-toolchain.c
+++ b/clang/test/Driver/amdgpu-openmp-toolchain.c
@@ -65,7 +65,7 @@
 
 // RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a:sramecc-:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID
-// CHECK-TARGET-ID: 
clang-offload-packager{{.*}}arch=gfx90a,kind=openmp,feature=-sramecc,feature=+xnack
+// CHECK-TARGET-ID: 
clang-offload-packager{{.*}}arch=gfx90a:sramecc-:xnack+,kind=openmp,feature=-sramecc,feature=+xnack
 
 // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a,gfx90a:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR
diff --git a/clang/test/Driver/linker-wrapper.c 
b/clang/test/Driver/linker-wrapper.c
index e51c5ea381d31a..2057f6a594bdf7 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -2,6 +2,9 @@
 // REQUIRES: nvptx-registered-target
 // REQUIRES: amdgpu-registered-target
 
+// An externally visible variable so static libraries extract.
+__attribute__((visibility("protected"), used)) int x;
+
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
 // RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.nvptx.bc
 // RUN: %clang -cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm-bc -o %t.amdgpu.bc
@@ -150,3 +153,19 @@
 // RUN:   --linker-path=/usr/bin/lld-link -- %t.o -libpath:./ -out:a.exe 2>&1 
| FileCheck %s --check-prefix=COFF
 
 // COFF: "/usr/bin/lld-link" {{.*}}.o -libpath:./ -out:a.exe 
{{.*}}openmp.image.wrapper{{.*}}
+
+// RUN: clang-offload-packager -o %t-lib.out \
+// RUN:   
--image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o 
-fembed-offload-object=%t-lib.out
+// RUN: llvm-ar rcs %t.a %t.o
+// RUN: clang-offload-packager -o %t-on.out \
+// RUN:   
--image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a:xnack+
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-on.o 
-fembed-offload-object=%t-on.out
+// RUN: clang-offload-packager -o %t-off.out \
+// RUN:   
--image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a:xnack-
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-off.o 
-fembed-offload-object=%t-off.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN:   --linker-path=/usr/bin/ld -- %t-on.o %t-off.o %t.a -o a.out 2>&1 | 
FileCheck %s --check-prefix=AMD-TARGET-ID
+
+// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img

[clang] [llvm] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-17 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

Looks like it still has that Windows failure. That's going to be impossible to 
debug on account of the fact that I have no clue how to run this thing on 
Windows. The precommit checking takes a whole day to run as well. The only 
error message is "invalid argument", so I really have no clue what could be 
causing it.

https://github.com/llvm/llvm-project/pull/78359
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-17 Thread Joseph Huber via cfe-commits



@@ -162,6 +162,19 @@ class OffloadFile : public OwningBinary {
   std::unique_ptr Buffer)
   : OwningBinary(std::move(Binary), std::move(Buffer)) {}
 
+  /// Make a deep copy of this offloading file.
+  OffloadFile copy() const {
+std::unique_ptr Buffer = MemoryBuffer::getMemBufferCopy(
+getBinary()->getMemoryBufferRef().getBuffer());
+
+// This parsing should never fail because it has already been parsed.
+auto NewBinaryOrErr = OffloadBinary::create(*Buffer);
+assert(NewBinaryOrErr && "Failed to parse a copy of the binary?");
+if (!NewBinaryOrErr)

jhuber6 wrote:

Errors always need to be checked, even if they were successful. Otherwise, if the 
user did not compile with assertions on, this would abort the program.
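
To illustrate the point about checking errors, here is a minimal sketch of a result type that tracks whether it has been inspected. `CheckedResult` and `copyOrEmpty` are hypothetical names written for this example; they are not LLVM's `Expected<T>` or the code under review. The sketch only mimics the idea that testing the result marks it as checked, and that an `assert` alone does not count as a check once assertions are compiled out.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a "must be checked" result type.
template <typename T> class CheckedResult {
public:
  explicit CheckedResult(T Value) : Value(std::move(Value)), HasValue(true) {}
  static CheckedResult failure() { return CheckedResult(); }

  // Testing the result marks it as checked.
  explicit operator bool() {
    Checked = true;
    return HasValue;
  }

  bool wasChecked() const { return Checked; }

  T &operator*() {
    assert(HasValue && "dereferencing a failed result");
    return Value;
  }

private:
  CheckedResult() = default;
  T Value{};
  bool HasValue = false;
  bool Checked = false;
};

// Mirrors the shape of the reviewed code: the explicit branch always runs,
// whereas an assert(Result) would disappear when NDEBUG is defined and
// leave the result unchecked.
std::string copyOrEmpty(CheckedResult<std::string> Result) {
  if (!Result)
    return std::string();
  return *Result;
}
```

A caller would write `copyOrEmpty(CheckedResult<std::string>("abc"))` and get the string back, while a failed result yields an empty string instead of relying on an assertion.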

https://github.com/llvm/llvm-project/pull/78359

[clang] [llvm] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-16 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

This is a redo of what was originally in 
https://github.com/llvm/llvm-project/pull/72442

https://github.com/llvm/llvm-project/pull/78359

[clang] [llvm] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (PR #78359)

2024-01-16 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/78359

Summary:
The linker wrapper's job is to sort various embedded inputs into a list
of files that participate in a single link job. So far, this has been
completely 1-to-1, that is, each input file participates in exactly one
link job. However, support for AMD's target-id requires that one input
file may participate in multiple link jobs. For example, if given a
`gfx90a` static library and a `gfx90a:xnack+` object file input, we
should link the `gfx90a` target into the `gfx90a:xnack+` job. These are
considered separate CPUs that can, more or less, be linked with each other.

This patch adds the necessary logic to make this happen. It primarily
reworks the logic to copy relevant input files into a separate list, so
it moves construction of the final list of link jobs into the extraction
phase. We also need to copy a file in the case that it is needed more
than once, as the entire workflow expects ownership of said file.
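
The compatibility rule described in the summary can be sketched as follows. This is a self-contained illustration, not the code from the patch: `TargetID`, `parseTargetID`, and `targetsCompatible` are hypothetical names, and the rule assumed here is that two target IDs can be linked together when they name the same processor and do not request opposite settings for any feature (a feature left unspecified matches either setting).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical representation of a target ID such as "gfx90a:sramecc-:xnack+".
struct TargetID {
  std::string Processor;
  std::map<std::string, bool> Features; // feature name -> enabled?
};

// Split "proc:feat+:feat-" into its processor and feature settings.
// Components that do not end in '+' or '-' are ignored in this sketch.
TargetID parseTargetID(const std::string &ID) {
  assert(!ID.empty() && "expected a non-empty target ID");
  TargetID Result;
  std::istringstream Stream(ID);
  std::string Part;
  bool First = true;
  while (std::getline(Stream, Part, ':')) {
    if (First) {
      Result.Processor = Part;
      First = false;
      continue;
    }
    if (Part.size() < 2)
      continue;
    char Sign = Part.back();
    if (Sign != '+' && Sign != '-')
      continue;
    Result.Features[Part.substr(0, Part.size() - 1)] = (Sign == '+');
  }
  return Result;
}

// Assumed rule: same processor, and no feature set to opposite values.
bool targetsCompatible(const std::string &LHS, const std::string &RHS) {
  TargetID A = parseTargetID(LHS);
  TargetID B = parseTargetID(RHS);
  if (A.Processor != B.Processor)
    return false;
  for (const auto &Entry : A.Features) {
    auto It = B.Features.find(Entry.first);
    if (It != B.Features.end() && It->second != Entry.second)
      return false;
  }
  return true;
}
```

Under these assumptions, a `gfx90a` library image would join both the `gfx90a:xnack+` and the `gfx90a:xnack-` link jobs, while the `xnack+` and `xnack-` objects would never be linked with each other. That is why one input file may need to be copied into more than one job.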


>From d7c8a6e0cb2289af939a90e82afbc6e35b08010c Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 15 Jan 2024 15:42:06 -0600
Subject: [PATCH] [LinkerWrapper] Handle AMDGPU Target-IDs correctly when
 linking

Summary:
The linker wrapper's job is to sort various embedded inputs into a list
of files that participate in a single link job. So far, this has been
completely 1-to-1, that is, each input file participates in exactly one
link job. However, support for AMD's target-id requires that one input
file may participate in multiple link jobs. For example, if given a
`gfx90a` static library and a `gfx90a:xnack+` object file input, we
should link the `gfx90a` target into the `gfx90a:xnack+` job. These are
considered separate CPUs that can, more or less, be linked with each other.

This patch adds the necessary logic to make this happen. It primarily
reworks the logic to copy relevant input files into a separate list, so
it moves construction of the final list of link jobs into the extraction
phase. We also need to copy a file in the case that it is needed more
than once, as the entire workflow expects ownership of said file.
---
 clang/lib/Driver/ToolChains/Clang.cpp |  2 +-
 clang/test/Driver/amdgpu-openmp-toolchain.c   |  2 +-
 clang/test/Driver/linker-wrapper.c| 19 +
 .../ClangLinkerWrapper.cpp| 82 +++
 llvm/include/llvm/Object/OffloadBinary.h  | 25 ++
 llvm/lib/Object/OffloadBinary.cpp | 28 +++
 6 files changed, 120 insertions(+), 38 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 9edae3fec91a87f..25e022ca2f63283 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -8688,7 +8688,7 @@ void OffloadPackager::ConstructJob(Compilation , const 
JobAction ,
 SmallVector Parts{
 "file=" + File.str(),
 "triple=" + TC->getTripleString(),
-"arch=" + getProcessorFromTargetID(TC->getTriple(), Arch).str(),
+"arch=" + Arch.str(),
 "kind=" + Kind.str(),
 };
 
diff --git a/clang/test/Driver/amdgpu-openmp-toolchain.c 
b/clang/test/Driver/amdgpu-openmp-toolchain.c
index f38486ad073..daa41b216089b2b 100644
--- a/clang/test/Driver/amdgpu-openmp-toolchain.c
+++ b/clang/test/Driver/amdgpu-openmp-toolchain.c
@@ -65,7 +65,7 @@
 
 // RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a:sramecc-:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID
-// CHECK-TARGET-ID: 
clang-offload-packager{{.*}}arch=gfx90a,kind=openmp,feature=-sramecc,feature=+xnack
+// CHECK-TARGET-ID: 
clang-offload-packager{{.*}}arch=gfx90a:sramecc-:xnack+,kind=openmp,feature=-sramecc,feature=+xnack
 
 // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a,gfx90a:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR
diff --git a/clang/test/Driver/linker-wrapper.c 
b/clang/test/Driver/linker-wrapper.c
index e51c5ea381d31ae..2057f6a594bdf78 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -2,6 +2,9 @@
 // REQUIRES: nvptx-registered-target
 // REQUIRES: amdgpu-registered-target
 
+// An externally visible variable so static libraries extract.
+__attribute__((visibility("protected"), used)) int x;
+
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
 // RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.nvptx.bc
 // RUN: %clang -cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm-bc -o %t.amdgpu.bc
@@ -150,3 +153,19 @@
 // RUN:   --linker-path=/usr/bin/lld-link -- %t.o -libpath:./ -out:a.exe 2>&1 
| FileCheck %s --check-prefix=COFF
 
 // COFF: "/usr/bin/lld-link" {{.*}}.o -libpath:./ -out:a.exe 
{{.*}}openmp.image.wrapper{{.*}}
+
+// RUN: clang-offload-packager -o %t-lib.out \
+// RUN:

[clang] [Clang] Add a NULL check (PR #77131)

2024-01-16 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

Thanks for the patch. This one likely fell through the cracks because it has no 
assigned reviewers. We'll need a test based on the original bug report. Put 
that in `clang/test/OpenMP/.c` and then look at other tests for what 
it should look like. LLVM uses `lit` for testing; you can run it yourself with 
`bin/llvm-lit -vv .c`, using the binary provided in the LLVM 
build.

If this function emits diagnostics and doesn't compile, then you should do a 
Sema check like the tests with "messages" in their name. If it's a codegen 
test, you can use the `update_cc_test_checks` script to autogenerate the 
LLVM-IR checks for the test and trim the ones you think are relevant. 

https://github.com/llvm/llvm-project/pull/77131

[clang] [Clang] Add a NULL check (PR #77131)

2024-01-16 Thread Joseph Huber via cfe-commits



@@ -21067,6 +21067,10 @@ Sema::ActOnOpenMPDependClause(const 
OMPDependClause::DependDataTy ,
   ExprTy = ATy->getElementType();
 else
   ExprTy = BaseType->getPointeeType();
+// bug 69200
+if (ExprTy.isNull()) {
+  continue;
+}

jhuber6 wrote:

```suggestion
if (ExprTy.isNull())
  continue;
```
LLVM style omits braces around a single-statement block.

https://github.com/llvm/llvm-project/pull/77131

[libc] [clang] [libc] Give more functions restrict qualifiers (NFC) (PR #78061)

2024-01-15 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/78061

[llvm] [clang] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)

2024-01-15 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.

Thanks. I'll probably make a patch after this to make the surface handling for 
CUDA default off because it seems to be unsupported.

https://github.com/llvm/llvm-project/pull/78057

[clang] [libc] [Libc] Give more functions restrict qualifiers (PR #78061)

2024-01-15 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.

Thanks.

https://github.com/llvm/llvm-project/pull/78061

[libc] [clang] [llvm] [Libc] Give more functions restrict qualifiers (PR #78061)

2024-01-15 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> > LLVM changes look unrelated, it was originally copied from OpenBSD it 
> > seems. But it's not a major issue.
> 
> FWIW I opened a few PRs in FreeBSD regarding this.

Yeah, go ahead and move that portion there so the people who know more about 
LLVM's regex can look at it compared to the `libc` team.

https://github.com/llvm/llvm-project/pull/78061

[libc] [llvm] [clang] [Libc] Give more functions restrict qualifiers (PR #78061)

2024-01-15 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

LLVM changes look unrelated, it was originally copied from OpenBSD it seems. 
But it's not a major issue. 

https://github.com/llvm/llvm-project/pull/78061

[llvm] [clang] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)

2024-01-14 Thread Joseph Huber via cfe-commits



@@ -568,32 +590,45 @@ void createRegisterFatbinFunction(Module , 
GlobalVariable *FatbinDesc,
 
 } // namespace
 
-Error wrapOpenMPBinaries(Module , ArrayRef> Images) {
-  GlobalVariable *Desc = createBinDesc(M, Images);
+Error OffloadWrapper::wrapOpenMPBinaries(
+Module , ArrayRef> Images,
+std::optional EntryArray) const {
+  GlobalVariable *Desc = createBinDesc(
+  M, Images,
+  EntryArray
+  ? *EntryArray
+  : offloading::getOffloadEntryArray(M, "omp_offloading_entries"),

jhuber6 wrote:

Think it should be fine to just call this with 
`offloading::getOffloadEntryArray(M, "xxx_offloading_entries")` at the 
callsite. `std::optional` makes it a little weird here.

https://github.com/llvm/llvm-project/pull/78057

[clang] [llvm] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)

2024-01-14 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/78057

[clang] [llvm] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)

2024-01-14 Thread Joseph Huber via cfe-commits



@@ -568,32 +590,45 @@ void createRegisterFatbinFunction(Module , 
GlobalVariable *FatbinDesc,
 
 } // namespace
 
-Error wrapOpenMPBinaries(Module , ArrayRef> Images) {
-  GlobalVariable *Desc = createBinDesc(M, Images);
+Error OffloadWrapper::wrapOpenMPBinaries(
+Module , ArrayRef> Images,
+std::optional EntryArray) const {
+  GlobalVariable *Desc = createBinDesc(
+  M, Images,
+  EntryArray
+  ? *EntryArray
+  : offloading::getOffloadEntryArray(M, "omp_offloading_entries"),
+  Suffix);
   if (!Desc)
 return createStringError(inconvertibleErrorCode(),
  "No binary descriptors created.");
-  createRegisterFunction(M, Desc);
-  createUnregisterFunction(M, Desc);
+  createRegisterFunction(M, Desc, Suffix);

jhuber6 wrote:

What is the `Suffix` for, exactly? It might be better to just give it some 
generic name, since the expected use is currently always `_cuda_` or `_omp_` 
embedded in a longer name.

https://github.com/llvm/llvm-project/pull/78057

[llvm] [clang] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)

2024-01-14 Thread Joseph Huber via cfe-commits



@@ -0,0 +1,62 @@
+//===- OffloadWrapper.h --r-*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_FRONTEND_OFFLOADING_OFFLOADWRAPPER_H
+#define LLVM_FRONTEND_OFFLOADING_OFFLOADWRAPPER_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/IR/Module.h"
+
+namespace llvm {
+namespace offloading {
+/// Class for embedding and registering offloading images and related objects 
in
+/// a Module.
+class OffloadWrapper {

jhuber6 wrote:

I feel like these should just be free functions, with the extra two bits of state 
here passed as additional default arguments, like you've done with `EntryArray`.
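
As a rough sketch of the free-function shape suggested here, the two pieces of class state can become trailing defaulted parameters. Everything below is a hypothetical stand-in so the example compiles without LLVM headers: `Module`, `Image`, the emitted symbol names, and this `wrapCudaBinary` signature are assumptions for illustration, not the API that landed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for llvm::Module and a device image.
struct Module {
  std::vector<std::string> Symbols;
};
using Image = std::vector<char>;

// Free function: what was class state (Suffix, EmitSurfacesAndTextures)
// is passed as trailing parameters with defaults.
bool wrapCudaBinary(Module &M, const Image &Img,
                    const std::string &Suffix = "",
                    bool EmitSurfacesAndTextures = true) {
  if (Img.empty())
    return false; // Nothing to wrap.
  M.Symbols.push_back(".fatbin_image" + Suffix);
  if (EmitSurfacesAndTextures)
    M.Symbols.push_back(".register_surfaces" + Suffix);
  return true;
}

// Example call site that overrides both defaults.
void exampleUsage() {
  Module M;
  Image Img = {'x'};
  bool Wrapped = wrapCudaBinary(M, Img, "_cuda_",
                                /*EmitSurfacesAndTextures=*/false);
  assert(Wrapped && M.Symbols.size() == 1);
  (void)Wrapped;
}
```

Callers that are happy with the defaults pass only the module and the image, so no wrapper object has to be constructed just to carry two settings.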

https://github.com/llvm/llvm-project/pull/78057

[llvm] [clang] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)

2024-01-14 Thread Joseph Huber via cfe-commits



@@ -568,32 +590,45 @@ void createRegisterFatbinFunction(Module , 
GlobalVariable *FatbinDesc,
 
 } // namespace
 
-Error wrapOpenMPBinaries(Module , ArrayRef> Images) {
-  GlobalVariable *Desc = createBinDesc(M, Images);
+Error OffloadWrapper::wrapOpenMPBinaries(
+Module , ArrayRef> Images,
+std::optional EntryArray) const {
+  GlobalVariable *Desc = createBinDesc(
+  M, Images,
+  EntryArray
+  ? *EntryArray
+  : offloading::getOffloadEntryArray(M, "omp_offloading_entries"),
+  Suffix);
   if (!Desc)
 return createStringError(inconvertibleErrorCode(),
  "No binary descriptors created.");
-  createRegisterFunction(M, Desc);
-  createUnregisterFunction(M, Desc);
+  createRegisterFunction(M, Desc, Suffix);
+  createUnregisterFunction(M, Desc, Suffix);
   return Error::success();
 }
 
-Error wrapCudaBinary(Module , ArrayRef Image) {
-  GlobalVariable *Desc = createFatbinDesc(M, Image, /* IsHIP */ false);
+Error OffloadWrapper::wrapCudaBinary(
+Module , ArrayRef Image,
+std::optional EntryArray) const {
+  GlobalVariable *Desc = createFatbinDesc(M, Image, /* IsHIP */ false, Suffix);
   if (!Desc)
 return createStringError(inconvertibleErrorCode(),
  "No fatinbary section created.");
 
-  createRegisterFatbinFunction(M, Desc, /* IsHIP */ false);
+  createRegisterFatbinFunction(M, Desc, /* IsHIP */ false, EntryArray, Suffix,
+   EmitSurfacesAndTextures);
   return Error::success();
 }
 
-Error wrapHIPBinary(Module , ArrayRef Image) {
-  GlobalVariable *Desc = createFatbinDesc(M, Image, /* IsHIP */ true);
+Error OffloadWrapper::wrapHIPBinary(
+Module , ArrayRef Image,
+std::optional EntryArray) const {
+  GlobalVariable *Desc = createFatbinDesc(M, Image, /* IsHIP */ true, Suffix);
   if (!Desc)
 return createStringError(inconvertibleErrorCode(),
  "No fatinbary section created.");
 
-  createRegisterFatbinFunction(M, Desc, /* IsHIP */ true);
+  createRegisterFatbinFunction(M, Desc, /* IsHIP */ true, EntryArray, Suffix,

jhuber6 wrote:

Can you fix these comments while you're at it? LLVM inline comments should be 
`/*IsHIP=*/`
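
For reference, a small sketch of the requested comment form: the parameter name followed by `=`, with no spaces inside the comment, placed directly before the argument. `describeFatbin` and `wrapForCuda` are hypothetical helpers invented for this example and are not functions from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper with a boolean parameter whose meaning is not
// obvious at the call site.
std::string describeFatbin(const std::string &Image, bool IsHIP) {
  return (IsHIP ? "hip:" : "cuda:") + Image;
}

std::string wrapForCuda(const std::string &Image) {
  // Preferred form: /*IsHIP=*/false rather than /* IsHIP */ false.
  return describeFatbin(Image, /*IsHIP=*/false);
}

void exampleUsage() {
  assert(wrapForCuda("img") == "cuda:img");
}
```

Besides consistency, this form has the advantage that tools such as clang-tidy's `bugprone-argument-comment` check can verify that the name in the comment matches the parameter name.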

https://github.com/llvm/llvm-project/pull/78057

[llvm] [clang] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)

2024-01-14 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/78057
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)

2024-01-14 Thread Joseph Huber via cfe-commits



@@ -568,32 +590,45 @@ void createRegisterFatbinFunction(Module , 
GlobalVariable *FatbinDesc,
 
 } // namespace
 
-Error wrapOpenMPBinaries(Module , ArrayRef> Images) {
-  GlobalVariable *Desc = createBinDesc(M, Images);
+Error OffloadWrapper::wrapOpenMPBinaries(
+Module , ArrayRef> Images,
+std::optional EntryArray) const {
+  GlobalVariable *Desc = createBinDesc(
+  M, Images,
+  EntryArray
+  ? *EntryArray
+  : offloading::getOffloadEntryArray(M, "omp_offloading_entries"),

jhuber6 wrote:

I'm thinking that this argument shouldn't be defaulted; it should be up to whoever 
calls it to create such an array. For the linker wrapper, that would mean calling 
the offloading utility first. I feel that making these arrays is too complicated 
for implicit default behavior if we're expecting other things to happen.

https://github.com/llvm/llvm-project/pull/78057

[clang] [llvm] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)

2024-01-14 Thread Joseph Huber via cfe-commits



@@ -0,0 +1,62 @@
+//===- OffloadWrapper.h --r-*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_FRONTEND_OFFLOADING_OFFLOADWRAPPER_H
+#define LLVM_FRONTEND_OFFLOADING_OFFLOADWRAPPER_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/IR/Module.h"
+
+namespace llvm {
+namespace offloading {
+/// Class for embedding and registering offloading images and related objects 
in
+/// a Module.
+class OffloadWrapper {
+public:
+  using EntryArrayTy = std::pair;
+
+  OffloadWrapper(const Twine  = "", bool EmitSurfacesAndTextures = true)
+  : Suffix(Suffix.str()), EmitSurfacesAndTextures(EmitSurfacesAndTextures) 
{
+  }
+
+  /// Wraps the input device images into the module \p M as global symbols and
+  /// registers the images with the OpenMP Offloading runtime libomptarget.
+  /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
+  /// symbols holding the `__tgt_offload_entry` array.
+  llvm::Error wrapOpenMPBinaries(
+  llvm::Module , llvm::ArrayRef> Images,
+  std::optional EntryArray = std::nullopt) const;
+
+  /// Wraps the input fatbinary image into the module \p M as global symbols 
and
+  /// registers the images with the CUDA runtime.
+  /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
+  /// symbols holding the `__tgt_offload_entry` array.
+  llvm::Error
+  wrapCudaBinary(llvm::Module , llvm::ArrayRef Images,
+ std::optional EntryArray = std::nullopt) const;
+
+  /// Wraps the input bundled image into the module \p M as global symbols and
+  /// registers the images with the HIP runtime.
+  /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
+  /// symbols holding the `__tgt_offload_entry` array.
+  llvm::Error
+  wrapHIPBinary(llvm::Module , llvm::ArrayRef Images,
+std::optional EntryArray = std::nullopt) const;
+
+protected:
+  /// Suffix used when emitting symbols. It defaults to the empty string.
+  std::string Suffix;
+
+  /// Whether to emit surface and textures registration code. It defaults to
+  /// false.
+  bool EmitSurfacesAndTextures;

jhuber6 wrote:

So, I wasn't sure about this either. I know that CUDA emits these 
`__cudaRegisterSurface` calls, but I can't seem to find them in any of the 
exported libraries. It caused linker errors due to that and I was too lazy to 
fix it. Wondering if they've been deprecated, maybe @tra knows.

https://github.com/llvm/llvm-project/pull/78057

[clang] [llvm] [llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (PR #78057)

2024-01-14 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 commented:

Thanks, some comments.

https://github.com/llvm/llvm-project/pull/78057
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [compiler-rt] [clang-tools-extra] [llvm] [AMDGPU] Avoid hitting AMDGPUAsmPrinter related asserts for local functions at O0 (PR #72129)

2024-01-12 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> As a somewhat naive question, what would it take to turn off requiring 
> codegen to be in SCC order? We seem to be the only target doing that. The 
> comments on that line say something about function calls and noinline

I believe this is also the reason parallel codegen via `--lto-partitions` 
creates incorrect code, so if there were a way to avoid that it would be 
beneficial in other ways. I'm by no means an expert, but as far as I'm aware 
the SCC order is used for some resource scheduling.

https://github.com/llvm/llvm-project/pull/72129

[clang] [AMDGPU] add function attrbute amdgpu-lib-fun (PR #74737)

2024-01-12 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> > > An AMDGPU library function is not internalized and can be used to 
> > > fullfill calls generated by LLVM passes or instruction selection.
> > 
> > 
> > I am confused by the description of "internalized". Do you refer to LTO 
> > internalization? You can leverage `llvm.used` to disable LTO 
> > internalization.
> 
> Yes I mean LTO internalization. We want keep them to the backend but we also 
> want to remove them if they are not used by the backend. `llvm.used` won't 
> tell us that we can remove them since it could be specified by the users for 
> non-amdgpu-library functions.

I wonder if we could just define another `llvm.used` similar to 
`llvm.compiler.used` for this special case where the variable can be thrown 
away by the backend.

https://github.com/llvm/llvm-project/pull/74737

[clang] [AMDGPU] add function attrbute amdgpu-lib-fun (PR #74737)

2024-01-09 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> > An AMDGPU library function is not internalized and can be used to fullfill 
> > calls generated by LLVM passes or instruction selection.
> 
> I am confused by the description of "internalized". Do you refer to LTO 
> internalization? You can leverage `llvm.used` to disable LTO internalization.

My guess is that the function should be considered used and then thrown away by 
the backend.

https://github.com/llvm/llvm-project/pull/74737

[clang] [AMDGPU] add function attrbute amdgpu-lib-fun (PR #74737)

2024-01-09 Thread Joseph Huber via cfe-commits



@@ -2011,6 +2011,13 @@ def AMDGPUNumVGPR : InheritableAttr {
   let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">;
 }
 
+def AMDGPULibFun : InheritableAttr {

jhuber6 wrote:

Why isn't this a `TargetSpecificAttr`? We should have one for AMDGPU.

https://github.com/llvm/llvm-project/pull/74737

[clang] [AMDGPU] add function attrbute amdgpu-lib-fun (PR #74737)

2024-01-09 Thread Joseph Huber via cfe-commits



@@ -2693,6 +2693,17 @@ An error will be given if:
   }];
 }
 
+def AMDGPULibFunDocs : Documentation {
+  let Category = DocCatAMDGPUAttributes;
+  let Content = [{
+The ``amdgpu_lib_fun`` attribute can be applied to a function for AMDGPU target
+to indicate it is a library function which are handled specially in backend.
+An AMDGPU library function is not internalized and can be used to fullfill
+calls generated by LLVM passes or instruction selection. Unused AMDGPU library
+functions will be eliminated by the backend.

jhuber6 wrote:

The wording is a little confusing here; this is just what I'm guessing from the gist.
```suggestion
The ``amdgpu_lib_fun`` attribute can be applied to a function while targeting
AMDGPU to indicate that it will be handled specially by the backend.
A library function will not be optimized out by standard LLVM passes and can be 
used to resolve function calls. These functions will not be emitted by the 
backend.
```

https://github.com/llvm/llvm-project/pull/74737

[clang] [AMDGPU] add function attrbute amdgpu-lib-fun (PR #74737)

2024-01-09 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> I was thinking of implementing libm/libc for nvptx, which would produce an IR 
> library. We'll still need to keep the functions around if they are not used 
> explicitly, because we may need them to fulfill libcalls later in the 
> compilation pipeline. Sort of a libdevice replacement which can be used for 
> libcall materialization.
> 
> But you're right, with RDC object files used for offloading it's probably not 
> necessary.

That's one problem I'm unsure of how to solve currently. Right now when doing 
LTO, there's a list of "libfuncs" that backends can emit. If the function is 
one of these we can't internalize / optimize out the symbol. I was attempting to 
relax this in https://reviews.llvm.org/D154364 at some point, because ideally 
we don't want to do this if the backend doesn't use them, but we don't have 
that logic right now.

Right now, the issue is how to handle divergence for different targets. So, for 
`libc/libm` we just build the same library N times for each architecture. This 
allows us to use things like `__CUDA_ARCH__` and `__has_builtin` as normal 
because it has a unique file for each architecture. However, I really don't 
think that N files is a scalable solution and would like to be able to create 
generic IR for a single file. Basically I'd like to have something like 
`libdevice.bc` where it's just one file. The problem is that we don't have a 
good, robust way to express this. Nvidia uses their reflection mechanism, which 
you're well aware of, and AMD uses external globals which need to be resolved 
by some link job.

One reason I'd like this is because I'd really like to be able to provide my 
`crt1.o` and `libc.a` as exported targets such that someone can do `clang++ 
--target=amdgcn-amd-amdhsa -mcpu=native foo.cpp crt1.o -lc` and have it work 
correctly. Right now fishing out the correct file requires linker wrapper 
magic. 

https://github.com/llvm/llvm-project/pull/74737

[clang] [AMDGPU] add function attrbute amdgpu-lib-fun (PR #74737)

2024-01-09 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

My use-case is more to be able to write functions like `is_wavefrontsize64()` 
in regular C++ code. This would require some way to emit builtins for these.
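
As a rough sketch of the kind of helper meant here, assuming clang exposes `__builtin_amdgcn_wavefrontsize` when targeting AMDGPU (the `wavefront_size` wrapper and its fallback value of 0 are made up for illustration and are not part of any patch):

```cpp
// Older compilers may not provide __has_builtin at all.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Returns the wavefront size reported by the compiler, or 0 when the builtin
// is not available for the current target (hypothetical fallback).
inline unsigned wavefront_size() {
#if __has_builtin(__builtin_amdgcn_wavefrontsize)
  return __builtin_amdgcn_wavefrontsize();
#else
  return 0;
#endif
}

// The kind of query the comment above would like to write in plain C++.
inline bool is_wavefrontsize64() { return wavefront_size() == 64; }
```

On a plain host compile the builtin should not be available, so the sketch falls back to 0 and `is_wavefrontsize64()` returns false.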

I believe the use-case here is a workaround for the issues caused by library 
ordering? I'm guessing this is related to the problems caused by prematurely 
optimizing out library functions that later passes wanted to depend on. 

https://github.com/llvm/llvm-project/pull/74737

[clang-tools-extra] [flang] [libcxx] [lld] [compiler-rt] [lldb] [clang] [llvm] [libc] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-08 Thread Joseph Huber via cfe-commits



@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;
+struct __llvm_profile_data {
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#include "llvm/ProfileData/InstrProfData.inc"
+};
+
+/// PGO profiling data extracted from a GPU device
+struct GPUProfGlobals {
+  std::string names;
+  std::vector<std::vector<int64_t>> counts;
+  std::vector<__llvm_profile_data> data;
+  Triple targetTriple;
+

jhuber6 wrote:

That's confusing, how would using a `std::vector` not have that problem as 
well? I'll need to look into that.

https://github.com/llvm/llvm-project/pull/76587

[clang] 8f76f18 - [OpenMP][Obvious] Fix test failing on BE architectures

2024-01-07 Thread Joseph Huber via cfe-commits


Author: Joseph Huber
Date: 2024-01-07T08:38:50-06:00
New Revision: 8f76f1816ea63b7cc28e150ba319ffbfe6351f9e

URL: 
https://github.com/llvm/llvm-project/commit/8f76f1816ea63b7cc28e150ba319ffbfe6351f9e
DIFF: 
https://github.com/llvm/llvm-project/commit/8f76f1816ea63b7cc28e150ba319ffbfe6351f9e.diff

LOG: [OpenMP][Obvious] Fix test failing on BE architectures

Summary:
This accidentally included a byte past the magic, which was out of order
on big endian architectures.

Added: 


Modified: 
clang/test/Driver/linker-wrapper-image.c

Removed: 




diff  --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index 40dde2e0291800..03caa1eb084e6e 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -19,7 +19,7 @@
 //  OPENMP-COFF: @__start_omp_offloading_entries = hidden constant [0 x 
%struct.__tgt_offload_entry] zeroinitializer, section 
"omp_offloading_entries$OA"
 // OPENMP-COFF-NEXT: @__stop_omp_offloading_entries = hidden constant [0 x 
%struct.__tgt_offload_entry] zeroinitializer, section 
"omp_offloading_entries$OZ"
 
-// OPENMP-NEXT: @.omp_offloading.device_image = internal unnamed_addr constant 
[[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD\01{{.*}}", section ".llvm.offloading", 
align 8
+//  OPENMP: @.omp_offloading.device_image = internal unnamed_addr constant 
[[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}", section ".llvm.offloading", align 
8
 // OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr 
constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr getelementptr 
inbounds ([[[BEGIN:[0-9]+]] x i8], ptr @.omp_offloading.device_image, i64 1, 
i64 0), ptr getelementptr inbounds ([[[END:[0-9]+]] x i8], ptr 
@.omp_offloading.device_image, i64 1, i64 0), ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]
 // OPENMP-NEXT: @.omp_offloading.descriptor = internal constant 
%__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }
 // OPENMP-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_reg, ptr null }]




[clang] [OpenMP] Change `__tgt_device_image` to point to the image (PR #77003)

2024-01-07 Thread Joseph Huber via cfe-commits



@@ -19,8 +19,8 @@
 //  OPENMP-COFF: @__start_omp_offloading_entries = hidden constant [0 x 
%struct.__tgt_offload_entry] zeroinitializer, section 
"omp_offloading_entries$OA"
 // OPENMP-COFF-NEXT: @__stop_omp_offloading_entries = hidden constant [0 x 
%struct.__tgt_offload_entry] zeroinitializer, section 
"omp_offloading_entries$OZ"
 
-//  OPENMP: @.omp_offloading.device_image = internal unnamed_addr constant 
[[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}"
-// OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr 
constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr 
@.omp_offloading.device_image, ptr getelementptr inbounds ([[[SIZE]] x i8], ptr 
@.omp_offloading.device_image, i64 1, i64 0), ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]
+// OPENMP-NEXT: @.omp_offloading.device_image = internal unnamed_addr constant 
[[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD\01{{.*}}", section ".llvm.offloading", 
align 8

jhuber6 wrote:

Whoops, that was an accident to not cut it off after the magic bytes. I'll fix 
it, thanks for bringing this to my attention.

https://github.com/llvm/llvm-project/pull/77003

[flang] [clang] [Flang][Driver] Enable gpulibc/nogpulibc options for Flang, which allows linking of GPU LIBC for the fortran and OpenMP runtime (PR #77135)

2024-01-05 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> I am gonna sign off for the weekend as it's quite late here, so I'll reply in 
> a little more detail on Monday and update the PR further. But I'd be happy to 
> add a further flang test, although I'm not too sure what it'd be, so 
> suggestions are welcome.
> 
> I tested this with an out-of-tree build of GPU libc (basically two separate 
> build directories) and found that -lgpuc wouldn't get the ordering correct to 
> link the library correctly to the Fortran runtime, so for this specific case 
> of an out-of-tree build of GPU libc the option seemed to be the correct way to 
> get it linked in the correct order. For the case of finding it in the correct 
> directory, I didn't quite manage the perfect build recipe (suggestions welcome 
> here as well) and tend not to use the install option myself, but perhaps it 
> would auto-detect for Flang as well! However, in the case where it's a 
> separately compiled and installed GPU libc, it might be nice to have this 
> option activated for Flang as well to make both methods of linking possible. 
> I am a little bit of a driver and build environment/system noob, though, so 
> I'll defer to everyone else's better judgement in this case!

If you have the static library, and it contains an entry for the desired 
architecture, it should just work so long as you're using the "new" driver 
pipeline. However, ordering is important here. It behaves similarly to the GNU 
BFD linker, where a static library is only checked against the current state of 
the symbol table as it reads the files in input order. So `uses.o -lfoo` will 
extract but `-lfoo uses.o` will not.
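
To make the ordering rule concrete, here is a toy model of that single left-to-right pass. It is only a sketch: `Input` and `resolveInOrder` are hypothetical names, and an archive is treated as one lazily loaded unit rather than a set of individual members.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of one linker input.
struct Input {
  bool IsArchive;                   // static library (lazy) or object (eager)
  std::set<std::string> Defines;    // symbols this input defines
  std::set<std::string> References; // symbols this input needs
};

// Processes inputs strictly in command-line order and returns the symbols
// that are still undefined at the end.
std::set<std::string> resolveInOrder(const std::vector<Input> &Inputs) {
  std::set<std::string> Defined, Undefined;
  for (const Input &In : Inputs) {
    if (In.IsArchive) {
      // An archive is only pulled in if it resolves something that is
      // undefined at this point in the pass.
      bool Needed = false;
      for (const std::string &Sym : In.Defines)
        Needed = Needed || Undefined.count(Sym) != 0;
      if (!Needed)
        continue;
    }
    for (const std::string &Sym : In.Defines) {
      Defined.insert(Sym);
      Undefined.erase(Sym);
    }
    for (const std::string &Sym : In.References)
      if (!Defined.count(Sym))
        Undefined.insert(Sym);
  }
  return Undefined;
}
```

With an object that references `foo` and an archive that defines it, passing the object first leaves nothing undefined, while passing the archive first leaves `foo` unresolved because nothing needed it yet when the archive was seen.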

It's possible that this just was being linked too late with however Fortran 
handles it. I decided to be conservative with the default here because I'm 
assuming very few people will actually have the GPU `libc`. 

It would be very interesting to see something like `puts` working from Fortran, 
so let me know if there's anything I can do to help.

https://github.com/llvm/llvm-project/pull/77135

[clang] [OpenMP] Change `__tgt_device_image` to point to the image (PR #77003)

2024-01-05 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/77003

[flang] [clang] [Flang][Driver] Enable gpulibc/nogpulibc options for Flang, which allows linking of GPU LIBC for the fortran and OpenMP runtime (PR #77135)

2024-01-05 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Makes sense to me, though this is not my area of expertise. Could you add a 
> bit more elaborate test? Perhaps something that would check the linker 
> invocation?

I'm not familiar with how Fortran handles stuff here. The handling of this is 
in `CommonArgs` somewhere, I believe. If Fortran shares that, it should be 
inherited, and since it's already tested on the `clang` side it might be fine.

https://github.com/llvm/llvm-project/pull/77135

[flang] [clang] [Flang][Driver] Enable gpulibc/nogpulibc options for Flang, which allows linking of GPU LIBC for the fortran and OpenMP runtime (PR #77135)

2024-01-05 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.

Accepting this with Fortran makes sense. This option basically controls whether 
or not the GPU toolchain will implicitly include the `libcgpu.a` static library 
via `-lcgpu`. It defaults to on if it finds the `libc` wrapper headers in the 
`clang` resource directory, 
`lib/clang/18/include/llvm_libc_wrappers/llvm-libc-decls`. I'm assuming that 
Fortran doesn't have this?

It's supposed to wrap around the C standard headers so the compiler knows that 
we have certain `libc` functions on the GPU. However, OpenMP will pretty much 
just assume anything referenced on the GPU is implicitly on the device so it 
will likely work for most functions without the wrapper headers. The important 
exception is `stdout` and friends. Because this is a global, OpenMP by default 
will try to map the host value rather than use the one present in `libcgpu`, so 
we need to declare it on the GPU to avoid the implicit map.

I'd be very interested in troubleshooting anything to get this working on 
Fortran.

https://github.com/llvm/llvm-project/pull/77135

[clang] [libc] [lld] [lldb] [clang-tools-extra] [llvm] [compiler-rt] [flang] [libcxx] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-05 Thread Joseph Huber via cfe-commits



@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
 
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());

jhuber6 wrote:

Okay, this should use `SmallVector` as well. Don't bother dividing by the size, 
because the one reported from the ELF is absolute. Then just make the data 
inside `uint8_t`.

https://github.com/llvm/llvm-project/pull/76587

[clang] [libc] [lld] [lldb] [clang-tools-extra] [llvm] [compiler-rt] [flang] [libcxx] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-05 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/76587

[clang-tools-extra] [libc] [lldb] [openmp] [clang] [llvm] [flang] [compiler-rt] [libcxx] [lld] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-05 Thread Joseph Huber via cfe-commits



@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;

jhuber6 wrote:

Okay, you should use the C++ `using` keyword instead of C's `typedef`.
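
For reference, a small sketch of the two spellings side by side. `IntPtrOld` and `PtrTo` are made-up names used only for comparison; only `IntPtrT` comes from the patch.

```cpp
#include <type_traits>

typedef void *IntPtrOld; // C-style spelling
using IntPtrT = void *;  // C++11 alias declaration naming the same type

static_assert(std::is_same<IntPtrOld, IntPtrT>::value,
              "both spellings name the same type");

// Alias templates are something `typedef` cannot express directly.
template <typename T> using PtrTo = T *;
static_assert(std::is_same<PtrTo<void>, IntPtrT>::value,
              "PtrTo<void> is void *");
```

Both forms name exactly the same type, so switching is purely a style change.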

https://github.com/llvm/llvm-project/pull/76587

[clang-tools-extra] [libc] [lldb] [openmp] [clang] [llvm] [flang] [compiler-rt] [libcxx] [lld] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-05 Thread Joseph Huber via cfe-commits



@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;
+struct __llvm_profile_data {
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#include "llvm/ProfileData/InstrProfData.inc"
+};
+
+/// PGO profiling data extracted from a GPU device
+struct GPUProfGlobals {
+  std::string names;
+  std::vector<std::vector<int64_t>> counts;
+  std::vector<__llvm_profile_data> data;
+  Triple targetTriple;
+

jhuber6 wrote:

All of them. `SmallVector` is basically a `std::vector` with a small-size 
optimization, like `std::string`.

https://github.com/llvm/llvm-project/pull/76587

[clang] [Clang][OpenMP] Fix stdio.h wrapper when glibc includes again (PR #77017)

2024-01-04 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.

TYVM for fixing this. There's a lot of hacky stuff we need to do here to make 
it work, but it is what it is.

Guessing the other wrapped files are fine? I remember having problems with 
`ctype` and `string` but I hopefully resolved a lot of those already.

https://github.com/llvm/llvm-project/pull/77017

[openmp] [lld] [clang-tools-extra] [libcxx] [llvm] [flang] [libc] [clang] [lldb] [compiler-rt] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-04 Thread Joseph Huber via cfe-commits



@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
 
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {

jhuber6 wrote:

This is incorrect. If this returns an error it will exit the `if`, call the 
destructor, and then crash the program because the error was not handled.
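
A self-contained sketch of why the pattern is a problem. `CheckedExpected` below is a deliberately simplified stand-in, not the real `llvm::Expected`, and `getName`, `lookupUnchecked`, and `lookupChecked` are hypothetical. The stand-in only counts unhandled errors so the effect can be observed, where the real class would crash as described above.

```cpp
#include <string>
#include <utility>

// Counts errors that were destroyed without being handled. Illustration only.
static int UnhandledErrors = 0;

// Simplified stand-in for llvm::Expected<T>; NOT the real class.
template <typename T> class CheckedExpected {
  T Value{};
  std::string Message;
  bool HasValue = false;
  bool Handled = false;

  struct ErrorTag {};
  CheckedExpected(ErrorTag, std::string Msg) : Message(std::move(Msg)) {}

public:
  CheckedExpected(T V) : Value(std::move(V)), HasValue(true) {}
  static CheckedExpected failure(std::string Msg) {
    return CheckedExpected(ErrorTag{}, std::move(Msg));
  }
  // Moving transfers responsibility for the error to the new object.
  CheckedExpected(CheckedExpected &&Other)
      : Value(std::move(Other.Value)), Message(std::move(Other.Message)),
        HasValue(Other.HasValue), Handled(Other.Handled) {
    Other.Handled = true;
  }
  ~CheckedExpected() {
    if (!HasValue && !Handled)
      ++UnhandledErrors;
  }
  explicit operator bool() const { return HasValue; }
  T &operator*() { return Value; }
  std::string takeError() {
    Handled = true;
    return Message;
  }
};

// Hypothetical lookup that can fail.
CheckedExpected<std::string> getName(bool Fail) {
  if (Fail)
    return CheckedExpected<std::string>::failure("bad symbol");
  return std::string("sym");
}

// The pattern from the patch: on failure the `if` body is skipped and the
// error is destroyed without ever being looked at.
bool lookupUnchecked(bool Fail) {
  if (auto Name = getName(Fail))
    return true;
  return false;
}

// Handling the failure explicitly before the object goes out of scope.
std::string lookupChecked(bool Fail) {
  auto Name = getName(Fail);
  if (!Name)
    return "error: " + Name.takeError();
  return *Name;
}
```

In the patch itself the analogous fix would presumably be to store the result of `sym.getName()` in a named variable and return or consume its error when the lookup fails.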

https://github.com/llvm/llvm-project/pull/76587

[lld] [lldb] [clang-tools-extra] [compiler-rt] [flang] [llvm] [clang] [libcxx] [openmp] [libc] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-04 Thread Joseph Huber via cfe-commits



@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
 
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());

jhuber6 wrote:

Okay, we're reading a string back from the device? What's the purpose of that? 
Also, just so you know, the ELF will only contain the correct size if it's 
emitted as an array. E.g.
```
const char a[] = "a"; // strlen("a") + 1 in ELF
const char *b = "b"; // sizeof(char *) in ELF
```

https://github.com/llvm/llvm-project/pull/76587

[libc] [openmp] [compiler-rt] [libcxx] [clang-tools-extra] [lld] [llvm] [clang] [flang] [lldb] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-04 Thread Joseph Huber via cfe-commits



@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
 
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+          return Err;
+        std::string names(chars.begin(), chars.end());
+        profdata.names = std::move(names);
+      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {

jhuber6 wrote:

Are functions like `getInstrProfCountersVarPrefix` preexisting? I don't see 
them defined in this patch set.

https://github.com/llvm/llvm-project/pull/76587

[lld] [libcxx] [clang-tools-extra] [compiler-rt] [clang] [flang] [llvm] [libc] [openmp] [lldb] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-04 Thread Joseph Huber via cfe-commits



@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
 
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;

jhuber6 wrote:

```suggestion
  GPUProfGlobals ProfData;
```
LLVM style. Also not a fan of the name. Maybe `DeviceProfileData` or something.

https://github.com/llvm/llvm-project/pull/76587

[lld] [clang-tools-extra] [openmp] [flang] [libc] [libcxx] [llvm] [lldb] [compiler-rt] [clang] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-04 Thread Joseph Huber via cfe-commits



@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;
+struct __llvm_profile_data {
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#include "llvm/ProfileData/InstrProfData.inc"
+};
+
+/// PGO profiling data extracted from a GPU device
+struct GPUProfGlobals {
+  std::string names;
+  std::vector<std::vector<int64_t>> counts;
+  std::vector<__llvm_profile_data> data;
+  Triple targetTriple;
+

jhuber6 wrote:

These should probably use LLVM structs, e.g. `StringRef` if the name is a 
constant string with stable storage, and `SmallVector`.

I'd really appreciate some descriptions of how this is supposed to look and how 
it interacts with the existing profile data.

https://github.com/llvm/llvm-project/pull/76587

[compiler-rt] [clang] [clang-tools-extra] [flang] [llvm] [libcxx] [lld] [lldb] [libc] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-04 Thread Joseph Huber via cfe-commits



@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;

jhuber6 wrote:

What's the utility of this?

https://github.com/llvm/llvm-project/pull/76587

[compiler-rt] [lldb] [openmp] [llvm] [clang-tools-extra] [lld] [flang] [clang] [libcxx] [libc] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-04 Thread Joseph Huber via cfe-commits



@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
 
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+          return Err;
+        std::string names(chars.begin(), chars.end());
+        profdata.names = std::move(names);
+      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {
+        // Read global variable profiling counts
+        std::vector<int64_t> counts(sym.getSize() / sizeof(int64_t), 0);
+        GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
+          return Err;
+        profdata.counts.push_back(std::move(counts));
+      } else if (name->starts_with(getInstrProfDataVarPrefix())) {
+        // Read profiling data for this global variable
+        __llvm_profile_data data{};
+        GlobalTy DataGlobal(name->str(), sym.getSize(), &data);
+        if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
+          return Err;
+        profdata.data.push_back(std::move(data));
+      }
+    }
+  }
+  return profdata;
+}

jhuber6 wrote:

LLVM style for everything here.

https://github.com/llvm/llvm-project/pull/76587

[libc] [clang] [lld] [clang-tools-extra] [compiler-rt] [flang] [lldb] [libcxx] [llvm] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-04 Thread Joseph Huber via cfe-commits



@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
 
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');

jhuber6 wrote:

Why are we turning this into a vector of chars? Also isn't `sizeof(char)` 
pretty much always going to be `1`?

https://github.com/llvm/llvm-project/pull/76587

[lld] [libc] [clang-tools-extra] [compiler-rt] [lldb] [llvm] [flang] [libcxx] [openmp] [clang] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-04 Thread Joseph Huber via cfe-commits



@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
 
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();

jhuber6 wrote:

Made a patch in https://github.com/llvm/llvm-project/pull/76992 and 
https://github.com/llvm/llvm-project/pull/76970 to make this actually work.

https://github.com/llvm/llvm-project/pull/76587

[clang] [OpenMP] Change `__tgt_device_image` to point to the image (PR #77003)

2024-01-04 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/77003

Summary:
We use the OffloadBinary to contain bundled offloading objects used to
support many images / targets at the same time. The `__tgt_device_info`
struct used to contain a pointer to this underlying binary format, which
contains information about the triple and architecture. We used to parse
this in the runtime to do image verification.

Recent changes removed the need for this to be used internally, as we
just parse it out of the ELF directly. This patch sets the pointers up
so they point to the ELF without requiring any further parsing.


>From f6516f5e6d9eecdc7ce5710660753acf8a70da99 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Thu, 4 Jan 2024 15:00:15 -0600
Subject: [PATCH] [OpenMP] Change `__tgt_device_image` to point to the image

Summary:
We use the OffloadBinary to contain bundled offloading objects used to
support many images / targets at the same time. The `__tgt_device_info`
struct used to contain a pointer to this underlying binary format, which
contains information about the triple and architecture. We used to parse
this in the runtime to do image verification.

Recent changes removed the need for this to be used internally, as we
just parse it out of the ELF directly. This patch sets the pointers up
so they point to the ELF without requiring any further parsing.
---
 clang/test/Driver/linker-wrapper-image.c  |  4 ++--
 .../clang-linker-wrapper/OffloadWrapper.cpp   | 24 ---
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper-image.c 
b/clang/test/Driver/linker-wrapper-image.c
index a2a1996f664309..40dde2e0291800 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -19,8 +19,8 @@
 //  OPENMP-COFF: @__start_omp_offloading_entries = hidden constant [0 x 
%struct.__tgt_offload_entry] zeroinitializer, section 
"omp_offloading_entries$OA"
 // OPENMP-COFF-NEXT: @__stop_omp_offloading_entries = hidden constant [0 x 
%struct.__tgt_offload_entry] zeroinitializer, section 
"omp_offloading_entries$OZ"
 
-//  OPENMP: @.omp_offloading.device_image = internal unnamed_addr constant 
[[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}"
-// OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr 
constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr 
@.omp_offloading.device_image, ptr getelementptr inbounds ([[[SIZE]] x i8], ptr 
@.omp_offloading.device_image, i64 1, i64 0), ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]
+// OPENMP-NEXT: @.omp_offloading.device_image = internal unnamed_addr constant 
[[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD\01{{.*}}", section ".llvm.offloading", 
align 8
+// OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr 
constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr getelementptr 
inbounds ([[[BEGIN:[0-9]+]] x i8], ptr @.omp_offloading.device_image, i64 1, 
i64 0), ptr getelementptr inbounds ([[[END:[0-9]+]] x i8], ptr 
@.omp_offloading.device_image, i64 1, i64 0), ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]
 // OPENMP-NEXT: @.omp_offloading.descriptor = internal constant 
%__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr 
@__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }
 // OPENMP-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_reg, ptr null }]
 // OPENMP-NEXT: @llvm.global_dtors = appending global [1 x { i32, ptr, ptr }] 
[{ i32, ptr, ptr } { i32 1, ptr @.omp_offloading.descriptor_unreg, ptr null }]
diff --git a/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp 
b/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
index f4f500b173572d..161374ae555233 100644
--- a/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
@@ -8,6 +8,7 @@
 
 #include "OffloadWrapper.h"
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/BinaryFormat/Magic.h"
 #include "llvm/Frontend/Offloading/Utility.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/GlobalVariable.h"
@@ -121,19 +122,36 @@ GlobalVariable *createBinDesc(Module , 
ArrayRef> Bufs) {
   SmallVector ImagesInits;
   ImagesInits.reserve(Bufs.size());
   for (ArrayRef Buf : Bufs) {
+// We embed the full offloading entry so the binary utilities can parse it.
 auto *Data = ConstantDataArray::get(C, Buf);
-auto *Image = new GlobalVariable(M, Data->getType(), /*isConstant*/ true,
+auto *Image = new GlobalVariable(M, Data->getType(), /*isConstant=*/true,
  GlobalVariable::InternalLinkage, Data,
  ".omp_offloading.device_image");
 Image->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
 Image->setSection(".llvm.offloading");

[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)

2024-01-03 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

The test should probably show that the IR is equivalent to `#pragma omp requires 
unified_shared_memory`, or however that's spelled. Basic documentation should be 
provided by the help text of the new flag, but there is probably somewhere in 
the OpenMP docs you could add it to if desired.

https://github.com/llvm/llvm-project/pull/75467
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Add SPIRV support to HIPAMD toolchain (PR #75357)

2024-01-03 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> 
> How about using `--offload=` which can take a target triple? E.g.
> 
> * `--offload=spirv64-amd` or something like that: pick HIPAMD tool chain.
> 
> * `--offload=spirv64`: pick HIPSPV tool chain.
> 
> 
> And also remove this 
> [limitation](https://github.com/llvm/llvm-project/blob/5fc712c4bbe84e6cbaa1f7d2a0300f613f11b0c3/clang/lib/Driver/Driver.cpp#L3130-L3136)
>  if you want `--offload` to work along with `--offload-arch`.
> 
> Or alternatively allow multiple `--offload` options, deprecate 
> `--offload-arch` and use `--offload` instead. For convenience and easy 
> transition, options like `--offload=` could be allowed where the 
> `` is treated as an alias for an offload target (E.g. 
> `--offload=gfx900` could imply `--offload=amdgcn-amd-amdhsa:gfx900` or 
> something like that).

I've been planning to improve `--offload` at some point. When using the OpenMP 
toolchain we have `-fopenmp-target=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda` for 
example, which will just activate those toolchains and default to whatever 
`nvptx-arch` and `amdgpu-arch` spit out. We can most likely use similar logic 
if needed. The OpenMP solution to target specific arguments is 
`-Xopenmp-target=amdgcn-amd-amdhsa -march=`, though that's not necessarily the 
best solution.

https://github.com/llvm/llvm-project/pull/75357

[openmp] [clang] [llvm] [clang-tools-extra] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2023-12-29 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 commented:

Some quick nits, will look more later.

https://github.com/llvm/llvm-project/pull/76587

[openmp] [clang-tools-extra] [llvm] [clang] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2023-12-29 Thread Joseph Huber via cfe-commits



@@ -0,0 +1,21 @@
+//=== Profiling.h - OpenMP interface -- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+//
+//===--===//
+
+#ifndef OMPTARGET_DEVICERTL_PROFILING_H
+#define OMPTARGET_DEVICERTL_PROFILING_H
+
+extern "C" {
+
+void __llvm_profile_register_function(void *ptr);
+void __llvm_profile_register_names_function(void *ptr, long int i);
+}
+

jhuber6 wrote:

```suggestion
void __llvm_profile_register_function(void *Ptr);
void __llvm_profile_register_names_function(void *Ptr, uint64_t I);

}

```

https://github.com/llvm/llvm-project/pull/76587

[clang-tools-extra] [llvm] [clang] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2023-12-29 Thread Joseph Huber via cfe-commits



@@ -428,13 +428,22 @@ std::string getPGOFuncNameVarName(StringRef FuncName,
   return VarName;
 }
 
+bool isGPUProfTarget(const Module ) {
+  const auto  = M.getTargetTriple();
+  return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 ||
+ triple.rfind("r600", 0) == 0;
+}
+

jhuber6 wrote:

```suggestion
bool isGPUProfTarget(const Module &M) {
  const llvm::Triple &Triple = M.getTargetTriple();
  return Triple.isAMDGPU() || Triple.isNVPTX();
}
```
The standard way looks like this. Side note: we really need to express this 
in a more reusable way, especially with SYCL looming. @arsenm, should we add 
a common interface in `CodeGenModule` that just returns whether we're currently 
targeting a "GPU like" device?
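
For illustration only, the prefix test from the quoted diff and the "GPU like" query can be sketched over plain strings, without LLVM. `startsWith` and `isGPULikeTriple` are hypothetical helpers, and the list of architecture prefixes is an assumption taken from the quoted diff; real code would go through `llvm::Triple` queries such as `isAMDGPU()` and `isNVPTX()`.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Returns true if Str begins with Prefix. This is what the
// `rfind(Prefix, 0) == 0` idiom in the quoted diff computes.
inline bool startsWith(std::string_view Str, std::string_view Prefix) {
  return Str.substr(0, Prefix.size()) == Prefix;
}

// Hypothetical helper: decide whether a target triple string names a
// "GPU like" device by looking only at its architecture component.
inline bool isGPULikeTriple(std::string_view Triple) {
  // The architecture is everything before the first '-'. If there is no
  // '-', find() returns npos and substr() yields the whole string.
  std::string_view Arch = Triple.substr(0, Triple.find('-'));
  constexpr std::array<std::string_view, 3> GPUArchPrefixes = {
      "nvptx", "amdgcn", "r600"};
  for (std::string_view Prefix : GPUArchPrefixes)
    if (startsWith(Arch, Prefix))
      return true;
  return false;
}
```

Keeping the list of prefixes in one place is the point of the sketch: adding another offloading target would then touch a single function.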

https://github.com/llvm/llvm-project/pull/76587

[openmp] [llvm] [clang] [clang-tools-extra] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2023-12-29 Thread Joseph Huber via cfe-commits



@@ -959,8 +959,14 @@ void CodeGenPGO::emitCounterIncrement(CGBuilderTy 
, const Stmt *S,
 
   unsigned Counter = (*RegionCounterMap)[S];
 
-  llvm::Value *Args[] = {FuncNameVar,
- Builder.getInt64(FunctionHash),
+  // Make sure that pointer to global is passed in with zero addrspace
+  // This is relevant during GPU profiling
+  auto *I8Ty = llvm::Type::getInt8Ty(CGM.getLLVMContext());
+  auto *I8PtrTy = llvm::PointerType::getUnqual(I8Ty);
+  auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
+  FuncNameVar, I8PtrTy);
+

jhuber6 wrote:

```suggestion
  auto *NormalizedPtr = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
  FuncNameVar, llvm::PointerType::getUnqual(CGM.getLLVMContext()));

```
LLVM uses opaque pointers for everything now.

https://github.com/llvm/llvm-project/pull/76587

[llvm] [clang-tools-extra] [openmp] [clang] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2023-12-29 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/76587

[clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2023-12-29 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> Is the approach taken in this approach acceptable as opposed to the header 
> solution I put up earlier?

Yes, it's pretty much exactly what I had in mind from my suggestion in the last 
PR. Thanks.

https://github.com/llvm/llvm-project/pull/76571

[clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2023-12-29 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 commented:

Needs a test. There should be some difference in codegen we can key off of.

https://github.com/llvm/llvm-project/pull/76571

[clang] [openmp] [Clang][OpenMP] Fix mapping of structs to device (PR #75642)

2023-12-21 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> This fails for me on the host and the AMD GPU: GPU:
> 
> ```
> # | :217:1: note: possible intended match here
> # | dat.datum[dat.arr[0][0]] = 5
> ```
> 
> X86:
> 
> ```
> # | :134:1: note: possible intended match here
> # | dat.datum[dat.arr[0][0]] = 5461
> ```
> 
> The location that is printed (datum[1]) is uninitialized.

I see the same but forgot to say anything.

https://github.com/llvm/llvm-project/pull/75642

[clang] [LinkerWrapper] Forward more arguments to the CPU offloading linker (PR #75757)

2023-12-17 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/75757

Summary:
The CPU target currently inherits all the libraries from the normal link
job to ensure that it has access to the same environment that the host
does. However, this previously did not respect libraries that are passed
by name rather than with `-l`, nor the whole-archive flags. This patch
fixes this so that the CPU linker correctly picks up the libraries
associated with things like address sanitizers.

Fixes: https://github.com/llvm/llvm-project/issues/75651
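
The forwarding rule from this summary can be sketched roughly as below. This is a simplified stand-alone model, not the patch itself: `forwardToCPULink` and `looksLikeLibrary` are hypothetical names, libraries are recognised by file-name suffix here (an assumption for the sake of a runnable example, whereas the patch inspects the file magic), and `-rpath` handling is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption for illustration: treat inputs ending in ".a" or ".so" as
// libraries. The actual patch checks the file magic instead.
static bool looksLikeLibrary(const std::string &Path) {
  auto EndsWith = [&](const std::string &Suffix) {
    return Path.size() >= Suffix.size() &&
           Path.compare(Path.size() - Suffix.size(), Suffix.size(),
                        Suffix) == 0;
  };
  return EndsWith(".a") || EndsWith(".so");
}

// Hypothetical helper: pick the host linker arguments that should also be
// passed to the clang invocation that links the CPU offloading image.
std::vector<std::string>
forwardToCPULink(const std::vector<std::string> &LinkerArgs) {
  std::vector<std::string> Out;
  for (const std::string &Arg : LinkerArgs) {
    if (Arg == "--whole-archive" || Arg == "--no-whole-archive")
      Out.push_back("-Wl," + Arg); // Linker-only flags need the -Wl, prefix.
    else if (Arg.rfind("-l", 0) == 0 || Arg.rfind("-L", 0) == 0)
      Out.push_back(Arg); // Libraries and search paths pass through as-is.
    else if (!Arg.empty() && Arg[0] != '-' && looksLikeLibrary(Arg))
      Out.push_back(Arg); // Libraries passed by name are kept as well.
    // Everything else (objects, -o, the output name, ...) is not forwarded.
  }
  return Out;
}
```

Order is preserved on purpose, since `--whole-archive` only affects the inputs that follow it.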


>From 0ba0fa00af551bf8d9f69bec5742bbe4e12a4b58 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Sun, 17 Dec 2023 18:24:46 -0600
Subject: [PATCH] [LinkerWrapper] Forward more arguments to the CPU offloading
 linker

Summary:
The CPU target currently inherits all the libraries from the normal link
job to ensure that it has access to the same environment that the host
does. However, this previously did not respect libraries that are passed
by name rather than with `-l`, nor the whole-archive flags. This patch
fixes this so that the CPU linker correctly picks up the libraries
associated with things like address sanitizers.

Fixes: https://github.com/llvm/llvm-project/issues/75651
---
 clang/test/Driver/linker-wrapper.c|  6 ++--
 .../ClangLinkerWrapper.cpp| 30 +++
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/clang/test/Driver/linker-wrapper.c 
b/clang/test/Driver/linker-wrapper.c
index b763a003452ba7..e51c5ea381d31a 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -49,10 +49,12 @@
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o 
-fembed-offload-object=%t.out
+// RUN: llvm-ar rcs %t.a %t.o
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
-// RUN:   --linker-path=/usr/bin/ld.lld -- %t.o -o a.out 2>&1 | FileCheck %s 
--check-prefix=CPU-LINK
+// RUN:   --linker-path=/usr/bin/ld.lld -- --whole-archive %t.a 
--no-whole-archive \
+// RUN:   %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CPU-LINK
 
-// CPU-LINK: clang{{.*}} -o {{.*}}.img --target=x86_64-unknown-linux-gnu 
-march=native -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o -Wl,-Bsymbolic -shared
+// CPU-LINK: clang{{.*}} -o {{.*}}.img --target=x86_64-unknown-linux-gnu 
-march=native -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o -Wl,-Bsymbolic -shared 
-Wl,--whole-archive {{.*}}.a -Wl,--no-whole-archive
 
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o
 // RUN: clang-linker-wrapper --dry-run --host-triple=x86_64-unknown-linux-gnu 
-mllvm -openmp-opt-disable \
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index bebe76355eb46f..122ba1998eb83f 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -396,11 +396,31 @@ Expected clang(ArrayRef InputFiles, 
const ArgList ) {
 CmdArgs.push_back("-Wl,-Bsymbolic");
 CmdArgs.push_back("-shared");
 ArgStringList LinkerArgs;
-for (const opt::Arg *Arg : Args.filtered(OPT_library, OPT_library_path))
-  Arg->render(Args, LinkerArgs);
-for (const opt::Arg *Arg : Args.filtered(OPT_rpath))
-  LinkerArgs.push_back(
-  Args.MakeArgString("-Wl,-rpath," + StringRef(Arg->getValue())));
+for (const opt::Arg *Arg :
+ Args.filtered(OPT_INPUT, OPT_library, OPT_library_path, OPT_rpath,
+   OPT_whole_archive, OPT_no_whole_archive)) {
+  // Sometimes needed libraries are passed by name, such as when using
+  // sanitizers. We need to check the file magic for any libraries.
+  if (Arg->getOption().matches(OPT_INPUT)) {
+if (!sys::fs::exists(Arg->getValue()) ||
+sys::fs::is_directory(Arg->getValue()))
+  continue;
+
+file_magic Magic;
+if (auto EC = identify_magic(Arg->getValue(), Magic))
+  return createStringError(inconvertibleErrorCode(),
+   "Failed to open %s", Arg->getValue());
+if (Magic != file_magic::archive &&
+Magic != file_magic::elf_shared_object)
+  continue;
+  }
+  if (Arg->getOption().matches(OPT_whole_archive))
+LinkerArgs.push_back(Args.MakeArgString("-Wl,--whole-archive"));
+  else if (Arg->getOption().matches(OPT_no_whole_archive))
+LinkerArgs.push_back(Args.MakeArgString("-Wl,--no-whole-archive"));
+  else
+Arg->render(Args, LinkerArgs);
+}
 llvm::copy(LinkerArgs, std::back_inserter(CmdArgs));
   }
 


[clang] Revert "[LinkerWrapper] Add 'Freestanding' config to the LTO pass" (PR #75528)

2023-12-15 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/75528

[clang] Revert "[LinkerWrapper] Add 'Freestanding' config to the LTO pass" (PR #75528)

2023-12-15 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 approved this pull request.


https://github.com/llvm/llvm-project/pull/75528

[flang] [clang] [lldb] [libc] [compiler-rt] [clang-tools-extra] [lld] [llvm] [libcxx] [openmp] Gcc 75 libomptarget type convert (PR #75562)

2023-12-15 Thread Joseph Huber via cfe-commits



@@ -47,7 +47,9 @@ PluginAdaptorTy::create(const std::string ) {
   new PluginAdaptorTy(Name, std::move(LibraryHandler)));
   if (auto Err = PluginAdaptor->init())
 return Err;
-  return PluginAdaptor;

jhuber6 wrote:

Does putting `std::move` here not work?

https://github.com/llvm/llvm-project/pull/75562

[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-14 Thread Joseph Huber via cfe-commits



@@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], 
"fopenmp-cuda-blocks-per-sm=">
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
 def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], 
"fopenmp-cuda-teams-reduction-recs-num=">, Group,
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[CC1Option]>;

jhuber6 wrote:

No, it would just override the flag before it. E.g. `-fopenmp-force-usm 
-fno-openmp-force-usm` would return to not having it on.
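
A minimal sketch of that override behaviour, where the last occurrence of the positive or negative spelling wins, could look like this. `lastFlagWins` is a hypothetical stand-alone helper written for illustration; in the clang driver this kind of query normally goes through `ArgList::hasFlag`.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: walk the arguments in order and let the last
// occurrence of either spelling decide. If neither appears, return Default.
bool lastFlagWins(const std::vector<std::string> &Args, std::string_view Pos,
                  std::string_view Neg, bool Default) {
  bool Enabled = Default;
  for (const std::string &Arg : Args) {
    if (Arg == Pos)
      Enabled = true;
    else if (Arg == Neg)
      Enabled = false;
  }
  return Enabled;
}
```

So a build system that appends `-fno-openmp-force-usm` after user-provided flags would switch the option back off.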

https://github.com/llvm/llvm-project/pull/75468

[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-14 Thread Joseph Huber via cfe-commits



@@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], 
"fopenmp-cuda-blocks-per-sm=">
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
 def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], 
"fopenmp-cuda-teams-reduction-recs-num=">, Group,
   Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group,
+  Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[CC1Option]>;

jhuber6 wrote:

`-f` options tend to have a `-fno` variant as well.

https://github.com/llvm/llvm-project/pull/75468

[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-14 Thread Joseph Huber via cfe-commits



@@ -129,6 +129,22 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList 
) const {
 void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
 const ArgList , ArgStringList ) const {
   HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
+
+  CC1Args.push_back("-internal-isystem");
+  SmallString<128> P(HostTC.getDriver().ResourceDir);
+  llvm::sys::path::append(P, "include/cuda_wrappers");
+  CC1Args.push_back(DriverArgs.MakeArgString(P));
+
+  // Force USM mode will forcefully include #pragma omp requires
+  // unified_shared_memory via the force_usm header
+  // XXX This may result in a compilation error if the source
+  // file already includes that pragma.
+  if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) {
+CC1Args.push_back("-include");
+CC1Args.push_back(
+DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir +
+ "/include/openmp_wrappers/force_usm.h"));

jhuber6 wrote:

Here's the patch for `-fopenmp-offload-mandatory` which is a similar use-case 
https://reviews.llvm.org/D120353.

https://github.com/llvm/llvm-project/pull/75468

[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)

2023-12-14 Thread Joseph Huber via cfe-commits



@@ -129,6 +129,22 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList 
) const {
 void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
 const ArgList , ArgStringList ) const {
   HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
+
+  CC1Args.push_back("-internal-isystem");
+  SmallString<128> P(HostTC.getDriver().ResourceDir);
+  llvm::sys::path::append(P, "include/cuda_wrappers");
+  CC1Args.push_back(DriverArgs.MakeArgString(P));
+
+  // Force USM mode will forcefully include #pragma omp requires
+  // unified_shared_memory via the force_usm header
+  // XXX This may result in a compilation error if the source
+  // file already includes that pragma.
+  if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) {
+CC1Args.push_back("-include");
+CC1Args.push_back(
+DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir +
+ "/include/openmp_wrappers/force_usm.h"));

jhuber6 wrote:

I don't think this is a good way to handle this. We should make this a CC1 
argument, forward it in the standard way, and make `CGOpenMPRuntime` always 
emit the associated runtime call.

Also note that I'm planning on removing the current "requires" handling because 
emitting spurious global constructors into the runtime is difficult to work 
around.

https://github.com/llvm/llvm-project/pull/75468

[clang] Add SPIRV support to HIPAMD toolchain (PR #75357)

2023-12-13 Thread Joseph Huber via cfe-commits



@@ -209,6 +210,13 @@ void AMDGCN::Linker::ConstructJob(Compilation , const 
JobAction ,
   if (JA.getType() == types::TY_LLVM_BC)
 return constructLlvmLinkCommand(C, JA, Inputs, Output, Args);
 
+  if (Args.getLastArgValue(options::OPT_mcpu_EQ) == "generic") {
+llvm::opt::ArgStringList TrArgs{"--spirv-max-version=1.1",

jhuber6 wrote:

I wonder if `-mcpu` is the correct way to encode this. Targeting SPIR-V is more 
like the triple than the architecture as far as I'm aware.

https://github.com/llvm/llvm-project/pull/75357

[llvm] [clang] [LLVM] Add file magic detection for SPIR-V files. (PR #75363)

2023-12-13 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

Added a test. For whatever reason, I had to do a completely clean build to get 
the test to correctly pick up my changes to `Magic.cpp`; I don't know why.

https://github.com/llvm/llvm-project/pull/75363

[llvm] [clang] [LLVM] Add file magic detection for SPIR-V files. (PR #75363)

2023-12-13 Thread Joseph Huber via cfe-commits


https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/75363

>From 2700151916b0fd91c793930127412af5690c9e41 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 13 Dec 2023 11:35:13 -0600
Subject: [PATCH 1/2] [LLVM] Add file magic detection for SPIR-V files.

Summary:
More SPIR-V related patches are being upstreamed. We should add support
to detect when a binary file is SPIR-V. This will be used in the future
when support for SPIR-V is added to the offloading runtime or more
support for bundling.

The magic number is described in the official documentation:
https://registry.khronos.org/SPIR-V/specs/1.0/SPIRV.html#Magic. Notably,
SPIR-V files are streams of 32-bit words. This means that the magic
numbers differ depending on the endianness. Here we simply check the
standard and byte-reversed versions.
---
 llvm/include/llvm/BinaryFormat/Magic.h | 1 +
 llvm/lib/BinaryFormat/Magic.cpp| 9 +
 llvm/lib/Object/Binary.cpp | 1 +
 llvm/lib/Object/ObjectFile.cpp | 1 +
 4 files changed, 12 insertions(+)
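
To make the endianness point concrete, here is a small stand-alone sketch of the check. The byte sequences are the ones used in the diff below (the SPIR-V magic word `0x07230203` in both byte orders); `isSPIRVMagic` is a hypothetical helper for illustration and is not part of the LLVM API.

```cpp
#include <cassert>
#include <cstring>
#include <string_view>

// Hypothetical helper: returns true if the buffer starts with the SPIR-V
// magic word 0x07230203, stored either little-endian or big-endian.
inline bool isSPIRVMagic(std::string_view Bytes) {
  static constexpr char LittleEndian[] = {0x03, 0x02, 0x23, 0x07};
  static constexpr char BigEndian[] = {0x07, 0x23, 0x02, 0x03};
  if (Bytes.size() < 4)
    return false;
  return std::memcmp(Bytes.data(), LittleEndian, 4) == 0 ||
         std::memcmp(Bytes.data(), BigEndian, 4) == 0;
}
```

Because the two encodings start with different bytes (`0x03` and `0x07`), a dispatcher that switches on the first byte needs one case for each.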

diff --git a/llvm/include/llvm/BinaryFormat/Magic.h 
b/llvm/include/llvm/BinaryFormat/Magic.h
index a28710dcdfaf2c..c635a269576587 100644
--- a/llvm/include/llvm/BinaryFormat/Magic.h
+++ b/llvm/include/llvm/BinaryFormat/Magic.h
@@ -57,6 +57,7 @@ struct file_magic {
 dxcontainer_object,///< DirectX container file
 offload_bundle,///< Clang offload bundle file
 offload_bundle_compressed, ///< Compressed clang offload bundle file
+spirv_object,  ///< A binary SPIR-V file
   };
 
   bool is_object() const { return V != unknown; }
diff --git a/llvm/lib/BinaryFormat/Magic.cpp b/llvm/lib/BinaryFormat/Magic.cpp
index 255937a5bdd04a..b0f0043f0e492a 100644
--- a/llvm/lib/BinaryFormat/Magic.cpp
+++ b/llvm/lib/BinaryFormat/Magic.cpp
@@ -72,6 +72,15 @@ file_magic llvm::identify_magic(StringRef Magic) {
   case 0x03:
 if (startswith(Magic, "\x03\xF0\x00"))
   return file_magic::goff_object;
+// SPIR-V format in little-endian mode.
+if (startswith(Magic, "\x03\x02\x23\x07"))
+  return file_magic::spirv_object;
+break;
+
+  case 0x07:
+// SPIR-V format in big-endian mode.
+if (startswith(Magic, "\x07\x23\x02\x03"))
+  return file_magic::spirv_object;
 break;
 
   case 0x10:
diff --git a/llvm/lib/Object/Binary.cpp b/llvm/lib/Object/Binary.cpp
index 0ee9f7fac448a2..0b9d95485287dc 100644
--- a/llvm/lib/Object/Binary.cpp
+++ b/llvm/lib/Object/Binary.cpp
@@ -89,6 +89,7 @@ Expected> 
object::createBinary(MemoryBufferRef Buffer,
   case file_magic::dxcontainer_object:
   case file_magic::offload_bundle:
   case file_magic::offload_bundle_compressed:
+  case file_magic::spirv_object:
 // Unrecognized object file format.
 return errorCodeToError(object_error::invalid_file_type);
   case file_magic::offload_binary:
diff --git a/llvm/lib/Object/ObjectFile.cpp b/llvm/lib/Object/ObjectFile.cpp
index 428166f58070d0..ca921836b7f65a 100644
--- a/llvm/lib/Object/ObjectFile.cpp
+++ b/llvm/lib/Object/ObjectFile.cpp
@@ -160,6 +160,7 @@ ObjectFile::createObjectFile(MemoryBufferRef Object, 
file_magic Type,
   case file_magic::dxcontainer_object:
   case file_magic::offload_bundle:
   case file_magic::offload_bundle_compressed:
+  case file_magic::spirv_object:
 return errorCodeToError(object_error::invalid_file_type);
   case file_magic::tapi_file:
 return errorCodeToError(object_error::invalid_file_type);

>From 758de880dbd853c37a1e9abb72a1cb0624c4247d Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Wed, 13 Dec 2023 14:22:52 -0600
Subject: [PATCH 2/2] Add test

---
 clang/tools/clang-offload-packager/ClangOffloadPackager.cpp | 6 ++
 llvm/lib/BinaryFormat/Magic.cpp | 3 +--
 llvm/unittests/BinaryFormat/TestFileMagic.cpp   | 6 ++
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp 
b/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp
index 08de3f3a3771c1..87e491e057e844 100644
--- a/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp
+++ b/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp
@@ -234,6 +234,12 @@ int main(int argc, const char **argv) {
 return EXIT_SUCCESS;
   }
 
+  const char spirv_object_le[] =  "\x03\x02\x23\x07";
+
+  StringRef str(spirv_object_le, sizeof(spirv_object_le));
+  llvm::errs() << str.size() << "\n";
+  llvm::errs() << identify_magic(str) << "\n";
+
   PackagerExecutable = argv[0];
   auto reportError = [argv](Error E) {
 logAllUnhandledErrors(std::move(E), WithColor::error(errs(), argv[0]));
diff --git a/llvm/lib/BinaryFormat/Magic.cpp b/llvm/lib/BinaryFormat/Magic.cpp
index b0f0043f0e492a..45a0b7e11452b4 100644
--- a/llvm/lib/BinaryFormat/Magic.cpp
+++ b/llvm/lib/BinaryFormat/Magic.cpp
@@ -77,8 +77,7 @@ file_magic llvm::identify_magic(StringRef Magic) {
   return file_magic::spirv_object;

[clang] Add SPIRV support to HIPAMD toolchain (PR #75357)

2023-12-13 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> Perhaps we should consider prefixing it in some way (e.g. `hip-spirv` or 
> `amd-spirv`) that leaves the door open for some special handling (enable a 
> particular set of extensions only for amdgpu targeting SPIRV, try to deal 
> with missing builtins etc.) / flexibility?

Unsure that's necessary, as we'd already have `OFK_HIP` in the clang driver or 
`-x hip` in `cc1` to key off of if needed.

https://github.com/llvm/llvm-project/pull/75357

[clang] Add SPIRV support to HIPAMD toolchain (PR #75357)

2023-12-13 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

I feel like we should treat `spirv` in the same way we handle stuff like 
`sm_90` in the `CudaArch` enum. (We should probably also rename that as it's 
used for generic offloading now). OpenMP infers the triple from the arch, so in 
the future when OpenMP can handle SPIR-V we can simply re-use that.

https://github.com/llvm/llvm-project/pull/75357

[clang] Add SPIRV support to HIPAMD toolchain (PR #75357)

2023-12-13 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

> Is generic the best name here? I feel like that's going to be heavily 
> overloaded. I'd much prefer a new architecture that just treats "SPIR-V" as a 
> single architecture. E.g. `--offload-arch=spirv` or something.


https://github.com/llvm/llvm-project/pull/75357

[clang] Add SPIRV support to HIPAMD toolchain (PR #75357)

2023-12-13 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

Is generic the best name here? I feel like that's going to be heavily 
overloaded.

https://github.com/llvm/llvm-project/pull/75357

[clang] [Sema] atomic_compare_exchange: check failure memory order (PR #74959)

2023-12-12 Thread Joseph Huber via cfe-commits


jhuber6 wrote:

Did this change anything for the `scoped_atomic_compare_exchange_n` variant I 
added recently?

https://github.com/llvm/llvm-project/pull/74959

[clang] ef23bba - [Linkerwrapper] Make -Xoffload-linker pass directly to `clang`

2023-12-11 Thread Joseph Huber via cfe-commits


Author: Joseph Huber
Date: 2023-12-11T07:56:19-06:00
New Revision: ef23bba6e5aecbc6008e8a9ff8740fc4b04fe814

URL: 
https://github.com/llvm/llvm-project/commit/ef23bba6e5aecbc6008e8a9ff8740fc4b04fe814
DIFF: 
https://github.com/llvm/llvm-project/commit/ef23bba6e5aecbc6008e8a9ff8740fc4b04fe814.diff

LOG: [Linkerwrapper] Make -Xoffload-linker pass directly to `clang`

Summary:
We provide `-Xoffload-linker` to pass arguments directly to the link
step. Currently this uses `-Wl,` implicitly which prevents us from using
clang options that we otherwise could make use of. This patch removes
that implicit behavior as users can just as easily pass
`-Xoffload-linker -Wl,-foo` if needed.
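
The before/after behaviour can be modelled with a few lines of stand-alone code. `forwardOffloadLinkerArgs` and its `ImplicitWl` switch are hypothetical and exist only for this illustration; the actual change is the one-line edit in the diff below.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper modelling how each -Xoffload-linker value reaches the
// clang invocation. ImplicitWl == true mimics the old behaviour (every value
// is wrapped in -Wl,), false mimics the new one (values pass through as-is).
std::vector<std::string>
forwardOffloadLinkerArgs(const std::vector<std::string> &Values,
                         bool ImplicitWl) {
  std::vector<std::string> CmdArgs;
  for (const std::string &Value : Values)
    CmdArgs.push_back(ImplicitWl ? "-Wl," + Value : Value);
  return CmdArgs;
}
```

With the prefix gone, a value such as `-Wl,-foo` still reaches the linker, while other values are now seen by clang itself.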

Added: 


Modified: 
clang/test/Driver/linker-wrapper.c
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

Removed: 




diff  --git a/clang/test/Driver/linker-wrapper.c 
b/clang/test/Driver/linker-wrapper.c
index e82febd618231..b763a003452ba 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -123,8 +123,8 @@
 // RUN:   --linker-path=/usr/bin/ld --device-linker=a 
--device-linker=nvptx64-nvidia-cuda=b -- \
 // RUN:   %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=LINKER-ARGS
 
-// LINKER-ARGS: clang{{.*}}--target=amdgcn-amd-amdhsa{{.*}}-Wl,a
-// LINKER-ARGS: clang{{.*}}--target=nvptx64-nvidia-cuda{{.*}}-Wl,a -Wl,b
+// LINKER-ARGS: clang{{.*}}--target=amdgcn-amd-amdhsa{{.*}}a
+// LINKER-ARGS: clang{{.*}}--target=nvptx64-nvidia-cuda{{.*}}a b
 
 // RUN: not clang-linker-wrapper --dry-run 
--host-triple=x86_64-unknown-linux-gnu -ldummy \
 // RUN:   --linker-path=/usr/bin/ld --device-linker=a 
--device-linker=nvptx64-nvidia-cuda=b -- \

diff  --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp 
b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index db0ce3e2a1901..5d2fe98fe5601 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -428,7 +428,7 @@ Expected clang(ArrayRef InputFiles, 
const ArgList ) {
 std::back_inserter(CmdArgs));
 
   for (StringRef Arg : Args.getAllArgValues(OPT_linker_arg_EQ))
-CmdArgs.push_back(Args.MakeArgString("-Wl," + Arg));
+CmdArgs.push_back(Args.MakeArgString(Arg));
 
   for (StringRef Arg : Args.getAllArgValues(OPT_builtin_bitcode_EQ)) {
 if (llvm::Triple(Arg.split('=').first) == Triple)



