================
@@ -0,0 +1,44 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \
+// RUN:   -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \
+// RUN:   -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// O3ONCE: "-O3"
+// O3ONCE-NOT: "-O3"
+
+// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \
+// RUN:   -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=OPENMP %s
+//
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], 
output: "[[HOST_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", 
"[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", 
"[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", 
"[[HOST_BC]]"], output: "[[SM52_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: 
["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", 
"[[HOST_BC]]"], output: "[[SM60_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: 
["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: 
["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], 
output: "[[BINARY:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", 
"[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: 
["[[HOST_OBJ]]"], output: "a.out"
+
+// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=CUDA %s
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" 
{{.*}}"-O3"
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" 
{{.*}}"-O0"
+// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3"
+
+// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \
+// RUN:   -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=SPECIFIC %s
+// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52"
----------------
Artem-B wrote:

You're probably the one holding most of the puzzle pieces here. For CUDA/HIP, things (mostly) work well enough. Even though there are somewhat useful attempts to target SPIR-V, they are half-baked at best. For OpenMP the target set is wide open, so I can see how it would need a more flexible way to customize how the compilation pipeline is constructed.

In the abstract, part of the problem boils down to naming. You need to be able to tell the driver what you want to target (currently we use --offload-arch= and --offload options for that) and how to select subsets of the constructed pipeline (-X<host|device|arch|something else>).

`--offload-arch=amdgcnspirv,gfx1030` looks like a reasonable approach to me, and the same naming scheme could be used as the selector part of `-Xarch`. What we have now may not be perfect, but I think it's reasonably functional. Is there a particular reason `--offload-arch=amdgcnspirv,gfx1030` is not sufficient to drive pipeline construction? I'm not sure why we want -Xarch to do that job.

> Compared to those I really don't think -Xarch_ is a big deal, since it does 
> what you want, passes those arguments only to the Triple toolchain.

My issue is that the patch was making -Xarch do a job it's not intended for *and* hiding real errors in the process. The "passing the arguments to the triple{-selected} toolchain" part (as a wider-scope selector, compared to the per-GPU or host/device ones we have now) is fine. Let's handle the pipeline-construction challenge separately.

https://github.com/llvm/llvm-project/pull/125421