[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-26 Thread Alexander Kornienko via Phabricator via cfe-commits
alexfh added a comment.

In D140158#4082804, @jhuber6 wrote:

> In D140158#4082789, @alexfh wrote:
>
>> This patch breaks our CUDA compilations. The output file isn't created after
>> compiling:
>>
>>   $ echo 'extern "C" __attribute__((global)) void q() {}' >q.cc
>>   $ good-clang \
>>   -nocudainc -x cuda \
>>   --cuda-path=somepath/cuda/ \
>>   -Wno-unknown-cuda-version --cuda-device-only \
>>   -c q.cc -o qqq-good.o \
>> && bad-clang \
>>   -nocudainc -x cuda \
>>   --cuda-path=somepath/cuda/ \
>>   -Wno-unknown-cuda-version --cuda-device-only \
>>   -c q.cc -o qqq-bad.o \
>> && ls qqq*.o
>>   qqq-good.o
>
> See https://github.com/llvm/llvm-project/issues/60301. Is it still broken after this fix?

That works. Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-26 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D140158#4082789, @alexfh wrote:

> This patch breaks our CUDA compilations. The output file isn't created after
> compiling:
>
>   $ echo 'extern "C" __attribute__((global)) void q() {}' >q.cc
>   $ good-clang \
>   -nocudainc -x cuda \
>   --cuda-path=somepath/cuda/ \
>   -Wno-unknown-cuda-version --cuda-device-only \
>   -c q.cc -o qqq-good.o \
> && bad-clang \
>   -nocudainc -x cuda \
>   --cuda-path=somepath/cuda/ \
>   -Wno-unknown-cuda-version --cuda-device-only \
>   -c q.cc -o qqq-bad.o \
> && ls qqq*.o
>   qqq-good.o

See https://github.com/llvm/llvm-project/issues/60301. Is it still broken after this fix?




[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-26 Thread Alexander Kornienko via Phabricator via cfe-commits
alexfh added a comment.

This patch breaks our CUDA compilations. The output file isn't created after compiling:

  $ echo 'extern "C" __attribute__((global)) void q() {}' >q.cc
  $ good-clang \
  -nocudainc -x cuda \
  --cuda-path=somepath/cuda/ \
  -Wno-unknown-cuda-version --cuda-device-only \
  -c q.cc -o qqq-good.o \
&& bad-clang \
  -nocudainc -x cuda \
  --cuda-path=somepath/cuda/ \
  -Wno-unknown-cuda-version --cuda-device-only \
  -c q.cc -o qqq-bad.o \
&& ls qqq*.o
  qqq-good.o




[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-18 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
jhuber6 marked 3 inline comments as done.
Closed by commit rG0660397e6809: [CUDA] Allow targeting NVPTX directly without 
a host toolchain (authored by jhuber6).

Changed prior to commit:
  https://reviews.llvm.org/D140158?vs=490317&id=490329#toc


Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/test/Driver/cuda-cross-compiling.c

Index: clang/test/Driver/cuda-cross-compiling.c
===
--- /dev/null
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -0,0 +1,68 @@
+// Tests the driver when targeting the NVPTX architecture directly without a
+// host toolchain to perform CUDA mappings.
+
+// REQUIRES: nvptx-registered-target
+
+//
+// Test the generated phases when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-phases %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=PHASES %s
+
+//  PHASES: 0: input, "[[INPUT:.+]]", c
+// PHASES-NEXT: 1: preprocessor, {0}, cpp-output
+// PHASES-NEXT: 2: compiler, {1}, ir
+// PHASES-NEXT: 3: backend, {2}, assembler
+// PHASES-NEXT: 4: assembler, {3}, object
+// PHASES-NEXT: 5: linker, {4}, image
+
+//
+// Test the generated bindings when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-bindings %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=BINDINGS %s
+
+//  BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX:.+]].s"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX]].s"], output: "[[CUBIN:.+]].o"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN]].o"], output: "a.out"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that the '.o' files are converted to '.cubin' if produced internally.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=ARGS %s
+
+//  ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// ARGS-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that we emit '.o' files if compiled with '-c'
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -c -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=OBJECT %s
+
+//  OBJECT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// OBJECT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[OBJ:.+]].o" "[[PTX]].s" "-c"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that we copy input '.o' files to '.cubin' files when linking.
+//
+// RUN: touch %t.o
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=LINK %s
+
+// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+
+//
+// Test the generated arguments default to a value with no architecture. 
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=DEFAULT %s
+
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_35" {{.*}} "[[CUBIN]].cubin"
Index: clang/lib/Driver/ToolChains/Cuda.h
===
--- clang/lib/Driver/ToolChains/Cuda.h
+++ clang/lib/Driver/ToolChains/Cuda.h
@@ -95,6 +95,19 @@
 
 // Runs fatbinary, which combines GPU object files ("cubin" files) and/or PTX
 // assembly into a single output file.
+class LLVM_LIBRARY_VISIBILITY FatBinary : public Tool {
+public:
+  FatBinary(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+
+  bool hasIntegratedCPP() const override { return false; }
+
+  void ConstructJob(Compilation &C, const JobAction &JA,
+const InputInfo &Output, const InputInfoList &Inputs,
+const llvm::opt::ArgList &Args,
+const char *LinkingOutput) const override;
+};
+
+// Runs nvlink, which links GPU object files ("cubin" files) into a single file.
 class LLVM_LIBRARY_VISIBILITY Linker : public Tool {
 public:
   Linker(const 

[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-18 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 marked 3 inline comments as done.
jhuber6 added a comment.

In D140158#4063720, @tra wrote:

> LGTM with a few minor nits and questions.
>
> In D140158#4063689, @jhuber6 wrote:
>
>> Addressing some comments. I don't know if there's a cleaner way to mess 
>> around with the `.cubin` nonsense. I liked symbolic links but that doesn't 
>> work on Windows.
>
> Can we do a copy or rename instead? That would be more portable.

I do a copy in this patch already. The problem is that if it's an internal
link, e.g. `clang --target=nvptx64-nvidia-cuda foo.c bar.c`, then there will be
no file to copy. The symbolic link got around this by linking the files, so once
the `.o` was written the contents would automatically appear in the `.cubin`.
The other approach is to detect this and use the correct names, which is what
this patch does, albeit a little jankily.
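For reference, a portable "copy instead of symlink" along the lines suggested above could be sketched with the C++17 filesystem library. This is an illustration only; the actual driver uses LLVM's own support libraries, and the function name here is made up:

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical sketch: duplicate a freshly written '.o' under a '.cubin'
// name so that nvlink accepts it. Unlike a symbolic link, a plain
// byte-for-byte copy also works on Windows.
bool copyObjectToCubin(const fs::path &Obj, const fs::path &Cubin) {
  std::error_code EC;
  fs::copy_file(Obj, Cubin, fs::copy_options::overwrite_existing, EC);
  return !EC; // true on success
}
```

The error-code overload avoids exceptions, which matches how driver code typically reports failures to its diagnostics machinery.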




Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:518-521
+const InputInfo &Output,
+const InputInfoList &Inputs,
+const ArgList &Args,
+const char *LinkingOutput) const {

tra wrote:
> Nit: Looks like there are tabs.
I think that's just the diff showing that it was indented; there are no tabs in
the file.



Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:448-450
+  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
+  // Checking if the output is a temporary is the cleanest way to determine
+  // this. Putting this logic in `getInputFIlename` isn't an option because it

tra wrote:
> jhuber6 wrote:
> > tra wrote:
> > > Can't say that I like this approach. It heavily relies on "happens to 
> > > work".
> > > 
> > > Perhaps a better way to deal with this is to create the temporary GPU 
> > > object file name with ".cubin" extension to start with.
> > > 
> > > 
> > > 
> > As far as I know, the files are created independently of the toolchains, so
> > I'm not sure we'd be able to check there. The current way is to use
> > `getInputFilename`, but that doesn't have access to the compilation. As far
> > as I can tell, there are the following solutions:
> > 
> > - Do this and check the temp files
> > - Create a symbolic link if the file is empty (Doesn't work on Windows)
> > - Make a random global that's true if the Linker tool was built at some 
> > point
> Naming is hard. :-(
> 
> OK, let's add a FIXME to this comment and hope to get it fixed when NVIDIA's 
> tools become less obsessive about file extensions. I'll file a bug with them 
> to get the ball rolling on their end.
I would like that. It's a very simple check if you actually know how ELF
headers work... Also, while you're at it, ask how they set their machine flags
at NVIDIA so we can display the arch via `llvm-readelf`.

I'll add some FIXMEs.
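The "simple check" mentioned above — telling an ELF container such as a cubin apart from textual PTX — boils down to inspecting the four-byte ELF magic. A minimal sketch of the idea (illustrative only, not code from the patch):

```cpp
#include <cstddef>
#include <cstring>

// A cubin is an ELF file, while PTX is plain assembly text, so the magic
// bytes 0x7f 'E' 'L' 'F' at offset 0 are enough to tell the two apart.
bool looksLikeElf(const char *Data, std::size_t Size) {
  return Size >= 4 && std::memcmp(Data, "\x7f" "ELF", 4) == 0;
}
```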



Comment at: clang/lib/Driver/ToolChains/Cuda.h:196-197
+
+   void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override;
 

tra wrote:
> jhuber6 wrote:
> > tra wrote:
> > > Nit: it's hard to tell whether the whitespace additions are spaces or 
> > > tabs. They show up as ">" to me which suggests it may be tabs. Just in 
> > > case it is indeed the case, please make sure to un-tabify the changes.
> > I think it's because the original file was formatted incorrectly, so all my 
> > new changes are formatted correctly. I'm somewhat more tempted to just 
> > clang-format it upstream and rebase it.
> Clang-formatting the file(s) and submitting that change separately before the 
> patch SGTM.
Already did :)




[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-18 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision.
tra added a comment.
This revision is now accepted and ready to land.

LGTM with a few minor nits and questions.

In D140158#4063689, @jhuber6 wrote:

> Addressing some comments. I don't know if there's a cleaner way to mess 
> around with the `.cubin` nonsense. I liked symbolic links but that doesn't 
> work on Windows.

Can we do a copy or rename instead? That would be more portable.




Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:518-521
+const InputInfo &Output,
+const InputInfoList &Inputs,
+const ArgList &Args,
+const char *LinkingOutput) const {

Nit: Looks like there are tabs.



Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:448-450
+  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
+  // Checking if the output is a temporary is the cleanest way to determine
+  // this. Putting this logic in `getInputFIlename` isn't an option because it

jhuber6 wrote:
> tra wrote:
> > Can't say that I like this approach. It heavily relies on "happens to work".
> > 
> > Perhaps a better way to deal with this is to create the temporary GPU 
> > object file name with ".cubin" extension to start with.
> > 
> > 
> > 
> As far as I know, the files are created independently of the toolchains, so
> I'm not sure we'd be able to check there. The current way is to use
> `getInputFilename`, but that doesn't have access to the compilation. As far as
> I can tell, there are the following solutions:
> 
> - Do this and check the temp files
> - Create a symbolic link if the file is empty (Doesn't work on Windows)
> - Make a random global that's true if the Linker tool was built at some point
Naming is hard. :-(

OK, let's add a FIXME to this comment and hope to get it fixed when NVIDIA's 
tools become less obsessive about file extensions. I'll file a bug with them to 
get the ball rolling on their end.



Comment at: clang/lib/Driver/ToolChains/Cuda.h:196-197
+
+   void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override;
 

jhuber6 wrote:
> tra wrote:
> > Nit: it's hard to tell whether the whitespace additions are spaces or tabs. 
> > They show up as ">" to me which suggests it may be tabs. Just in case it is 
> > indeed the case, please make sure to un-tabify the changes.
> I think it's because the original file was formatted incorrectly, so all my 
> new changes are formatted correctly. I'm somewhat more tempted to just 
> clang-format it upstream and rebase it.
Clang-formatting the file(s) and submitting that change separately before the 
patch SGTM.




[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-18 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 490317.
jhuber6 added a comment.

Addressing some comments. I don't know if there's a cleaner way to mess around 
with the `.cubin` nonsense. I liked symbolic links but that doesn't work on 
Windows.



Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/test/Driver/cuda-cross-compiling.c

Index: clang/test/Driver/cuda-cross-compiling.c
===
--- /dev/null
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -0,0 +1,68 @@
+// Tests the driver when targeting the NVPTX architecture directly without a
+// host toolchain to perform CUDA mappings.
+
+// REQUIRES: nvptx-registered-target
+
+//
+// Test the generated phases when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-phases %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=PHASES %s
+
+//  PHASES: 0: input, "[[INPUT:.+]]", c
+// PHASES-NEXT: 1: preprocessor, {0}, cpp-output
+// PHASES-NEXT: 2: compiler, {1}, ir
+// PHASES-NEXT: 3: backend, {2}, assembler
+// PHASES-NEXT: 4: assembler, {3}, object
+// PHASES-NEXT: 5: linker, {4}, image
+
+//
+// Test the generated bindings when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-bindings %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=BINDINGS %s
+
+//  BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX:.+]].s"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX]].s"], output: "[[CUBIN:.+]].o"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN]].o"], output: "a.out"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that the '.o' files are converted to '.cubin' if produced internally.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=ARGS %s
+
+//  ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// ARGS-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that we emit '.o' files if compiled with '-c'
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -c -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=OBJECT %s
+
+//  OBJECT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// OBJECT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[OBJ:.+]].o" "[[PTX]].s" "-c"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that we copy input '.o' files to '.cubin' files when linking.
+//
+// RUN: touch %t.o
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=LINK %s
+
+// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+
+//
+// Test the generated arguments default to a value with no architecture. 
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=DEFAULT %s
+
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_35" {{.*}} "[[CUBIN]].cubin"
Index: clang/lib/Driver/ToolChains/Cuda.h
===
--- clang/lib/Driver/ToolChains/Cuda.h
+++ clang/lib/Driver/ToolChains/Cuda.h
@@ -95,6 +95,19 @@
 
 // Runs fatbinary, which combines GPU object files ("cubin" files) and/or PTX
 // assembly into a single output file.
+class LLVM_LIBRARY_VISIBILITY FatBinary : public Tool {
+public:
+  FatBinary(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+
+  bool hasIntegratedCPP() const override { return false; }
+
+  void ConstructJob(Compilation &C, const JobAction &JA,
+const InputInfo &Output, const InputInfoList &Inputs,
+const llvm::opt::ArgList &Args,
+const char *LinkingOutput) const override;
+};
+
+// Runs nvlink, which links GPU object files ("cubin" files) into a single file.
 class LLVM_LIBRARY_VISIBILITY Linker : public Tool {
 public:
   Linker(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
@@ -116,7 +129,48 @@
 
 namespace toolchains {
 
-class LLVM_LIBRARY_VISIBILITY 

[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-18 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 marked an inline comment as done.
jhuber6 added inline comments.



Comment at: clang/lib/Driver/ToolChains/Cuda.h:196-197
+
+   void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override;
 

tra wrote:
> Nit: it's hard to tell whether the whitespace additions are spaces or tabs. 
> They show up as ">" to me which suggests it may be tabs. Just in case it is 
> indeed the case, please make sure to un-tabify the changes.
I think it's because the original file was formatted incorrectly, so all my new 
changes are formatted correctly. I'm somewhat more tempted to just clang-format 
it upstream and rebase it.




[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-18 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:448-450
+  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
+  // Checking if the output is a temporary is the cleanest way to determine
+  // this. Putting this logic in `getInputFIlename` isn't an option because it

tra wrote:
> Can't say that I like this approach. It heavily relies on "happens to work".
> 
> Perhaps a better way to deal with this is to create the temporary GPU object 
> file name with ".cubin" extension to start with.
> 
> 
> 
As far as I know, the files are created independently of the toolchains, so I'm
not sure we'd be able to check there. The current way is to use
`getInputFilename`, but that doesn't have access to the compilation. As far as I
can tell, there are the following solutions:

- Do this and check the temp files
- Create a symbolic link if the file is empty (Doesn't work on Windows)
- Make a random global that's true if the Linker tool was built at some point



Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:469
 
-  bool Relocatable = false;
+  bool Relocatable = true;
   if (JA.isOffloading(Action::OFK_OpenMP))

tra wrote:
> Nit: I'd add an `else { Relocatable = false; // comment on what use cases use 
> relocatable compilation by default. }` and leave it uninitialized here. At 
> the very least it may be worth a comment. Silently defaulting to `true` makes 
> me ask "what other compilation modes we may have?" and stand-alone 
> compilation targeting NVPTX is hard to infer here from the surrounding 
> details.
> 
> 
Works for me.



Comment at: clang/test/Driver/cuda-cross-compiling.c:37-38
+//  ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" 
"sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// ARGS-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" 
"[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// ARGS-NEXT: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+

tra wrote:
> This may fail on Windows, where ptxas/nvlink will be `ptxas.exe` and
> `nvlink.exe`. I think we typically use something like `fatbinary{{.*}}"` in
> other tests.
Good point.




[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

LGTM overall, with a few nits.




Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:448-450
+  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
+  // Checking if the output is a temporary is the cleanest way to determine
+  // this. Putting this logic in `getInputFIlename` isn't an option because it

Can't say that I like this approach. It heavily relies on "happens to work".

Perhaps a better way to deal with this is to create the temporary GPU object 
file name with ".cubin" extension to start with.






Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:450
+  // Checking if the output is a temporary is the cleanest way to determine
+  // this. Putting this logic in `getInputFIlename` isn't an option because it
+  // relies on the compilation.

typo: getInputFilename



Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:469
 
-  bool Relocatable = false;
+  bool Relocatable = true;
   if (JA.isOffloading(Action::OFK_OpenMP))

Nit: I'd add an `else { Relocatable = false; // comment on what use cases use 
relocatable compilation by default. }` and leave it uninitialized here. At the 
very least it may be worth a comment. Silently defaulting to `true` makes me 
ask "what other compilation modes we may have?" and stand-alone compilation 
targeting NVPTX is hard to infer here from the surrounding details.
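The restructuring suggested here — assign the flag in every branch and document the stand-alone default — might look roughly like the following sketch. The enum and function names are stand-ins for illustration, not the real driver code:

```cpp
// Stand-in for the offload kinds the driver distinguishes.
enum class OffloadKind { None, OpenMP, Cuda };

// Sketch: every branch assigns Relocatable explicitly, and the stand-alone
// NVPTX default gets its own documented else-branch instead of a silent
// initializer.
bool defaultRelocatable(OffloadKind Kind, bool UserRequestedRDC) {
  bool Relocatable;
  if (Kind == OffloadKind::OpenMP || Kind == OffloadKind::Cuda)
    Relocatable = UserRequestedRDC; // offloading honors the user's RDC flags
  else
    // Stand-alone NVPTX compilation: 'ptxas -c' output is later consumed by
    // nvlink, so relocatable output is the sensible default.
    Relocatable = true;
  return Relocatable;
}
```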





Comment at: clang/lib/Driver/ToolChains/Cuda.h:196-197
+
+   void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override;
 

Nit: it's hard to tell whether the whitespace additions are spaces or tabs. 
They show up as ">" to me which suggests it may be tabs. Just in case it is 
indeed the case, please make sure to un-tabify the changes.



Comment at: clang/test/Driver/cuda-cross-compiling.c:37-38
+//  ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" 
"sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// ARGS-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" 
"[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// ARGS-NEXT: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+

This may fail on Windows, where ptxas/nvlink will be `ptxas.exe` and
`nvlink.exe`. I think we typically use something like `fatbinary{{.*}}"` in
other tests.




[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-16 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added a comment.

FWIW, creating CUBIN from C/C++ directly would be really useful when debugging
(and in combination with our soon-to-be-available JIT object loader).

@tra Can we get this in somehow?




[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2023-01-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 489554.
jhuber6 added a comment.

Updating. I used a different method to determine whether we need to use `.cubin`
or `.o`. It's a little ugly, but I don't think there's a better way to do it.

Also, I just realized that if this goes through I could probably heavily
simplify the linker-wrapper code by just invoking `clang --target=nvptx64`
instead of calling the tools manually. So that's another benefit of this.



Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/test/Driver/cuda-cross-compiling.c

Index: clang/test/Driver/cuda-cross-compiling.c
===
--- /dev/null
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -0,0 +1,68 @@
+// Tests the driver when targeting the NVPTX architecture directly without a
+// host toolchain to perform CUDA mappings.
+
+// REQUIRES: nvptx-registered-target
+
+//
+// Test the generated phases when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-phases %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=PHASES %s
+
+//  PHASES: 0: input, "[[INPUT:.+]]", c
+// PHASES-NEXT: 1: preprocessor, {0}, cpp-output
+// PHASES-NEXT: 2: compiler, {1}, ir
+// PHASES-NEXT: 3: backend, {2}, assembler
+// PHASES-NEXT: 4: assembler, {3}, object
+// PHASES-NEXT: 5: linker, {4}, image
+
+//
+// Test the generated bindings when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-bindings %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=BINDINGS %s
+
+//  BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX:.+]].s"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX]].s"], output: "[[CUBIN:.+]].o"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN]].o"], output: "a.out"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that the '.o' files are converted to '.cubin' if produced internally.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=ARGS %s
+
+//  ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// ARGS-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// ARGS-NEXT: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that we emit '.o' files if compiled with '-c'
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -c -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=OBJECT %s
+
+//  OBJECT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// OBJECT-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[OBJ:.+]].o" "[[PTX]].s" "-c"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that we copy input '.o' files to '.cubin' files when linking.
+//
+// RUN: touch %t.o
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=LINK %s
+
+// LINK: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+
+//
+// Test the generated arguments default to a value with no architecture. 
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=DEFAULT %s
+
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink" "-o" "a.out" "-arch" "sm_35" {{.*}} "[[CUBIN]].cubin"
Index: clang/lib/Driver/ToolChains/Cuda.h
===
--- clang/lib/Driver/ToolChains/Cuda.h
+++ clang/lib/Driver/ToolChains/Cuda.h
@@ -95,9 +95,22 @@
 
 // Runs fatbinary, which combines GPU object files ("cubin" files) and/or PTX
 // assembly into a single output file.
+class LLVM_LIBRARY_VISIBILITY FatBinary : public Tool {
+ public:
+   FatBinary(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+
+   bool hasIntegratedCPP() const override { return false; }
+
+   void ConstructJob(Compilation &C, const JobAction &JA,
+ const InputInfo &Output, const InputInfoList &Inputs,
+ const llvm::opt::ArgList &Args,
+ const char *LinkingOutput) const override;
+};
+
+// Runs nvlink, which links GPU object files ("cubin" files) into a single file.
 class LLVM_LIBRARY_VISIBILITY 

[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 483576.
jhuber6 added a comment.

I accidentally deleted the old `getInputFilename` routine, which we need. The
symlink worked fine but would break on Windows, so I ended up writing a hack
that only uses `.cubin` if we have the `nvlink` linker active and the input
comes from the assembler. It's extremely ugly, so let me know what you think.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158

Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/test/Driver/cuda-cross-compiling.c

Index: clang/test/Driver/cuda-cross-compiling.c
===
--- /dev/null
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -0,0 +1,68 @@
+// Tests the driver when targeting the NVPTX architecture directly without a
+// host toolchain to perform CUDA mappings.
+
+// REQUIRES: nvptx-registered-target
+
+//
+// Test the generated phases when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-phases %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=PHASES %s
+
+//  PHASES: 0: input, "[[INPUT:.+]]", c
+// PHASES-NEXT: 1: preprocessor, {0}, cpp-output
+// PHASES-NEXT: 2: compiler, {1}, ir
+// PHASES-NEXT: 3: backend, {2}, assembler
+// PHASES-NEXT: 4: assembler, {3}, object
+// PHASES-NEXT: 5: linker, {4}, image
+
+//
+// Test the generated bindings when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-bindings %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=BINDINGS %s
+
+//  BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX:.+]].s"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX]].s"], output: "[[CUBIN:.+]].o"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN]].o"], output: "a.out"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that the '.o' files are converted to '.cubin' if produced internally.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=ARGS %s
+
+//  ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// ARGS-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// ARGS-NEXT: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that we emit '.o' files if compiled with '-c'
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -c -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=OBJECT %s
+
+//  OBJECT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// OBJECT-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[OBJ:.+]].o" "[[PTX]].s" "-c"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that we copy input '.o' files to '.cubin' files when linking.
+//
+// RUN: touch %t.o
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=LINK %s
+
+// LINK: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+
+//
+// Test the generated arguments default to a value with no architecture. 
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=DEFAULT %s
+
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink" "-o" "a.out" "-arch" "sm_35" {{.*}} "[[CUBIN]].cubin"
Index: clang/lib/Driver/ToolChains/Cuda.h
===
--- clang/lib/Driver/ToolChains/Cuda.h
+++ clang/lib/Driver/ToolChains/Cuda.h
@@ -98,9 +98,22 @@
 
 // Runs fatbinary, which combines GPU object files ("cubin" files) and/or PTX
 // assembly into a single output file.
+class LLVM_LIBRARY_VISIBILITY FatBinary : public Tool {
+ public:
+   FatBinary(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+
+   bool hasIntegratedCPP() const override { return false; }
+
+   void ConstructJob(Compilation &C, const JobAction &JA,
+                     const InputInfo &Output, const InputInfoList &Inputs,
+                     const llvm::opt::ArgList &TCArgs,
+                     const char *LinkingOutput) const override;
+};
+
+// Runs nvlink, which links GPU object files ("cubin" files) into a single file.
 class LLVM_LIBRARY_VISIBILITY Linker : public Tool {
  public:
-   Linker(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}

[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 483507.
jhuber6 added a comment.

Fix format


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158

Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/test/Driver/cuda-cross-compiling.c

Index: clang/test/Driver/cuda-cross-compiling.c
===
--- /dev/null
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -0,0 +1,48 @@
+// Tests the driver when targeting the NVPTX architecture directly without a
+// host toolchain to perform CUDA mappings.
+
+// REQUIRES: nvptx-registered-target
+
+//
+// Test the generated phases when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-phases %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=PHASES %s
+
+//  PHASES: 0: input, "[[INPUT:.+]]", c
+// PHASES-NEXT: 1: preprocessor, {0}, cpp-output
+// PHASES-NEXT: 2: compiler, {1}, ir
+// PHASES-NEXT: 3: backend, {2}, assembler
+// PHASES-NEXT: 4: assembler, {3}, object
+// PHASES-NEXT: 5: linker, {4}, image
+
+//
+// Test the generated bindings when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-bindings %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=BINDINGS %s
+
+//  BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX:.+]].s"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX]].s"], output: "[[CUBIN:.+]].o"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN]].o"], output: "a.out"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that the '.o' files are converted to '.cubin' as well.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=ARGS %s
+
+//  ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// ARGS-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// ARGS-NEXT: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+
+//
+// Test the generated arguments default to a value with no architecture. 
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=DEFAULT %s
+
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink" "-o" "a.out" "-arch" "sm_35" {{.*}} "{{.*}}.cubin"
Index: clang/lib/Driver/ToolChains/Cuda.h
===
--- clang/lib/Driver/ToolChains/Cuda.h
+++ clang/lib/Driver/ToolChains/Cuda.h
@@ -98,9 +98,22 @@
 
 // Runs fatbinary, which combines GPU object files ("cubin" files) and/or PTX
 // assembly into a single output file.
+class LLVM_LIBRARY_VISIBILITY FatBinary : public Tool {
+ public:
+   FatBinary(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+
+   bool hasIntegratedCPP() const override { return false; }
+
+   void ConstructJob(Compilation &C, const JobAction &JA,
+                     const InputInfo &Output, const InputInfoList &Inputs,
+                     const llvm::opt::ArgList &TCArgs,
+                     const char *LinkingOutput) const override;
+};
+
+// Runs nvlink, which links GPU object files ("cubin" files) into a single file.
 class LLVM_LIBRARY_VISIBILITY Linker : public Tool {
  public:
-   Linker(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+   Linker(const ToolChain &TC) : Tool("NVPTX::Linker", "nvlink", TC) {}
 
bool hasIntegratedCPP() const override { return false; }
 
@@ -119,73 +132,95 @@
 
 namespace toolchains {
 
-class LLVM_LIBRARY_VISIBILITY CudaToolChain : public ToolChain {
-public:
-  CudaToolChain(const Driver &D, const llvm::Triple &Triple,
-                const ToolChain &HostTC, const llvm::opt::ArgList &Args);
+class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
+ public:
+   NVPTXToolChain(const Driver &D, const llvm::Triple &Triple,
+                  const llvm::Triple &HostTriple,
+                  const llvm::opt::ArgList &Args);
+
+   NVPTXToolChain(const Driver &D, const llvm::Triple &Triple,
+                  const llvm::opt::ArgList &Args);
+
+   llvm::opt::DerivedArgList *
+   TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
+                 Action::OffloadKind DeviceOffloadKind) const override;
+
+   // Never try to use the integrated assembler with NVPTX; always fork out to
+   // ptxas.
+   bool useIntegratedAs() const override { return false; }
+   bool isCrossCompiling() const override { return true; }
+   bool isPICDefault() const override { return false; }
+   bool isPIEDefault(const llvm::opt::ArgList &Args) const override { return false; }

[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 483389.
jhuber6 added a comment.

Addressing comments, I did the symbolic link method. It's a stupid hack that's 
only necessary because of nvlink's poor handling but I think this works around 
it well enough.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158

Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/test/Driver/cuda-cross-compiling.c

Index: clang/test/Driver/cuda-cross-compiling.c
===
--- /dev/null
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -0,0 +1,48 @@
+// Tests the driver when targeting the NVPTX architecture directly without a
+// host toolchain to perform CUDA mappings.
+
+// REQUIRES: nvptx-registered-target
+
+//
+// Test the generated phases when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-phases %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=PHASES %s
+
+//  PHASES: 0: input, "[[INPUT:.+]]", c
+// PHASES-NEXT: 1: preprocessor, {0}, cpp-output
+// PHASES-NEXT: 2: compiler, {1}, ir
+// PHASES-NEXT: 3: backend, {2}, assembler
+// PHASES-NEXT: 4: assembler, {3}, object
+// PHASES-NEXT: 5: linker, {4}, image
+
+//
+// Test the generated bindings when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-bindings %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=BINDINGS %s
+
+//  BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX:.+]].s"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX]].s"], output: "[[CUBIN:.+]].o"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN]].o"], output: "a.out"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that the '.o' files are converted to '.cubin' as well.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=ARGS %s
+
+//  ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// ARGS-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// ARGS-NEXT: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+
+//
+// Test the generated arguments default to a value with no architecture. 
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=DEFAULT %s
+
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink" "-o" "a.out" "-arch" "sm_35" {{.*}} "{{.*}}.cubin"
Index: clang/lib/Driver/ToolChains/Cuda.h
===
--- clang/lib/Driver/ToolChains/Cuda.h
+++ clang/lib/Driver/ToolChains/Cuda.h
@@ -98,9 +98,22 @@
 
 // Runs fatbinary, which combines GPU object files ("cubin" files) and/or PTX
 // assembly into a single output file.
+class LLVM_LIBRARY_VISIBILITY FatBinary : public Tool {
+ public:
+   FatBinary(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+
+   bool hasIntegratedCPP() const override { return false; }
+
+   void ConstructJob(Compilation &C, const JobAction &JA,
+                     const InputInfo &Output, const InputInfoList &Inputs,
+                     const llvm::opt::ArgList &TCArgs,
+                     const char *LinkingOutput) const override;
+};
+
+// Runs nvlink, which links GPU object files ("cubin" files) into a single file.
 class LLVM_LIBRARY_VISIBILITY Linker : public Tool {
  public:
-   Linker(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+   Linker(const ToolChain &TC) : Tool("NVPTX::Linker", "nvlink", TC) {}
 
bool hasIntegratedCPP() const override { return false; }
 
@@ -119,43 +132,68 @@
 
 namespace toolchains {
 
-class LLVM_LIBRARY_VISIBILITY CudaToolChain : public ToolChain {
-public:
-  CudaToolChain(const Driver &D, const llvm::Triple &Triple,
-                const ToolChain &HostTC, const llvm::opt::ArgList &Args);
+class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
+ public:
+   NVPTXToolChain(const Driver &D, const llvm::Triple &Triple,
+                  const llvm::Triple &HostTriple,
+                  const llvm::opt::ArgList &Args);
+
+   NVPTXToolChain(const Driver &D, const llvm::Triple &Triple,
+                  const llvm::opt::ArgList &Args);
+
+   llvm::opt::DerivedArgList *
+   TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
+                 Action::OffloadKind DeviceOffloadKind) const override;
+
+   // Never try to use the integrated assembler with NVPTX; always fork out to
+   // ptxas.
+   bool useIntegratedAs() const override { return false; }

[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D140158#3999810 , @JonChesterfield 
wrote:

> I don't think we should assume they want implicit behaviour from other 
> programming models thrown in.

Agreed. Also, removing things is often surprisingly hard.
Let's keep things simlpe, get this compilation mode to the point where it's 
practically usable, see what we really want/need, and then make those common 
cases the default or easier to use.

As far as libdevice is concerned, it's only needed for sources that use CUDA 
headers, as those map the standard math functions to their implementations in 
libdevice. Stand-alone sources should not need it. We should also be able to 
compile libdevice.bc into a  libdevice.o and then link with it, if needed, 
during the final executable link phase.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

If we do magic header including, we should check for the freestanding argument 
and not include them with that set. I would prefer we not include cuda headers 
into C++ source that isn't being compiled as cuda, and also not link in misc 
cuda library files.

Anyone who has decided to compile raw c or c++ to ptx is doing something off 
the beaten path, I don't think we should assume they want implicit behaviour 
from other programming models thrown in.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158



[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D140158#3999783 , @tra wrote:

> In D140158#3999716 , @jhuber6 wrote:
>
>> I just realized the method of copying the `.o` to a `.cubin` doesn't work if 
>> the link step is done in the same compilation because it doesn't exist yet. 
>> To fix this I could either make the tool chain emit `.cubin` if we're going 
>> straight to `nvlink`, or use a symbolic link. The former is uglier, the 
>> latter will probably only work on Linux.
>
> Does it have to be `.cubin` ? Does nvlink require it?

Yes, it's one of the dumbest things in the CUDA toolchain. If the filename is a 
`.o` they assume it's a host file containing embedded RDC-code that needs to be 
combined. If it's a `.cubin` they assume it's device code. I have no clue why 
they couldn't just check the ELF flags, it's trivial.

>> Also do you think I should include the CUDA headers in with this? We can 
>> always get rid of them with `nogpuinc` or similar if they're not needed. The 
>> AMDGPU compilation still links in the device libraries so I'm wondering if 
>> we should at least be consistent.
>
> Only some CUDA headers can be used from C++ and they tend to contain 
> host-only declarations, and stubs or CPU implementations of GPU-side 
> functions in that case.
> The rest rely on CUDA extensions, attributes, etc. So, in this case doing a C++ 
> compilation will give you a host-side view of the headers in the best case, 
> which is not going to be particularly useful.
>
> If we want to make GPU-side functions available, we'll need to have a 
> pre-included wrapper with more preprocessor magic to make CUDA headers 
> usable.  Doing that in a C++ compilation would be questionable as for a C++ 
> compilation a user would normally assume that no headers are pre-included by 
> the compiler.

We might want to at least include the `libdevice` files, most of our wrappers 
definitely won't work without CUDA or OpenMP language modes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158



[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D140158#3999716 , @jhuber6 wrote:

> I just realized the method of copying the `.o` to a `.cubin` doesn't work if 
> the link step is done in the same compilation because it doesn't exist yet. 
> To fix this I could either make the tool chain emit `.cubin` if we're going 
> straight to `nvlink`, or use a symbolic link. The former is uglier, the 
> latter will probably only work on Linux.

Does it have to be `.cubin` ? Does nvlink require it?

> Also do you think I should include the CUDA headers in with this? We can 
> always get rid of them with `nogpuinc` or similar if they're not needed. The 
> AMDGPU compilation still links in the device libraries so I'm wondering if we 
> should at least be consistent.

Only some CUDA headers can be used from C++ and they tend to contain host-only 
declarations, and stubs or CPU implementations of GPU-side functions in that 
case.
The rest rely on CUDA extensions, attributes, etc. So, in this case doing a C++ 
compilation will give you a host-side view of the headers in the best case, 
which is not going to be particularly useful.

If we want to make GPU-side functions available, we'll need to have a 
pre-included wrapper with more preprocessor magic to make CUDA headers usable.  
Doing that in a C++ compilation would be questionable as for a C++ compilation 
a user would normally assume that no headers are pre-included by the compiler.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158



[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added a comment.

In D140158#3999716 , @jhuber6 wrote:

> Also do you think I should include the CUDA headers in with this? We can 
> always get rid of them with `nogpuinc` or similar if they're not needed. The 
> AMDGPU compilation still links in the device libraries so I'm wondering if we 
> should at least be consistent.

For OpenMP at least we want this mode to include the device runtime and the 
respective native runtimes as well (except if we do LTO than both are included 
late).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158



[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 marked an inline comment as done.
jhuber6 added a comment.

I just realized the method of copying the `.o` to a `.cubin` doesn't work if 
the link step is done in the same compilation because it doesn't exist yet. To 
fix this I could either make the tool chain emit `.cubin` if we're going 
straight to `nvlink`, or use a symbolic link. The former is uglier, the latter 
will probably only work on Linux.

Also do you think I should include the CUDA headers in with this? We can always 
get rid of them with `nogpuinc` or similar if they're not needed. The AMDGPU 
compilation still links in the device libraries so I'm wondering if we should 
at least be consistent.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158



[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

LGTM overall.

So, essentially the patch refactors what we've already been doing for OpenMP 
and makes it usable manually, which will be useful for things like GPU-side libc 
tests.




Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:631-632
+
+  const char *Exec =
+      Args.MakeArgString(getToolChain().GetProgramPath("nvlink"));
+  C.addCommand(std::make_unique<Command>(

Nit: I'd just do it within the `make_unique` below. It should end up on a line 
by itself and should not hurt readability.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158



[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

2022-12-15 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: tra, yaxunl, JonChesterfield.
Herald added subscribers: mattd, gchakrabarti, carlosgalvezp, asavonic.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, sstefan1, MaskRay.
Herald added a reviewer: jdoerfert.
Herald added a project: clang.

Currently, the NVPTX compilation toolchain can only be invoked either
through CUDA or OpenMP via `--offload-device-only`. This is because we
cannot build a CUDA toolchain without an accompanying host toolchain for
the offloading. When using `--target=nvptx64-nvidia-cuda` this results
in generating calls to the GNU assembler and linker, leading to errors.

This patch abstracts the portions of the CUDA toolchain that are
independent of the host toolchain or offloading kind into a new base
class called `NVPTXToolChain`. We still need to read the host's triple
to build the CUDA installation, so if not present we just assume it will
match the host's system for now, or the user can provide the path
explicitly.

This should allow the compiler driver to create NVPTX device images
directly from C/C++ code.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D140158

Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/test/Driver/cuda-cross-compiling.c

Index: clang/test/Driver/cuda-cross-compiling.c
===
--- /dev/null
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -0,0 +1,48 @@
+// Tests the driver when targeting the NVPTX architecture directly without a
+// host toolchain to perform CUDA mappings.
+
+// REQUIRES: nvptx-registered-target
+
+//
+// Test the generated phases when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-phases %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=PHASES %s
+
+//  PHASES: 0: input, "[[INPUT:.+]]", c
+// PHASES-NEXT: 1: preprocessor, {0}, cpp-output
+// PHASES-NEXT: 2: compiler, {1}, ir
+// PHASES-NEXT: 3: backend, {2}, assembler
+// PHASES-NEXT: 4: assembler, {3}, object
+// PHASES-NEXT: 5: linker, {4}, image
+
+//
+// Test the generated bindings when targeting NVPTX.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-bindings %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=BINDINGS %s
+
+//  BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX:.+]].s"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX]].s"], output: "[[CUBIN:.+]].o"
+// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN]].o"], output: "a.out"
+
+//
+// Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
+// Ensure that the '.o' files are converted to '.cubin' as well.
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=ARGS %s
+
+//  ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// ARGS-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// ARGS-NEXT: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+
+//
+// Test the generated arguments default to a value with no architecture. 
+//
+// RUN: %clang -target nvptx64-nvidia-cuda -### %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=DEFAULT %s
+
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink" "-o" "a.out" "-arch" "sm_35" {{.*}} "{{.*}}.cubin"
Index: clang/lib/Driver/ToolChains/Cuda.h
===
--- clang/lib/Driver/ToolChains/Cuda.h
+++ clang/lib/Driver/ToolChains/Cuda.h
@@ -96,11 +96,25 @@
  const char *LinkingOutput) const override;
 };
 
+// Runs fatbinary, which combines GPU object files ("cubin" files) and/or PTX
+// assembly into a single output file.
+class LLVM_LIBRARY_VISIBILITY FatBinary : public Tool {
+ public:
+   FatBinary(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+
+   bool hasIntegratedCPP() const override { return false; }
+
+   void ConstructJob(Compilation &C, const JobAction &JA,
+                     const InputInfo &Output, const InputInfoList &Inputs,
+                     const llvm::opt::ArgList &TCArgs,
+                     const char *LinkingOutput) const override;
+};
+
 // Runs fatbinary, which combines GPU object files ("cubin" files) and/or PTX
 // assembly into a single output file.
 class LLVM_LIBRARY_VISIBILITY Linker : public Tool {
  public:
-   Linker(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+   Linker(const ToolChain &TC) : Tool("NVPTX::Linker", "nvlink", TC) {}