[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-08-19 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

Can you document the device binary embedding scheme for multiple GPUs in the Clang
documentation? This will help tool developers build tools that extract
device binaries from executables or shared libraries. It may also help
interoperability with other offloading language modes if multiple
offloading models need to be supported in one executable or shared library in
the future.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106870/new/

https://reviews.llvm.org/D106870

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-08-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added reviewers: ronlieb, pdhaliwal.
JonChesterfield added a comment.

Spent some time reading through this. I think the idea is to create a host 
binary that contains code objects for multiple variants of amdgpu - e.g. one 
that runs on gfx906 and another on gfx908, or one that runs on gfx906-xnack+ 
and another on gfx906-xnack-.

That's close to the long-running feature request to compile a program to a
binary that can run on totally different architectures, e.g. nvptx + amdgpu +
vgpu + remote. Probably the first step is making one binary that can run on
whichever target is present, and then extending it to run on a system that has
multiple targets available. I've got an nvptx / amdgpu box here that would be
well suited to testing that. Tagging Ron and Pushpinder, who may be interested.




[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-08-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I think this patch needs to be split up into a large number of much smaller pieces.




[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-07-28 Thread Saiyedul Islam via Phabricator via cfe-commits
saiislam added inline comments.



Comment at: openmp/libomptarget/src/rtl.cpp:306
+  std::string cmd_bin;
+  cmd_bin.assign(libomptarget_dir_name).append("/../bin/amdgpu-arch");
+  struct stat stat_buffer;

saiislam wrote:
> Call to amdgpu-arch binary is going to be replaced with call to a new library 
> named OffloadArch. It will return current GPU name along with enabled GPU 
> features (i.e. requirements) in a platform-independent way. As the library 
> and its various functionalities are self-contained I decided to post it as a 
> separate review and use amdgpu-arch here for demonstration.
> I will be posting the phab review for the library soon.
Here is the patch for the OffloadArch library: D106960




[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-07-27 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

There seems to be a bunch of different things in this patch.

There's some driver plumbing to compile for more than one arch (presumably by 
calling the target compiler N times). That's a great feature, I want to build 
an application that can run on nvptx or amdgpu. Probably need a test case 
showing that combination.

Then there's a bunch of stuff to do with 'requirements', but it's not clear 
what that is.

Finally there's some stuff where libomptarget dlopens itself then spawns 
amdgpu-arch. I can't tell why we would want to do that.

My guess was that each arch would get its own section in the host executable 
containing a code object and each host plugin would be responsible for 
indicating whether it could do anything with a given code object. That should 
work out of the box for machines with only one offloading arch available, and 
need some work around device_id to handle multiple ones.
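
The model guessed at above can be sketched as follows. This is a minimal illustration, not libomptarget's actual interface: `CodeObject`, `Plugin`, and `selectPlugin` are hypothetical names, and the architecture string stands in for whatever metadata a real plugin would inspect.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: one device image embedded in the host executable.
struct CodeObject {
  std::string arch; // e.g. "gfx906" or "sm_70"
};

// Hypothetical: a host plugin that reports whether it can run an image.
struct Plugin {
  std::string name;
  std::function<bool(const CodeObject &)> accepts;
};

// Return the index of the first plugin that accepts the image, if any.
std::optional<std::size_t> selectPlugin(const std::vector<Plugin> &plugins,
                                        const CodeObject &image) {
  for (std::size_t i = 0; i < plugins.size(); ++i)
    if (plugins[i].accepts(image))
      return i;
  return std::nullopt;
}
```

On a machine with a single offloading arch, at most one plugin would accept any given image; handling several devices at once would additionally need a mapping from device_id to the chosen plugin, which this sketch leaves out.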




[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-07-27 Thread Ye Luo via Phabricator via cfe-commits
ye-luo added a comment.

In D106870#2907257, @saiislam wrote:

> In D106870#2907252, @ye-luo wrote:
>
>> `-fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa` seems burdensome. 
>> Could you just count how many `-Xopenmp-target=amdgcn-amd-amdhsa` there are 
>> on the command line and then count the unique ones?
>
> I have a patch in the pipeline which will eliminate the need for (-fopenmp-targets, 
> -Xopenmp-target, and -march) altogether. Users will be able to compile with 
> just "--offload-arch=gfx906" instead of using the other three flags.
> It is working in our downstream AOMP compiler but I haven't posted a Phab 
> review yet.

That is just a convenience option and a separate topic. I'm commenting on the 
current generic option you are fiddling with.




[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-07-27 Thread Saiyedul Islam via Phabricator via cfe-commits
saiislam added inline comments.



Comment at: openmp/libomptarget/src/rtl.cpp:306
+  std::string cmd_bin;
+  cmd_bin.assign(libomptarget_dir_name).append("/../bin/amdgpu-arch");
+  struct stat stat_buffer;

The call to the amdgpu-arch binary is going to be replaced with a call to a new library 
named OffloadArch. It will return the current GPU name along with the enabled GPU 
features (i.e. requirements) in a platform-independent way. As the library and 
its various functionalities are self-contained, I decided to post it as a 
separate review and use amdgpu-arch here for demonstration.
I will be posting the Phab review for the library soon.
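
For reference, capturing the output of a helper binary such as amdgpu-arch can be sketched as below. This is an illustration under assumptions, not the patch's code: it relies on POSIX `popen`, `runAndReadFirstLine` is a hypothetical helper, and it keeps the first output line, whereas the loop in the patch keeps the last one.

```cpp
#include <stdio.h>

#include <optional>
#include <string>

// Run `command` and return its first output line without the trailing newline.
// Returns std::nullopt if the command cannot be started or prints nothing.
std::optional<std::string> runAndReadFirstLine(const std::string &command) {
  FILE *stream = popen(command.c_str(), "r");
  if (!stream)
    return std::nullopt;
  char buffer[256];
  std::optional<std::string> line;
  if (fgets(buffer, sizeof(buffer), stream)) {
    line = buffer;
    if (!line->empty() && line->back() == '\n')
      line->pop_back();
  }
  pclose(stream);
  return line;
}
```

Checking the `popen` result and the empty-output case avoids indexing the buffer at `slen - 1` when nothing was read.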




[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-07-27 Thread Saiyedul Islam via Phabricator via cfe-commits
saiislam added a comment.

In D106870#2907252, @ye-luo wrote:

> `-fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa` seems burdensome. 
> Could you just count how many `-Xopenmp-target=amdgcn-amd-amdhsa` there are 
> on the command line and then count the unique ones?

I have a patch in the pipeline which will eliminate the need for (-fopenmp-targets, 
-Xopenmp-target, and -march) altogether. Users will be able to compile with just 
"--offload-arch=gfx906" instead of using the other three flags.
It is working in our downstream AOMP compiler but I haven't posted a Phab 
review yet.




[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-07-27 Thread Ye Luo via Phabricator via cfe-commits
ye-luo added a comment.

`-fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa` seems burdensome. Could 
you just count how many `-Xopenmp-target=amdgcn-amd-amdhsa` there are on the 
command line and then count the unique ones?
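
A minimal sketch of that suggestion, assuming the driver sees the flags as plain strings; `TargetCounts` and `countOffloadTargets` are hypothetical names, not part of the Clang driver:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: total and distinct -Xopenmp-target=<triple> flags.
struct TargetCounts {
  std::size_t total = 0;
  std::size_t unique = 0;
};

// Count how many -Xopenmp-target=<triple> flags appear and how many distinct
// triples they name.
TargetCounts countOffloadTargets(const std::vector<std::string> &args) {
  const std::string prefix = "-Xopenmp-target=";
  std::set<std::string> triples;
  TargetCounts counts;
  for (const std::string &arg : args) {
    if (arg.compare(0, prefix.size(), prefix) != 0)
      continue; // not an -Xopenmp-target flag
    ++counts.total;
    triples.insert(arg.substr(prefix.size()));
  }
  counts.unique = triples.size();
  return counts;
}
```

Two `-Xopenmp-target=amdgcn-amd-amdhsa` flags give a total of two and one unique triple, which is enough to infer that two images are wanted for the same triple without repeating it in `-fopenmp-targets`.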




[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-07-27 Thread Saiyedul Islam via Phabricator via cfe-commits
saiislam created this revision.
saiislam added reviewers: jdoerfert, yaxunl, JonChesterfield, RaviNarayanaswamy.
Herald added subscribers: kerbowa, pengfei, guansong, nhaehnle, jvesely.
saiislam requested review of this revision.
Herald added subscribers: openmp-commits, cfe-commits, sstefan1.
Herald added projects: clang, OpenMP.

Multiple offloading targets can now be specified on the command
line. A toolchain instance is created for each unique
combination of target triple and target GPU. The device runtime has
been modified to support binaries containing multiple images,
each for a different target.
The data structure "__tgt_image_info", defined in
"llvm-project/openmp/libomptarget/include/omptarget.h", is used
to pass the requirements of each image. For example, a GPU name
such as gfx906 or sm_35 is a requirement of the image. The
requirements are produced by clang-offload-wrapper and read by
the device RTL.

Example:

  clang -O2 -target x86_64-pc-linux-gnu -fopenmp \
    -fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa \
    -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 \
    -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908 \
    helloworld.c -o helloworld
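
How a runtime might match an image's requirements against the capabilities of the current system can be sketched as follows. This is an illustration under assumptions, not the code in this patch: `splitOn` and `imageIsCompatible` are hypothetical helpers, the `__` separator follows the comment in rtl.cpp, and the capabilities are assumed to arrive as a space-separated list.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Split `input` on every occurrence of the non-empty `sep`, dropping empty
// pieces.
std::vector<std::string> splitOn(const std::string &input,
                                 const std::string &sep) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    std::size_t pos = input.find(sep, start);
    std::string piece = input.substr(
        start, pos == std::string::npos ? std::string::npos : pos - start);
    if (!piece.empty())
      parts.push_back(piece);
    if (pos == std::string::npos)
      break;
    start = pos + sep.size();
  }
  return parts;
}

// An image is compatible when every requirement appears among the
// capabilities. An empty requirements string is treated as compatible.
bool imageIsCompatible(const std::string &requirements,
                       const std::string &capabilities) {
  std::vector<std::string> caps = splitOn(capabilities, " ");
  for (const std::string &req : splitOn(requirements, "__"))
    if (std::find(caps.begin(), caps.end(), req) == caps.end())
      return false;
  return true;
}
```

Under these assumptions, an image built with requirements `gfx906__xnack-` would be accepted on a system reporting `gfx906 sramecc+ xnack-` and rejected on one reporting only `gfx908`.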


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106870

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Driver/ToolChain.h
  clang/lib/Driver/Action.cpp
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
  clang/lib/Driver/ToolChains/AMDGPUOpenMP.h
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/lib/Driver/ToolChains/Cuda.h
  clang/test/Driver/amdgpu-openmp-system-arch-fail.c
  clang/test/Driver/amdgpu-openmp-toolchain.c
  clang/test/Driver/hip-rdc-device-only.hip
  clang/test/Driver/hip-toolchain-rdc-separate.hip
  clang/test/Driver/openmp-offload-multi.c
  clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
  openmp/libomptarget/include/omptarget.h
  openmp/libomptarget/src/exports
  openmp/libomptarget/src/interface.cpp
  openmp/libomptarget/src/rtl.cpp

Index: openmp/libomptarget/src/rtl.cpp
===
--- openmp/libomptarget/src/rtl.cpp
+++ openmp/libomptarget/src/rtl.cpp
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 // List of all plugins that can support offloading.
 static const char *RTLNames[] = {
@@ -288,18 +289,131 @@
  flags, RequiresFlags);
 }
 
+/// Query runtime capabilities of this system by calling offload-arch -c
+/// offload_arch_output_buffer is persistent storage returned by this
+/// __tgt_get_active_offload_env.
+static void
+__tgt_get_active_offload_env(__tgt_active_offload_env *active_env,
+                             char *offload_arch_output_buffer,
+                             size_t offload_arch_output_buffer_size) {
+  void *handle = dlopen("libomptarget.so", RTLD_NOW);
+  if (!handle)
+    DP("dlopen() failed: %s\n", dlerror());
+  char *libomptarget_dir_name = new char[PATH_MAX];
+  if (dlinfo(handle, RTLD_DI_ORIGIN, libomptarget_dir_name) == -1)
+    DP("RTLD_DI_ORIGIN failed: %s\n", dlerror());
+  std::string cmd_bin;
+  cmd_bin.assign(libomptarget_dir_name).append("/../bin/amdgpu-arch");
+  struct stat stat_buffer;
+  if (stat(cmd_bin.c_str(), &stat_buffer)) {
+    DP("Missing offload-arch command at %s \n", cmd_bin.c_str());
+  } else {
+    // Add option to print capabilities of current system
+    // cmd_bin.append(" -c");
+    FILE *stream = popen(cmd_bin.c_str(), "r");
+    while (fgets(offload_arch_output_buffer, offload_arch_output_buffer_size,
+                 stream) != NULL)
+      ;
+    pclose(stream);
+    active_env->capabilities = offload_arch_output_buffer;
+    size_t slen = strlen(active_env->capabilities);
+    offload_arch_output_buffer[slen - 1] =
+        '\0'; // terminate string before line feed
+    offload_arch_output_buffer +=
+        slen; // To store next value in offload_arch_output_buffer, not likely
+  }
+  delete[] libomptarget_dir_name;
+}
+
+std::vector<std::string> _splitstrings(char *input, const char *sep) {
+  std::vector<std::string> split_strings;
+  std::string s(input);
+  std::string delimiter(sep);
+  size_t pos = 0;
+  while ((pos = s.find(delimiter)) != std::string::npos) {
+    if (pos != 0)
+      split_strings.push_back(s.substr(0, pos));
+    s.erase(0, pos + delimiter.length());
+  }
+  if (s.length() > 1)
+    split_strings.push_back(s.substr(0, s.length()));
+  return split_strings;
+}
+
+static bool _ImageIsCompatibleWithEnv(__tgt_image_info *img_info,
+                                      __tgt_active_offload_env *active_env) {
+  // get_image_info will return null if no image information was registered.
+  // If no image information, assume application built with old compiler and
+  // check each image.
+  if (!img_info)
+    return true;
+
+  // Each runtime requirement for the compiled image is stored in
+  // the img_info->requirements string and is separated by __ .
+  // Each runtime capability