[PATCH] D147666: [OPENMP] Adds /lib to rpath to avoid need to set LD_LIBRARY_PATH to find plugins.

2023-04-24 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers abandoned this revision.
gregrodgers added a comment.

It's ugly, but to avoid requiring end users to set LD_LIBRARY_PATH (they may 
need LD_LIBRARY_PATH for their own applications), we will modify the compiler 
installation with these bash commands, where _INSTALL_DIR is the installation 
directory.

  echo "-Wl,-rpath=${_INSTALL_DIR}/lib" > ${_INSTALL_DIR}/bin/rpath.cfg
  echo "-L${_INSTALL_DIR}/lib" >> ${_INSTALL_DIR}/bin/rpath.cfg
  ln -sf rpath.cfg ${_INSTALL_DIR}/bin/clang++.cfg
  ln -sf rpath.cfg ${_INSTALL_DIR}/bin/clang.cfg
  ln -sf rpath.cfg ${_INSTALL_DIR}/bin/flang.cfg
  ln -sf rpath.cfg ${_INSTALL_DIR}/bin/flang-new.cfg

My apologies to linux distro packaging teams.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147666/new/

https://reviews.llvm.org/D147666

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D147666: [OPENMP] Adds /lib to rpath to avoid need to set LD_LIBRARY_PATH to find plugins.

2023-04-05 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers created this revision.
Herald added subscribers: sunshaoce, guansong, yaxunl.
Herald added a project: All.
gregrodgers requested review of this revision.
Herald added subscribers: cfe-commits, jplehr, sstefan1, MaskRay.
Herald added a reviewer: jdoerfert.
Herald added a project: clang.

Currently, the user of an OpenMP application needs to set LD_LIBRARY_PATH to 
the directory that contains the offload plugins. This change adds that 
directory to the rpath of OpenMP applications.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D147666

Files:
  clang/lib/Driver/ToolChains/CommonArgs.cpp


Index: clang/lib/Driver/ToolChains/CommonArgs.cpp
===
--- clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -813,7 +813,9 @@
  ArgStringList &CmdArgs) {
   // Enable -frtlib-add-rpath by default for the case of VE.
   const bool IsVE = TC.getTriple().isVE();
-  bool DefaultValue = IsVE;
+  const bool IsOpenMP = (TC.getDriver().getOpenMPRuntime(Args)
+ != Driver::OMPRT_Unknown);
+  bool DefaultValue = IsVE || IsOpenMP ;
   if (!Args.hasFlag(options::OPT_frtlib_add_rpath,
 options::OPT_fno_rtlib_add_rpath, DefaultValue))
 return;
@@ -822,6 +824,16 @@
   if (TC.getVFS().exists(CandidateRPath)) {
 CmdArgs.push_back("-rpath");
 CmdArgs.push_back(Args.MakeArgString(CandidateRPath));
+  } else {
+if (IsOpenMP) {
+  SmallString<256> TopLibPath =
+llvm::sys::path::parent_path(TC.getDriver().Dir);
+  llvm::sys::path::append(TopLibPath, "lib");
+  if (TC.getVFS().exists(TopLibPath)) {
+CmdArgs.push_back("-rpath");
+CmdArgs.push_back(Args.MakeArgString(TopLibPath));
+  }
+}
   }
 }
 




[PATCH] D146075: [flang][driver][openmp] Write MLIR for -save-temps

2023-03-23 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers accepted this revision.
gregrodgers added a comment.

This looks good.  Please merge.  I found it very useful, especially in the 
context of the other temp files generated after LLVM linking and optimization 
in the offload driver.  For example, just listing the temp files with ls -ltr 
provides a somewhat visual ordering of the compilation flow.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146075/new/

https://reviews.llvm.org/D146075



[PATCH] D114890: [OpenMP] Make the new device runtime the default

2021-12-02 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers accepted this revision.
gregrodgers added a comment.
This revision is now accepted and ready to land.

This is OK as is.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114890/new/

https://reviews.llvm.org/D114890



[PATCH] D114890: [OpenMP] Make the new device runtime the default

2021-12-01 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers requested changes to this revision.
gregrodgers added a comment.
This revision now requires changes to proceed.

I forgot to add the "request changes" action.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114890/new/

https://reviews.llvm.org/D114890



[PATCH] D114890: [OpenMP] Make the new device runtime the default

2021-12-01 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added a comment.

We want amdgcn to remain on the old deviceRTL until we have verified it.  I 
made inline comments on how this could be done.




Comment at: clang/lib/Driver/ToolChains/Clang.cpp:5905
   // runtime.
   if (Args.hasFlag(options::OPT_fopenmp_target_new_runtime,
options::OPT_fno_openmp_target_new_runtime,

Add a check for amdgcn, something like this.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:5907
options::OPT_fno_openmp_target_new_runtime,
-   /*Default=*/false))
+   /*Default=*/true))
 CmdArgs.push_back("-fopenmp-target-new-runtime");

This will keep AMDGCN on the old runtime until we are ready to switch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114890/new/

https://reviews.llvm.org/D114890



[PATCH] D99235: [HIP] Change to code object v4

2021-05-18 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added inline comments.



Comment at: clang/lib/Driver/ToolChains/HIP.cpp:116
+  if (getOrCheckAMDGPUCodeObjectVersion(C.getDriver(), Args) >= 4)
+OffloadKind = OffloadKind + "v4";
   for (const auto  : Inputs) {

yaxunl wrote:
> tra wrote:
> > We do not do it for v2/v3. Could you elaborate on what makes v4 special 
> > that it needs its own offload kind? 
> > 
> > Will you need to target different object versions simultaneously?
> > If yes, how? AFAICT, the version specified is currently global and applies 
> > to all sub-compilations.
> > If not, then do we really need to encode the version in the offload target 
> > name?
> Introducing hipv4 is to differentiate with code object version 2 and 3 which 
> are used by HIP applications compiled by older version of clang. ROCm 
> platform is required to keep binary backward compatibility, i.e., old HIP 
> applications built by ROCm 4.0 should run on ROCm 4.1. The bundle ID has 
> different interpretation depending on whether it is version 2/3 or version 4, 
> e.g. 'gfx906' implies xnack and sramecc off with code object v2/3 but implies 
> xnack and sramecc ANY with v4. Since code object version 2/3 uses 'hip', code 
> object version 4 needs to be different, therefore it uses 'hipv4'.
We need to start thinking in terms of the offload requirements of a compiled 
image vs. the capabilities of a particular active runtime on a particular GPU. 
This concept can eliminate the need for a new offload kind.  For AMD, we would 
add the requirement of code object v4 (cov4) if built for code object v4 or 
greater.  This means it can only run on a system with that capability.  This 
concept works well with the requirements xnack+, xnack-, sramecc+ and sramecc-.
The bundle entry id is the offload-kind, the triple, and the list of image 
requirements.  The gpu type (offload-arch) is really an image requirement.  

In this model, there is no requirement for xnack-any.  The lack of the xnack+ 
or xnack- requirement implies "any" which means it can run on any capable 
machine.  

This is a general model that is extensible.   To make this work, a runtime must 
be able to detect the capabilities for any requirement that could be tagged on 
an image.  In fact, every requirement of an embedded image must have its 
capability detected by the runtime for that offload image to be usable.   
However, a system's runtime could have more capabilities than the requirements 
of an image.   So in the case of xnack, the lack of xnack- or xnack+ will be 
acceptable no matter what the xnack capability of the runtime is.   If the 
compiler driver puts the requirement cov4 in the bundle entry id requirements 
field the runtime will not run that image unless the GPU loader supports v4 or 
greater. 

The clang driver can create the requirement xnack- for code object < 4 on those 
GPUs that support either xnack mode.  This will ensure the image will 
gracefully fail, or an alternative image will be used, if the runtime 
capability is xnack+.

But the cov4 requirement is mostly unrelated to xnack.  It is about the 
capability of the GPU loader.  If the code object version is >= 4, then the 
image will be tagged with the cov4 requirement.  This would prevent an old 
system that does not have a newer software stack from running an image with a 
cov4 requirement.

This general notion of image requirements and runtime capabilities is 
extensible to other offload architectures.  Suppose cuda version 12 
compilation requires a cuda version 12 runtime.  Old runtimes would never 
advertise the cuv12 capability and would fail to run any image created with 
the requirement cuv12.
 






Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99235/new/

https://reviews.llvm.org/D99235



[PATCH] D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed

2021-04-15 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers accepted this revision.
gregrodgers added a comment.
This revision is now accepted and ready to land.

I am removing my objection with the understanding that we will either replace 
or enhance amdgpu-arch with the cross-architecture tool offload-arch as 
described in my comments above.  Also, this patch changes the AMD toolchains 
to use amdgpu-arch, so it will not interfere with how I expect the 
cross-architecture tool to be used to greatly simplify openmp command line 
arguments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99949/new/

https://reviews.llvm.org/D99949



[PATCH] D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed

2021-04-15 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added inline comments.



Comment at: clang/tools/amdgpu-arch/CMakeLists.txt:9
+
+find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS 
/opt/rocm)
+if (NOT ${hsa-runtime64_FOUND})

What happens when /opt/rocm is not available?   Again, we need a 
cross-architecture mechanism to identify the offload-arch. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99949/new/

https://reviews.llvm.org/D99949



[PATCH] D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed

2021-04-14 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added a comment.

A dependence on hsa is not necessary.  The amdgpu and nvidia drivers both 
expose PCI codes in /sys.  We should use architecture-independent methods as 
much as possible.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99949/new/

https://reviews.llvm.org/D99949



[PATCH] D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed

2021-04-14 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers requested changes to this revision.
gregrodgers added a comment.
This revision now requires changes to proceed.

I have two serious concerns with this tool.  1. It does not provide the 
infrastructure to identify the runtime capabilities that satisfy the 
requirements of a compiled image.  2. It is not architecturally neutral or 
extensible to other architectures.  I have a different proposed tool called 
offload-arch.  Here is the help text for offload-arch.

grodgers@r4:/tmp/grodgers/git/aomp13/llvm-project/clang/tools/offload-arch/build$ ./offload-arch -h

  offload-arch: Print offload architecture(s) for the current active system.
or lookup information about offload architectures
  
  Usage:
offload-arch [ Options ] [ Optional lookup-value ]
  
With no options, offload-arch prints a value for the first offload architecture
found in the current system.  This value can be used by various clang frontends.
For example, to compile for openmp offloading on the current system
one could invoke clang with the following command:
  
clang -fopenmp -openmp-targets=`offload-arch` foo.c
  
If an optional lookup-value is specified, offload-arch will
check if the value is either a valid offload-arch or a codename
and lookup requested additional information. For example,
this provides all information for offload-arch gfx906:
  
offload-arch gfx906 -v
  
Options:
-h  Print this help message
-a  Print values for all devices. Don't stop at first device found.
-c  Print codename
-n  Print numeric pci-id
-t  Print recommended offload triple.
-v  Verbose = -a -c -n -t
-l  Print long codename found in pci.ids file
-r  Print capabilities of current system to satisfy runtime requirements
of compiled offload images.  This option is used by the runtime to
choose the correct image when multiple compiled images are available.
  
The alias amdgpu-arch returns 1 if no amdgcn GPU is found.
The alias nvidia-arch returns 1 if no cuda GPU is found.
These aliases are useful to determine if architecture-specific tests
should be run. Or these aliases could be used to conditionally load
architecture-specific software.
  
  Copyright (c) 2021 ADVANCED MICRO DEVICES, INC.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99949/new/

https://reviews.llvm.org/D99949



[PATCH] D84743: [Clang][AMDGCN] Universal device offloading macros header

2020-07-29 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added a comment.

This is all excellent feedback.  Thank you.  
I don't understand what I see on the godbolt link.  So far, we have only tested 
with clang.  We will test with gcc to understand the failure.  
I will make the change to use numeric values for _DEVICE_ARCH and change 
"UNKNOWN" to some integer (maybe -1).  The value _DEVICE_GPU is intended to be 
generational within a specific _DEVICE_ARCH.  
To be clear, this is primarily for users or library writers to implement 
device-specific or host-only code.  This is not something that should be 
automatic.  Users or library authors will opt in with their own include, so 
maybe it does not belong in clang/lib/Headers.  
As noted in the header comment, I expect the community to help keep this up to 
date as offloading technology evolves.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84743/new/

https://reviews.llvm.org/D84743



[PATCH] D76987: Rename options --cuda-gpu-arch and --no-cuda-gpu-arch

2020-03-30 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added a comment.

This was discussed on llvm-dev three years ago.  Here is the thread.

http://lists.llvm.org/pipermail/llvm-dev/2017-February/109930.html

The last name discussed was "--offload-arch".  I don't believe we need a list 
option anymore, so ignore the very old request for --offload-archs.

I am OK with the patch the way it is.  In the future, we should consider 
renaming the CudaArch class to OffloadArch.  Also, the GpuArchList is 
currently only initialized in CudaActionBuilder.  Eventually this will have 
to be done for HIPActionBuilder and OpenMPActionBuilder.  Could you consider 
creating an InitializeGpuArchList function?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76987/new/

https://reviews.llvm.org/D76987





[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

2018-08-23 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added a comment.

I have a longer comment on header files, but let me first understand this 
patch.

IIUC, the concept of this patch is to fake the macros so that the device pass
thinks it is seeing a host:

if ((LangOpts.CUDA || LangOpts.OpenMPIsDevice) && PP.getAuxTargetInfo())

  InitializePredefinedAuxMacros(*PP.getAuxTargetInfo(), Builder);

That would be counterproductive, because well-behaved headers that only
provide optimized asm definitions would wrap that asm with

#ifdef __x86_64__

  do some x86 asm function definition

#else

  just provide the function declaration to be implemented somewhere else.

#endif

What am I missing?


Repository:
  rC Clang

https://reviews.llvm.org/D50845





[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

2018-08-22 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added a comment.

I like the idea of using an automatic include as a cc1 option (-include).  
However, I would prefer a more general automatic include for OpenMP, not just 
for math functions (__clang_cuda_device_functions.h). Clang cuda automatically 
includes __clang_cuda_runtime_wrapper.h.  It includes other files as needed 
like __clang_cuda_device_functions.h.  Let's hypothetically call my proposed 
automatic include for OpenMP __clang_openmp_runtime_wrapper.h.

Just because clang cuda defines functions in __clang_cuda_device_functions.h and 
automatically includes them does not make it right for OpenMP.  In general, 
function definitions in headers should be avoided. The current function 
definitions in __clang_cuda_device_functions.h only work for hostile nv GPUs 
:).  This is how we can avoid function definitions in the headers.  In a new 
openmp build process, we can build libm-nvptx.bc.  This can be done by 
compiling __clang_cuda_device_functions.h as a device-only compile.  Assuming 
current naming conventions, these files would be installed in the same 
directory as libomptarget.so (.../lib).

How do we tell clang cc1 to use this bc library? Use -mlink-builtin-bitcode.   
AddMathDeviceFunctions would then look something like this.

if (this is for device cc1) {

  if (getTriple().isNVPTX()) {
    CC1Args.push_back("-mlink-builtin-bitcode");
    CC1Args.push_back(DriverArgs.MakeArgString("libm-nvptx.bc"));
  }
  if (getTriple().getArch() == llvm::Triple::amdgcn) {
    CC1Args.push_back("-mlink-builtin-bitcode");
    CC1Args.push_back(DriverArgs.MakeArgString("libm-amdgcn.bc"));
  }

}

You can think of a libm-<arch>.bc file as the device library equivalent of the 
host libm.so or libm.a.  This concept of "host-consistent" library definitions 
can go beyond math libraries.  In fact, I believe we should co-opt the -l 
(--library) option.  The driver toolchain should look for device bc libraries 
for any -lX command line option.  This gives us a strategy for adding 
user-defined device libraries.

The above code hints at the idea of architecture-specific bc files (nvptx vs 
amdgcn).  The nvptx version would call into the cuda libdevice.  For radeon 
processors, we may want processor-optimized versions of the libraries, just 
like there are sub-architecture optimized versions of the cuda libdevice.  If 
we build --cuda-gpu-arch optimized versions of math bc libs, then the above 
code will get a bit more complex depending on the naming convention of the bc 
lib and the value of --cuda-gpu-arch (which should have an alias 
--offload-arch).

Using a bc lib significantly reduces the complexity of 
__clang_openmp_runtime_wrapper.h.  We do not need to see math device function 
definitions or the nv headers that they need.  However, it does need to 
correct the behaviour of rogue system headers that define host-optimized 
functions.  We can fix this by adding the following to 
__clang_openmp_runtime_wrapper.h so that host passes still get host-optimized 
functions.

#if defined(__AMDGCN__) || defined(__NVPTX__)
#define __NO_INLINE__ 1
#endif

There is a tradeoff to using pre-compiled bc libs.  It makes compile-time macro 
logic hard to implement.  For example, we can't do this

#if defined(__CLANG_CUDA_APPROX_TRANSCENDENTALS__)
#define __FAST_OR_SLOW(fast, slow) fast
#else
#define __FAST_OR_SLOW(fast, slow) slow
#endif

The openmp build process would either need to build alternative bc libraries 
for each option, or a supplemental bc library to address these types of 
options.  If some option is turned on, then an alternative lib or a particular 
ordering of libs would be used to build the clang cc1 command.
For example, the above code for AddMathDeviceFunctions would have this

  ...
  if ( getTriple().isNVPTX()) {
 if (LangOpts.CUDADeviceApproxTranscendentals || LangOpts.FastMath) {
   CC1Args.push_back("-mlink-builtin-bitcode");
   CC1Args.push_back(DriverArgs.MakeArgString("libm-fast-nvptx.bc"));
 }
 CC1Args.push_back("-mlink-builtin-bitcode");
 CC1Args.push_back(DriverArgs.MakeArgString("libm-nvptx.bc"));
  }

I personally believe that pre-built bc libraries with some consistency with 
their host-equivalent libraries are a saner approach for device libraries than 
complex header logic that is customized for each architecture.


Repository:
  rC Clang

https://reviews.llvm.org/D47849





[PATCH] D48455: Remove hip.amdgcn.bc hc.amdgcn.bc from HIP Toolchains

2018-06-25 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added a comment.

Why not provide a specific list of --hip-device-lib= for VDI builds?   I am not 
sure about defining functions inside headers instead of using a hip bc lib.


Repository:
  rC Clang

https://reviews.llvm.org/D48455





[PATCH] D42800: Let CUDA toolchain support amdgpu target

2018-02-05 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added a comment.

Here are my replies to the inline comments.  Everything should be fixed in the 
next revision.




Comment at: include/clang/Basic/Cuda.h:79
   COMPUTE_72,
+  COMPUTE_GCN,
 };

t-tye wrote:
> Suggest using amdgcn which matches the architecture name in 
> https://llvm.org/docs/AMDGPUUsage.html#processors .
Yes,  I will add them in the update.



Comment at: include/clang/Basic/Cuda.h:79
   COMPUTE_72,
+  COMPUTE_GCN,
 };

gregrodgers wrote:
> t-tye wrote:
> > Suggest using amdgcn which matches the architecture name in 
> > https://llvm.org/docs/AMDGPUUsage.html#processors .
> Yes,  I will add them in the update.
Done in next update



Comment at: lib/Basic/Targets/AMDGPU.cpp:437
+  case CudaArch::UNKNOWN:
+assert(false && "No GPU arch when compiling CUDA device code.");
+return "";

arsenm wrote:
> llvm_unreachable
Fixed in next update




Comment at: lib/Basic/Targets/AMDGPU.h:85
 return TT.getEnvironmentName() == "amdgiz" ||
+   TT.getOS() == llvm::Triple::CUDA ||
TT.getEnvironmentName() == "amdgizcl";

yaxunl wrote:
> t-tye wrote:
> > We still want to use the amdhsa OS for amdgpu which currently supports the 
> > different environments. So can cuda simply support the same environments? 
> > Is the plan is to eliminate the environments and simply always use the 
> > default address space for generic so this code is no longer needed?
> Currently we already use amdgiz by default. This is no longer needed.
removed in next update



Comment at: lib/Driver/ToolChains/Cuda.cpp:359-361
+  addBCLib(C, Args, CmdArgs, LibraryPaths, "opencl.amdgcn.bc");
+  addBCLib(C, Args, CmdArgs, LibraryPaths, "ockl.amdgcn.bc");
+  addBCLib(C, Args, CmdArgs, LibraryPaths, "irif.amdgcn.bc");

arsenm wrote:
> Why is this done under an NVPTX:: class
Because we are not creating a new toolchain for AMDGCN.  We modify the logic in 
the tool constructor as needed for AMDGCN, keeping the ability to provide a set 
of mixed targets. 



Comment at: lib/Driver/ToolChains/Cuda.cpp:403-415
+  // If Verbose, save input for llc in /tmp and print all symbols
+  if (Args.hasArg(options::OPT_v)) {
+ArgStringList nmArgs;
+nmArgs.push_back(ResultingBitcodeF);
+nmArgs.push_back("-debug-syms");
+const char *nmExec = Args.MakeArgString(C.getDriver().Dir + "/llvm-nm");
C.addCommand(llvm::make_unique<Command>(JA, *this, nmExec, nmArgs, Inputs));

Hahnfeld wrote:
> This never gets cleaned up!
OK, Deleted in revision. 



Comment at: lib/Driver/ToolChains/Cuda.cpp:531-534
+  SmallString<256> OutputFileName(Output.getFilename());
+  if (JA.isOffloading(Action::OFK_OpenMP))
+llvm::sys::path::replace_extension(OutputFileName, "cubin");
+  CmdArgs.push_back(Args.MakeArgString(OutputFileName));

Hahnfeld wrote:
> That is already done in `TC.getInputFilename(Output)` (since rC318763), the 
> same function call that you are removing here...
Fixed in next update



Comment at: lib/Driver/ToolChains/Cuda.cpp:639-640
+CmdArgs2.push_back(Args.MakeArgString(Output.getFilename()));
+const char *Exec2 =
+Args.MakeArgString(C.getDriver().Dir + "/clang-fixup-fatbin");
+C.addCommand(

Hahnfeld wrote:
> `clang-fixup-fatbin` is not upstreamed and won't work. Sounds like a horrible 
> name btw...
Major cleanup here in the next update.  It is not a bad name once you see the 
update and its comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:788-793
+  // Do not add -link-cuda-bitcode or ptx42 features if gfx
+  for (Arg *A : DriverArgs)
+if (A->getOption().matches(options::OPT_cuda_gpu_arch_EQ) &&
+StringRef(A->getValue()).startswith("gfx"))
+  return;
+

Hahnfeld wrote:
> You should use `GpuArch` which comes from `DriverArgs.getLastArgValue`: The 
> last `-march` overrides previous arguments.
Nice catch.  I will fix this in the next update. 


https://reviews.llvm.org/D42800





[PATCH] D42800: Let CUDA toolchain support amdgpu target

2018-02-01 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers added a comment.

Sorry, all my great inline comments got lost somehow.  I am a newbie to 
Phabricator.  I will try to reconstruct my comments.


https://reviews.llvm.org/D42800





[PATCH] D42800: Let CUDA toolchain support amdgpu target

2018-02-01 Thread Greg Rodgers via Phabricator via cfe-commits
gregrodgers requested changes to this revision.
gregrodgers added a comment.
This revision now requires changes to proceed.

Thanks to everyone for the reviews.  I hope I replied to all inline comments.  
Since I sent this to Sam to post, we discovered a major shortcoming.  As tra 
points out, there are a lot of cuda headers in the cuda sdk that are processed. 
We are able to override asm() expansions with #undef and redefine as an 
equivalent amdgpu component so the compiler never sees the asm().  I am sure we 
will need to add more redefines as we broaden our testing.  But that is not the 
big problem.  We would like to be able to run cudaclang for AMD GPUs without an 
install of cuda.  Of course, you must always install cuda if any of your 
targeted GPUs are NVidia GPUs.  To run cudaclang without cuda when only 
non-NVidia gpus are specified, we need an open set of headers, and we must 
replace the fatbin tools used in the toolchain.  The latter can be addressed by 
using the libomptarget methods for embedding multiple target GPU objects.  The 
former is going to take a lot of work.  I am going to be sending an updated 
patch that has the stubs for the open headers noted in 
__clang_cuda_runtime_wrapper.h.  They will be included with the CC1 flag 
-D__USE_OPEN_HEADERS__.  This will be generated by the cuda driver when it 
finds no cuda installation and all target GPUs are not NVidia.


https://reviews.llvm.org/D42800


