[PATCH] D106960: [OffloadArch] Library to query properties of current offload architecture

2021-08-18 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added inline comments.



Comment at: llvm/lib/OffloadArch/OffloadArch.cpp:280
+  return results;
+}

The _aot_ names are not great.



Comment at: llvm/lib/OffloadArch/amdgpu/hsa-subset.h:40
+// DEALINGS WITH THE SOFTWARE.
+//
+

The license is wrong.



Comment at: llvm/lib/OffloadArch/offload-arch/offload-arch.cpp:93
+\n\
+");
+  exit(1);

This is not AMD.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106960/new/

https://reviews.llvm.org/D106960



[PATCH] D106960: [OffloadArch] Library to query properties of current offload architecture

2021-08-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

T




Comment at: llvm/lib/OffloadArch/amdgpu/vendor_specific_capabilities.cpp:25
+//
+#include "hsa-subset.h"
+#include 

yaxunl wrote:
> It would be much simpler to use the HIP API to get the device name and 
> capabilities, e.g. gfx906:xnack+:sramecc-:
> 
> https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.2.x/samples/1_Utils/hipInfo/hipInfo.cpp
> 
> It will work on both Linux and Windows. On Linux the availability of the HIP 
> runtime is the same as that of the HSA runtime. On Windows the HIP runtime is 
> shipped with the display driver, whereas the HSA runtime is not available.

> On Linux the availability of the HIP runtime is the same as that of the HSA 
> runtime

This is probably not true. If ROCm is installed somewhere, both the HIP and 
HSA runtimes are available. If building from source, HSA is much quicker and 
easier to build than the HIP runtimes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106960/new/

https://reviews.llvm.org/D106960



[PATCH] D106960: [OffloadArch] Library to query properties of current offload architecture

2021-08-04 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

This only works on Linux. Either make it work on both Linux and Windows, or 
restrict it to Linux in CMakeLists.txt; otherwise it breaks the LLVM build on 
Windows.




Comment at: llvm/lib/OffloadArch/OffloadArch.cpp:17
+#include "llvm/Support/WithColor.h"
+#include 
+#include 

It is better to use LLVM or standard C++ functions for directory operations, 
since dirent.h is not available in MSVC. Even though this utility only works 
on Linux for now, it should be platform-neutral so that it can be ported to 
Windows.
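
A portable sketch using LLVM's filesystem API (illustrative only; listDir is 
a hypothetical helper, not code from the patch):

  #include "llvm/ADT/StringRef.h"
  #include "llvm/Support/FileSystem.h"
  #include "llvm/Support/raw_ostream.h"

  // Enumerate directory entries without dirent.h, so the same code
  // also builds with MSVC.
  static void listDir(llvm::StringRef Dir) {
    std::error_code EC;
    for (llvm::sys::fs::directory_iterator It(Dir, EC), End;
         It != End && !EC; It.increment(EC))
      llvm::outs() << It->path() << "\n"; // full path of each entry
  }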



Comment at: llvm/lib/OffloadArch/amdgpu/vendor_specific_capabilities.cpp:25
+//
+#include "hsa-subset.h"
+#include 

It would be much simpler to use the HIP API to get the device name and 
capabilities, e.g. gfx906:xnack+:sramecc-:

https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.2.x/samples/1_Utils/hipInfo/hipInfo.cpp

It will work on both Linux and Windows. On Linux the availability of the HIP 
runtime is the same as that of the HSA runtime. On Windows the HIP runtime is 
shipped with the display driver, whereas the HSA runtime is not available.
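
A minimal sketch of that approach (assuming a recent ROCm HIP runtime in 
which hipDeviceProp_t exposes gcnArchName; older releases only have the 
numeric gcnArch field):

  #include <hip/hip_runtime.h>
  #include <cstdio>

  int main() {
    int Count = 0;
    if (hipGetDeviceCount(&Count) != hipSuccess)
      return 1;
    for (int I = 0; I < Count; ++I) {
      hipDeviceProp_t Props;
      if (hipGetDeviceProperties(&Props, I) == hipSuccess)
        // Prints e.g. "gfx906:sramecc+:xnack-".
        std::printf("%s\n", Props.gcnArchName);
    }
    return 0;
  }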


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106960/new/

https://reviews.llvm.org/D106960



[PATCH] D106960: [OffloadArch] Library to query properties of current offload architecture

2021-08-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D106960#2925610, @ye-luo wrote:

> My second GPU is an NVIDIA 3060 Ti (sm_86).
> I build my app daily with -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80.
>
> About sm_80 binaries being able to run on sm_86:
> https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#application-compatibility-on-ampere

Keep in mind that binaries compiled for sm_80 will likely run a lot slower on 
sm_86: sm_86 has distinctly different hardware, and code generated for sm_80 
will be sub-optimal for it.
I don't have Ampere cards to compare, but sm_70 binaries running on sm_75 
reached only about half the speed of the same code compiled for sm_75 when 
operating on fp16.

NVIDIA didn't provide a performance tuning guide for Ampere, but here's what 
it had to say about Volta/Turing:
https://docs.nvidia.com/cuda/turing-tuning-guide/index.html#tensor-operations

> Any binary compiled for Volta will run on Turing, but Volta binaries using 
> Tensor Cores will only be able to reach half of Turing's Tensor Core peak 
> performance. 
> Recompiling the binary specifically for Turing would allow it to reach the 
> peak performance.
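
As an illustration (reusing the flags quoted above; sm_86 and app.cpp are 
placeholders for the installed card and the application source), recompiling 
for the exact installed architecture would look like:

  clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
    -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_86 app.cpp -o app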




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106960/new/

https://reviews.llvm.org/D106960



[PATCH] D106960: [OffloadArch] Library to query properties of current offload architecture

2021-08-04 Thread Ye Luo via Phabricator via cfe-commits
ye-luo added a comment.

I tested with aomp 13.0-5 on Ubuntu 20.04.2 LTS (Focal Fossa):

  yeluo@epyc-server:~$ offload-arch -a
  gfx906
  ERROR: offload-arch not found for 10de:2486.
  yeluo@epyc-server:~$ offload-arch -c
  gfx906   sramecc+ xnack-
  yeluo@epyc-server:~$ offload-arch -n
  gfx906 1002:66AF

My second GPU is an NVIDIA 3060 Ti (sm_86).
I build my app daily with -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80.

About sm_80 binaries being able to run on sm_86:
https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#application-compatibility-on-ampere


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106960/new/

https://reviews.llvm.org/D106960
