[PATCH] D159256: [NFC][Clang] Remove redundant function definitions

2023-08-31 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 accepted this revision.
jhuber6 added a comment.
This revision is now accepted and ready to land.

In D159256#4630915 , @jmmartinez 
wrote:

> In D159256#4630876 , @jhuber6 wrote:
>
>> In D159256#4630410 , @jmmartinez 
>> wrote:
>>
>>> @jhuber6 I was wondering if there is a reason you kept 3 versions of 
>>> `mergeDefaultFunctionDefinitionAttributes` in 
>>> https://reviews.llvm.org/D152391 ?
>>
>> I believe it's because one was a freestanding function, the other was a 
>> member function, and the last was a common implementation.
>
> Would it be ok if I keep only one? It seems that the member function is not 
> used (I was not sure if there was some external code using it).
>
> If not, I can also keep just 2 versions (the freestanding function and the 
> member function), move the implementation to the freestanding one, and drop 
> the static function since it is redundant.

Yeah I think I noticed that when I was doing the patch but I just left it 
because I figured it would be less disruptive. It should be fine since I'm not 
aware of any other users.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159256/new/

https://reviews.llvm.org/D159256

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159256: [NFC][Clang] Remove redundant function definitions

2023-08-31 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D159256#4630410 , @jmmartinez 
wrote:

> @jhuber6 I was wondering if there is a reason you kept 3 versions of 
> `mergeDefaultFunctionDefinitionAttributes` in 
> https://reviews.llvm.org/D152391 ?

I believe it's because one was a freestanding function, the other was a member 
function, and the last was a common implementation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159256/new/

https://reviews.llvm.org/D159256

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159118: [libc] Implement the 'clock()' function on the GPU

2023-08-30 Thread Joseph Huber via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG30307a7bb795: [libc] Implement the 'clock()' 
function on the GPU (authored by jhuber6).

Changed prior to commit:
  https://reviews.llvm.org/D159118?vs=554797&id=554831#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159118/new/

https://reviews.llvm.org/D159118

Files:
  clang/lib/Headers/llvm_libc_wrappers/time.h
  libc/config/gpu/api.td
  libc/config/gpu/entrypoints.txt
  libc/config/gpu/headers.txt
  libc/docs/gpu/support.rst
  libc/include/llvm-libc-macros/gpu/CMakeLists.txt
  libc/include/llvm-libc-macros/gpu/time-macros.h
  libc/include/llvm-libc-macros/time-macros.h
  libc/src/time/gpu/CMakeLists.txt
  libc/src/time/gpu/clock.cpp
  libc/src/time/gpu/time_utils.cpp
  libc/src/time/gpu/time_utils.h

Index: libc/src/time/gpu/time_utils.h
===
--- /dev/null
+++ libc/src/time/gpu/time_utils.h
@@ -0,0 +1,54 @@
+//===-- Generic utilities for GPU timing --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_LIBC_SRC_TIME_GPU_TIME_UTILS_H
+#define LLVM_LIBC_SRC_TIME_GPU_TIME_UTILS_H
+
+#include "src/__support/GPU/utils.h"
+
+namespace __llvm_libc {
+
+#if defined(LIBC_TARGET_ARCH_IS_AMDGPU)
+// AMDGPU does not have a single set frequency. Different architectures and
+// cards can have vary values. Here we default to a few known values, but for
+// complete support the frequency needs to be read from the kernel driver.
+#if defined(__gfx1010__) || defined(__gfx1011__) || defined(__gfx1012__) ||\
+defined(__gfx1013__) || defined(__gfx1030__) || defined(__gfx1031__) ||\
+defined(__gfx1032__) || defined(__gfx1033__) || defined(__gfx1034__) ||\
+defined(__gfx1035__) || defined(__gfx1036__) || defined(__gfx1100__) ||\
+defined(__gfx1101__) || defined(__gfx1102__) || defined(__gfx1103__) ||\
+defined(__gfx1150__) || defined(__gfx1151__)
+// These architectures use a 100 MHz fixed frequency clock.
+constexpr uint64_t clock_freq = 1;
+#elif defined(__gfx900__) || defined(__gfx902__) || defined(__gfx904__) || \
+defined(__gfx906__) || defined(__gfx908__) || defined(__gfx909__) ||   \
+defined(__gfx90a__) || defined(__gfx90c__) || defined(__gfx940__)
+// These architectures use a 25 MHz fixed frequency clock expect for Vega 10
+// which is actually 27 Mhz. We default to 25 MHz in all cases anyway.
+constexpr uint64_t clock_freq = 2500;
+#else
+// The frequency for these architecture is unknown. We simply default to zero.
+constexpr uint64_t clock_freq = 0;
+#endif
+
+// We provide an externally visible symbol such that the runtime can set this to
+// the correct value. If it is not set we try to default to the known values.
+extern "C" [[gnu::visibility("protected")]] uint64_t
+[[clang::address_space(4)]] __llvm_libc_clock_freq;
+#define GPU_CLOCKS_PER_SEC static_cast(__llvm_libc_clock_freq)
+
+#elif defined(LIBC_TARGET_ARCH_IS_NVPTX)
+// NPVTX uses a single 1 GHz fixed frequency clock for all target architectures.
+#define GPU_CLOCKS_PER_SEC static_cast(10UL)
+#else
+#error "Unsupported target"
+#endif
+
+} // namespace __llvm_libc
+
+#endif // LLVM_LIBC_SRC_TIME_GPU_TIME_UTILS_H
Index: libc/src/time/gpu/time_utils.cpp
===
--- /dev/null
+++ libc/src/time/gpu/time_utils.cpp
@@ -0,0 +1,22 @@
+//===-- Generic utilities for GPU timing --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "time_utils.h"
+
+namespace __llvm_libc {
+
+#if defined(LIBC_TARGET_ARCH_IS_AMDGPU)
+// This is expected to be initialized by the runtime if the default value is
+// insufficient.
+// TODO: Once we have another use-case for this we should put it in a common
+// device environment struct.
+extern "C" [[gnu::visibility("protected")]] uint64_t
+[[clang::address_space(4)]] __llvm_libc_clock_freq = clock_freq;
+#endif
+
+} // namespace __llvm_libc
Index: libc/src/time/gpu/clock.cpp
===
--- /dev/null
+++ libc/src/time/gpu/clock.cpp
@@ -0,0 +1,29 @@
+//===-- GPU implementation of the clock function --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for li

[PATCH] D159118: [libc] Implement the 'clock()' function on the GPU

2023-08-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 554797.
jhuber6 added a comment.

Move the header portion into a common utility header.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159118/new/

https://reviews.llvm.org/D159118

Files:
  clang/lib/Headers/llvm_libc_wrappers/time.h
  libc/config/gpu/api.td
  libc/config/gpu/entrypoints.txt
  libc/config/gpu/headers.txt
  libc/docs/gpu/support.rst
  libc/include/llvm-libc-macros/gpu/CMakeLists.txt
  libc/include/llvm-libc-macros/gpu/time-macros.h
  libc/include/llvm-libc-macros/time-macros.h
  libc/src/time/gpu/CMakeLists.txt
  libc/src/time/gpu/clock.cpp
  libc/src/time/gpu/time_utils.cpp
  libc/src/time/gpu/time_utils.h

Index: libc/src/time/gpu/time_utils.h
===
--- /dev/null
+++ libc/src/time/gpu/time_utils.h
@@ -0,0 +1,54 @@
+//===-- Generic utilities for GPU timing --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_LIBC_SRC_TIME_GPU_TIME_UTILS_H
+#define LLVM_LIBC_SRC_TIME_GPU_TIME_UTILS_H
+
+#include "src/__support/GPU/utils.h"
+
+namespace __llvm_libc {
+
+#if defined(LIBC_TARGET_ARCH_IS_GPU)
+// AMDGPU does not have a single set frequency. Different architectures and
+// cards can have vary values. Here we default to a few known values, but for
+// complete support the frequency needs to be read from the kernel driver.
+#if defined(__gfx1010__) || defined(__gfx1011__) || defined(__gfx1012__) ||\
+defined(__gfx1013__) || defined(__gfx1030__) || defined(__gfx1031__) ||\
+defined(__gfx1032__) || defined(__gfx1033__) || defined(__gfx1034__) ||\
+defined(__gfx1035__) || defined(__gfx1036__) || defined(__gfx1100__) ||\
+defined(__gfx1101__) || defined(__gfx1102__) || defined(__gfx1103__) ||\
+defined(__gfx1150__) || defined(__gfx1151__)
+// These architectures use a 100 MHz fixed frequency clock.
+constexpr uint64_t clock_freq = 1;
+#elif defined(__gfx900__) || defined(__gfx902__) || defined(__gfx904__) || \
+defined(__gfx906__) || defined(__gfx908__) || defined(__gfx909__) ||   \
+defined(__gfx90a__) || defined(__gfx90c__) || defined(__gfx940__)
+// These architectures use a 25 MHz fixed frequency clock expect for Vega 10
+// which is actually 27 Mhz. We default to 25 MHz in all cases anyway.
+constexpr uint64_t clock_freq = 2500;
+#else
+// The frequency for these architecture is unknown. We simply default to zero.
+constexpr uint64_t clock_freq = 0;
+#endif
+
+// We provide an externally visible symbol such that the runtime can set this to
+// the correct value. If it is not set we try to default to the known values.
+extern "C" [[gnu::visibility("protected")]] uint64_t
+[[clang::address_space(4)]] __llvm_libc_clock_freq;
+#define GPU_CLOCKS_PER_SEC static_cast(__llvm_libc_clock_freq)
+
+#elif defined(LIBC_TARGET_ARCH_IS_NVPTX)
+// NPVTX uses a single 1 GHz fixed frequency clock for all target architectures.
+#define GPU_CLOCKS_PER_SEC static_cast(10UL)
+#else
+#error "Unsupported target"
+#endif
+
+} // namespace __llvm_libc
+
+#endif // LLVM_LIBC_SRC_TIME_GPU_TIME_UTILS_H
Index: libc/src/time/gpu/time_utils.cpp
===
--- /dev/null
+++ libc/src/time/gpu/time_utils.cpp
@@ -0,0 +1,22 @@
+//===-- Generic utilities for GPU timing --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "time_utils.h"
+
+namespace __llvm_libc {
+
+#if defined(LIBC_TARGET_ARCH_IS_GPU)
+// This is expected to be initialized by the runtime if the default value is
+// insufficient.
+// TODO: Once we have another use-case for this we should put it in a common
+// device environment struct.
+extern "C" [[gnu::visibility("protected")]] uint64_t
+[[clang::address_space(4)]] __llvm_libc_clock_freq = clock_freq;
+#endif
+
+} // namespace __llvm_libc
Index: libc/src/time/gpu/clock.cpp
===
--- /dev/null
+++ libc/src/time/gpu/clock.cpp
@@ -0,0 +1,29 @@
+//===-- GPU implementation of the clock function --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===-

[PATCH] D158778: [CUDA] Propagate __float128 support from the host.

2023-08-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D158778#4625892 , @tra wrote:

> In D158778#4624408 , @ABataev wrote:
>
>> Just checks removal should be fine
>
> Looks like OpenMP handles long double and __float128 differently -- it always 
> insists on using the host's FP format for both.
> https://github.com/llvm/llvm-project/blob/d037445f3a2c6dc1842b5bfc1d5d81988c2f223d/clang/lib/AST/ASTContext.cpp#L1674
>
> This creates a divergence between what clang thinks and what LLVM can handle.
> I'm not quite sure how it's supposed to work with NVPTX or AMDGPU, where we 
> demote those types to double and can't generate code for the actual types.
>
> @jhuber6 what does OpenMP expect to happen for those types on the GPU side?

That's a good question, I'm not entirely sure what the expectation would be. We 
obviously need to keep things coherent across D2H and H2D memcpy's so we want 
them to be the same size. I'm pretty sure our handling of this is just wrong 
right now. Just doing a simple example here https://godbolt.org/z/Y3E58PKMz 
shows that for NVPTX we error out (as I would expect) but for AMDGPU we emit an 
x86 80-bit double. My guess is that we should make this more explicit, 
considering that both vendors explicitly state that quad precision is not 
available on the GPU, unless we want to implement some software floats.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158778/new/

https://reviews.llvm.org/D158778

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D159118: [libc] Implement the 'clock()' function on the GPU

2023-08-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 554434.
jhuber6 added a comment.

Address nits and add static check for size of `clock_t` type.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159118/new/

https://reviews.llvm.org/D159118

Files:
  clang/lib/Headers/llvm_libc_wrappers/time.h
  libc/config/gpu/api.td
  libc/config/gpu/entrypoints.txt
  libc/config/gpu/headers.txt
  libc/docs/gpu/support.rst
  libc/include/llvm-libc-macros/gpu/CMakeLists.txt
  libc/include/llvm-libc-macros/gpu/time-macros.h
  libc/include/llvm-libc-macros/time-macros.h
  libc/src/time/gpu/CMakeLists.txt
  libc/src/time/gpu/clock.cpp

Index: libc/src/time/gpu/clock.cpp
===
--- /dev/null
+++ libc/src/time/gpu/clock.cpp
@@ -0,0 +1,64 @@
+//===-- GPU implementation of the clock function --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "src/time/clock.h"
+#include "src/__support/GPU/utils.h"
+
+namespace __llvm_libc {
+
+#if defined(LIBC_TARGET_ARCH_IS_GPU)
+// AMDGPU does not have a single set frequency. Different architectures and
+// cards can have vary values. Here we default to a few known values, but for
+// complete support the frequency needs to be read from the kernel driver.
+#if defined(__gfx1010__) || defined(__gfx1011__) || defined(__gfx1012__) ||\
+defined(__gfx1013__) || defined(__gfx1030__) || defined(__gfx1031__) ||\
+defined(__gfx1032__) || defined(__gfx1033__) || defined(__gfx1034__) ||\
+defined(__gfx1035__) || defined(__gfx1036__) || defined(__gfx1100__) ||\
+defined(__gfx1101__) || defined(__gfx1102__) || defined(__gfx1103__) ||\
+defined(__gfx1150__) || defined(__gfx1151__)
+// These architectures use a 100 MHz fixed frequency clock.
+constexpr uint64_t clock_freq = 1;
+#elif defined(__gfx900__) || defined(__gfx902__) || defined(__gfx904__) || \
+defined(__gfx906__) || defined(__gfx908__) || defined(__gfx909__) ||   \
+defined(__gfx90a__) || defined(__gfx90c__) || defined(__gfx940__)
+// These architectures use a 25 MHz fixed frequency clock expect for Vega 10
+// which is actually 27 Mhz. We default to 25 MHz in all cases anyway.
+constexpr uint64_t clock_freq = 2500;
+#else
+// The frequency for these architecture is unknown. We simply default to zero.
+constexpr uint64_t clock_freq = 0;
+#endif
+
+// We provide an externally visible symbol such that the runtime can set this to
+// the correct value. If it is not set we try to default to the known values.
+extern "C" [[gnu::visibility("protected")]] uint64_t
+[[clang::address_space(4)]] __llvm_libc_clock_freq = clock_freq;
+#define GPU_CLOCKS_PER_SEC static_cast(__llvm_libc_clock_freq)
+
+#elif defined(LIBC_TARGET_ARCH_IS_NVPTX)
+// NPVTX uses a single 1 GHz fixed frequency clock for all target architectures.
+#define GPU_CLOCKS_PER_SEC static_cast(10UL)
+#else
+#error "Unsupported target"
+#endif
+
+LLVM_LIBC_FUNCTION(clock_t, clock, ()) {
+  if (!GPU_CLOCKS_PER_SEC)
+return clock_t(0);
+
+  uint64_t ticks = gpu::fixed_frequency_clock();
+
+  // We need to convert between the GPU's fixed frequency and whatever `time.h`
+  // declares it to be. This is done so that dividing the result of this
+  // function by 'CLOCKS_PER_SEC' yields the elapsed time.
+  if (GPU_CLOCKS_PER_SEC > CLOCKS_PER_SEC)
+return clock_t(ticks / (GPU_CLOCKS_PER_SEC / CLOCKS_PER_SEC));
+  return clock_t(ticks * (CLOCKS_PER_SEC / GPU_CLOCKS_PER_SEC));
+}
+
+} // namespace __llvm_libc
Index: libc/src/time/gpu/CMakeLists.txt
===
--- /dev/null
+++ libc/src/time/gpu/CMakeLists.txt
@@ -0,0 +1,10 @@
+add_entrypoint_object(
+  clock
+  SRCS
+clock.cpp
+  HDRS
+../clock.h
+  DEPENDS
+libc.include.time
+libc.src.__support.GPU.utils
+)
Index: libc/include/llvm-libc-macros/time-macros.h
===
--- libc/include/llvm-libc-macros/time-macros.h
+++ libc/include/llvm-libc-macros/time-macros.h
@@ -3,6 +3,8 @@
 
 #ifdef __linux__
 #include "linux/time-macros.h"
+#elif defined(__AMDGPU__) || defined(__NVPTX__)
+#include "gpu/time-macros.h"
 #endif
 
 #endif // __LLVM_LIBC_MACROS_TIME_MACROS_H
Index: libc/include/llvm-libc-macros/gpu/time-macros.h
===
--- /dev/null
+++ libc/include/llvm-libc-macros/gpu/time-macros.h
@@ -0,0 +1,14 @@
+//===-- Definition of macros from time.h -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt fo

[PATCH] D159118: [libc] Implement the 'clock()' function on the GPU

2023-08-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: arsenm, tra, JonChesterfield, jdoerfert, sivachandra, 
lntue, michaelrj.
Herald added subscribers: libc-commits, tpr.
Herald added projects: libc-project, All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

This patch implements the `clock()` function on the GPU. This function
is supposed to return a timestamp that can be converted into seconds
using the `CLOCKS_PER_SEC` macro. The GPU has a fixed frequency timer
that can be used for this purpose. However, there are some
considerations.

First is that AMDGPU does not have a statically known fixed frequency. I
know internally that the gfx10xx and gfx11xx series use a 100 MHz clock
which will probably remain for the future. Gfx9xx typically uses a 25
MHz clock except for the Vega 10 GPU. The only way to know for sure is
to look it up from the runtime. For this purpose, I elected to default
it to some known values and assign these to an exteranlly visible symbol
that can be initialized if needed. If we do not have a good guess we
just return zero.

Second is that the `CLOCKS_PER_SEC` macro only gives about a microsecond
of resolution. POSIX demands that it's 1,000,000 so it's best that we
keep with this tradition as almost all targets seem to respect this. The
reason this is important is because on the GPU we will almost assuredly
be copying the host's macro value (see the wrapper header) so we should
go with the POSIX version that's most likely to be set. (We could
probably make a warning if the included header doesn't match the
expected value).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D159118

Files:
  clang/lib/Headers/llvm_libc_wrappers/time.h
  libc/config/gpu/api.td
  libc/config/gpu/entrypoints.txt
  libc/config/gpu/headers.txt
  libc/docs/gpu/support.rst
  libc/include/llvm-libc-macros/gpu/CMakeLists.txt
  libc/include/llvm-libc-macros/gpu/time-macros.h
  libc/include/llvm-libc-macros/time-macros.h
  libc/src/time/gpu/CMakeLists.txt
  libc/src/time/gpu/clock.cpp

Index: libc/src/time/gpu/clock.cpp
===
--- /dev/null
+++ libc/src/time/gpu/clock.cpp
@@ -0,0 +1,65 @@
+//===-- GPU implementation of the clock function --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "src/time/clock.h"
+#include "src/__support/GPU/utils.h"
+
+namespace __llvm_libc {
+
+#if defined(LIBC_TARGET_ARCH_IS_GPU)
+// AMDGPU does not have a single set frequency. Different architectures and
+// cards can have vary values. Here we default to a few known values, but for
+// complete support the frequency needs to be read from the kernel driver.
+#if defined(__gfx1010__) || defined(__gfx1011__) || defined(__gfx1012__) ||\
+defined(__gfx1013__) || defined(__gfx1030__) || defined(__gfx1031__) ||\
+defined(__gfx1032__) || defined(__gfx1033__) || defined(__gfx1034__) ||\
+defined(__gfx1035__) || defined(__gfx1036__) || defined(__gfx1100__) ||\
+defined(__gfx1101__) || defined(__gfx1102__) || defined(__gfx1103__) ||\
+defined(__gfx1150__) || defined(__gfx1151__)
+// These architectures use a 100 MHz fixed frequency clock.
+constexpr uint64_t clock_freq = 1;
+#elif defined(__gfx900__) || defined(__gfx902__) || defined(__gfx904__) || \
+defined(__gfx906__) || defined(__gfx908__) || defined(__gfx909__) ||   \
+defined(__gfx90a__) || defined(__gfx90c__) || defined(__gfx940__)
+// These architectures use a 25 MHz fixed frequency clock expect for Vega 10
+// which is actually 27 Mhz. We default to 25 MHz in all cases anyway.
+constexpr uint64_t clock_freq = 2500;
+#else
+// The frequency for these architecture is unknown. We simply default to zero.
+constexpr uint64_t clock_freq = 0;
+#endif
+
+// We provide an externally visible symbol such that the runtime can set this to
+// the correct value. If it is not set we try to default to the known values.
+extern "C" [[gnu::visibility("protected")]] uint64_t
+[[clang::address_space(4)]] __llvm_libc_clock_freq = clock_freq;
+#define GPU_CLOCKS_PER_SEC static_cast(__llvm_libc_clock_freq)
+
+#elif defined(LIBC_TARGET_ARCH_IS_NVPTX)
+// NPVTX uses a single 1 GHz fixed frequency clock for all target architectures.
+#define GPU_CLOCKS_PER_SEC static_cast(10UL)
+#else
+#error "Unsupported target"
+#endif
+
+LLVM_LIBC_FUNCTION(clock_t, clock, ()) {
+  if (!GPU_CLOCKS_PER_SEC)
+return clock_t(0);
+
+  uint64_t ticks = gpu::fixed_frequency_clock();
+
+  // We need to convert between the GPU's fixed frequency and whatever `time.h`
+  // declares it to be. This is done so that dividin

[PATCH] D112921: [clang] Enable sized deallocation by default in C++14 onwards

2023-08-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a subscriber: sivachandra.
jhuber6 added a comment.

In D112921#4624580 , @wangpc wrote:

> It seems that the linker can't find sized deallocation (no support in the 
> environment or AMDGPU libraries?).

We should have some implementations here I thought 
https://github.com/llvm/llvm-project/blob/main/libc/src/__support/CPP/new.cpp, 
maybe @sivachandra can elucidate on that. I actually don't know what the 
expected behavior is here, since we don't really have any of this support on 
GPU targets. I'm wondering why we're currently fine with compiling `delete` but 
not with the sized version, since their definitions should both be present in 
that file.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112921/new/

https://reviews.llvm.org/D112921

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112921: [clang] Enable sized deallocation by default in C++14 onwards

2023-08-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

This caused some linking errors with the GPU libc test suite, see 
https://lab.llvm.org/staging/#/builders/247/builds/5659.

  clang++: error: ld.lld command failed with exit code 1 (use -v to see 
invocation)
  [331/473] Linking CXX executable 
libc/test/src/__support/libc.test.src.__support.uint_test.__hermetic__.__build__
  FAILED: 
libc/test/src/__support/libc.test.src.__support.uint_test.__hermetic__.__build__
 
  : && 
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang++
 --target=x86_64-unknown-linux-gnu -fPIC -fno-semantic-interposition 
-fvisibility-inlines-hidden -Werror=date-time 
-Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter 
-Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough 
-Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor 
-Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion 
-Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color 
-ffunction-sections -fdata-sections -O3 -DNDEBUG -Wl,--color-diagnostics
-nostdlib -static 
libc/startup/gpu/amdgpu/CMakeFiles/libc.startup.gpu.amdgpu.crt1.dir/start.cpp.o 
libc/test/src/__support/CMakeFiles/libc.test.src.__support.uint_test.__hermetic__.__build__.dir/uint_test.cpp.o
 -o 
libc/test/src/__support/libc.test.src.__support.uint_test.__hermetic__.__build__
  libc/test/UnitTest/libLibcTest.hermetic.a  
libc/test/UnitTest/libLibcHermeticTestSupport.hermetic.a  
libc/test/src/__support/liblibc.test.src.__support.uint_test.__hermetic__.libc.a
  -mcpu=gfx906  --target=amdgcn-amd-amdhsa  -flto  
-Wl,-mllvm,-amdgpu-lower-global-ctor-dtor=0 && :
  ld.lld: error: undefined symbol: operator delete(void*, unsigned long)
  >>> referenced by 
lto.tmp:(LlvmLibcUIntClassTest_ConstructorFromUInt128Tests::~LlvmLibcUIntClassTest_ConstructorFromUInt128Tests())
  >>> referenced by 
lto.tmp:(LlvmLibcUIntClassTest_ConstructorFromUInt128Tests::~LlvmLibcUIntClassTest_ConstructorFromUInt128Tests())
  >>> referenced by 
lto.tmp:(LlvmLibcUIntClassTest_BasicArithmeticInt128Tests::~LlvmLibcUIntClassTest_BasicArithmeticInt128Tests())
  >>> referenced 41 more times
  >>> did you mean: operator delete(void*)
  >>> defined in: lto.tmp
  clang++: error: ld.lld command failed with exit code 1 (use -v to see 
invocation)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112921/new/

https://reviews.llvm.org/D112921

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153924: [OpenMP] Allow exceptions in target regions when offloading to GPUs

2023-08-28 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

The libcxx tests are always broken randomly in my experience. I wouldn't worry 
about it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153924/new/

https://reviews.llvm.org/D153924

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-28 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 accepted this revision.
jhuber6 added a comment.
This revision is now accepted and ready to land.

I think it's fine now given that it's passing tests. Others feel free to 
comment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-28 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Just a few more nits. I think it's looking fine but I haven't tested it. Anyone 
else?




Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:406
 
+  // pass on -mllvm options to the clang
+  for (const opt::Arg *Arg : Args.filtered(OPT_mllvm)) {





Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:415-417
+  if (SaveTemps) {
 CmdArgs.push_back("-save-temps");
+  }

No braces around a single line if.



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h:54
+
+uint16_t getImplicitArgsSize(uint16_t Version) {
+  return Version < ELF::ELFABIVERSION_AMDGPU_HSA_V5

We return uint16_t here? These are sizes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157738: [OpenMP] Emit offloading entries for indirect target variables

2023-08-24 Thread Joseph Huber via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9da61aed751e: [OpenMP] Emit offloading entries for indirect 
target variables (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157738/new/

https://reviews.llvm.org/D157738

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/OpenMP/target_indirect_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

Index: llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
===
--- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -5705,9 +5705,10 @@
 
 void OpenMPIRBuilder::createOffloadEntry(Constant *ID, Constant *Addr,
  uint64_t Size, int32_t Flags,
- GlobalValue::LinkageTypes) {
+ GlobalValue::LinkageTypes,
+ StringRef Name) {
   if (!Config.isGPU()) {
-emitOffloadingEntry(ID, Addr->getName(), Size, Flags);
+emitOffloadingEntry(ID, Name.empty() ? Addr->getName() : Name, Size, Flags);
 return;
   }
   // TODO: Add support for global variables on the device after declare target
@@ -5867,13 +5868,20 @@
 
   // Hidden or internal symbols on the device are not externally visible.
   // We should not attempt to register them by creating an offloading
-  // entry.
+  // entry. Indirect variables are handled separately on the device.
   if (auto *GV = dyn_cast(CE->getAddress()))
-if (GV->hasLocalLinkage() || GV->hasHiddenVisibility())
+if ((GV->hasLocalLinkage() || GV->hasHiddenVisibility()) &&
+Flags != OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect)
   continue;
 
-  createOffloadEntry(CE->getAddress(), CE->getAddress(), CE->getVarSize(),
- Flags, CE->getLinkage());
+  // Indirect globals need to use a special name that doesn't match the name
+  // of the associated host global.
+  if (Flags == OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect)
+createOffloadEntry(CE->getAddress(), CE->getAddress(), CE->getVarSize(),
+   Flags, CE->getLinkage(), CE->getVarName());
+  else
+createOffloadEntry(CE->getAddress(), CE->getAddress(), CE->getVarSize(),
+   Flags, CE->getLinkage());
 
 } else {
   llvm_unreachable("Unsupported entry kind.");
@@ -6218,8 +6226,13 @@
   }
   return;
 }
-OffloadEntriesDeviceGlobalVar.try_emplace(VarName, OffloadingEntriesNum,
-  Addr, VarSize, Flags, Linkage);
+if (Flags == OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect)
+  OffloadEntriesDeviceGlobalVar.try_emplace(VarName, OffloadingEntriesNum,
+Addr, VarSize, Flags, Linkage,
+VarName.str());
+else
+  OffloadEntriesDeviceGlobalVar.try_emplace(
+  VarName, OffloadingEntriesNum, Addr, VarSize, Flags, Linkage, "");
 ++OffloadingEntriesNum;
   }
 }
Index: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
===
--- llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -326,6 +326,8 @@
 OMPTargetGlobalVarEntryEnter = 0x2,
 /// Mark the entry as having no declare target entry kind.
 OMPTargetGlobalVarEntryNone = 0x3,
+/// Mark the entry as a declare target indirect global.
+OMPTargetGlobalVarEntryIndirect = 0x4,
   };
 
   /// Kind of device clause for declare target variables
@@ -349,6 +351,7 @@
 /// Type of the global variable.
 int64_t VarSize;
 GlobalValue::LinkageTypes Linkage;
+const std::string VarName;
 
   public:
 OffloadEntryInfoDeviceGlobalVar()
@@ -359,13 +362,15 @@
 explicit OffloadEntryInfoDeviceGlobalVar(unsigned Order, Constant *Addr,
  int64_t VarSize,
  OMPTargetGlobalVarEntryKind Flags,
- GlobalValue::LinkageTypes Linkage)
+ GlobalValue::LinkageTypes Linkage,
+ const std::string &VarName)
 : OffloadEntryInfo(OffloadingEntryInfoDeviceGlobalVar, Order, Flags),
-  VarSize(VarSize), Linkage(Linkage) {
+  VarSize(VarSize), Linkage(Linkage), VarName(VarName) {
   setAddress(Addr);
 }
 
 int64_t getVarSize() const { return VarSize; }
+StringRef getVarName() const { return VarName; }
 void setVar

[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-24 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h:49
 
-static_assert(sizeof(AMDGPUImplicitArgsTy) == 56,
-  "Unexpected size of implicit arguments");
+enum IMPLICITARGS : uint32_t {
+  COV4_SIZE = 56,

saiislam wrote:
> jhuber6 wrote:
> > We should probably be using `sizeof` now that it's back to being a struct 
> > and keep the old struct definition.
> AMDGPU plugin doesn't use any implicitarg for COV4, but it does so for COV5. 
> So, we are not keeping two separate structures for implicitargs of COV4 and 
> COV5.
> If we use sizeof then it will always return 256 corresponding to COV5 (even 
> for cov4, which should be 56). That's why we need this function.
Yeah, I guess for COV4 the only thing that mattered was the size so that we 
could make sure it's all set to zero. We shouldn't use the enum value. It 
should be `sizeof(ImplicitArgsTy)` for `COV5` and either hard-code it in the 
function for V4 or make a dummy struct.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-24 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h:49
 
-static_assert(sizeof(AMDGPUImplicitArgsTy) == 56,
-  "Unexpected size of implicit arguments");
+enum IMPLICITARGS : uint32_t {
+  COV4_SIZE = 56,

We should probably be using `sizeof` now that it's back to being a struct and 
keep the old struct definition.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158582: [AMDGPU] Respect unresolved symbol option if forwarded to linker

2023-08-23 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D158582#4610023 , @yaxunl wrote:

> The `-Wl` and `-Xlinker` options are intended for the host linker and we 
> intentionally do not pass them to the device linker.
>
> If users want to pass options to the device linker, they need to use 
> -Xoffload-linker.
>
> There are multiple options affecting the handling of unresolved symbols. I 
> think the proper way is to use `--no-undefined` as the default, which is 
> always passed before those options from `-Xoffload-linker` so that the latter 
> can override the former.
>
> I believe the driver already passes `-Xoffload-linker`  options to 
> amdgpu::Linker::ConstructJob by Args. I think we probably only need to move 
> all the default options (line 563-566) to 554 so that they can be overridden.

Yeah, the original problem was not being able to overload the defaults. This 
fundamentally is just because the `-Wl` options are being added in 
`AddLinkerInputs` before the defaults. Surprised I didn't catch that, thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158582/new/

https://reviews.llvm.org/D158582

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158582: [AMDGPU] Respect unresolved symbol option if forwarded to linker

2023-08-23 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Remember to clang format.




Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:564-575
+  // If the user has manually passed -Wl,--unresolved-symbols=* as a linker
+  // option, we should not add --no-undefined
+  bool UnresolvedOpt = false;
+  for (auto A : Args)
+if (A->getOption().matches(options::OPT_Wl_COMMA) ||
+ A->getOption().matches(options::OPT_Xlinker))
+  for (StringRef V : A->getValues())

A little more concisely.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158582/new/

https://reviews.llvm.org/D158582

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153924: [OpenMP] Allow exceptions in target regions when offloading to GPUs

2023-08-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/test/OpenMP/amdgpu_exceptions.cpp:11
+// RUN: %clang_cc1 -fopenmp -triple amdgcn-amd-amdhsa 
-fopenmp-is-target-device -fexceptions %s -emit-llvm -S -verify=with 
-Wopenmp-target-exception -o - &> /dev/null
+// RUN: %clang_cc1 -fopenmp -triple amdgcn-amd-amdhsa 
-fopenmp-is-target-device %s -emit-llvm -S -verify=with 
-Wopenmp-target-exception -o - &> /dev/null
+

jdoerfert wrote:
> Can we use /dev/null? Do other tests use it? I would expect -analyze or sth 
> instead.
There's other tests, but I think that requires the shell or linux.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153924/new/

https://reviews.llvm.org/D153924

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158298: [OpenMP] Always pass the optimization level to the linker wrapper

2023-08-18 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG4ab4e40fa294: [OpenMP] Always pass the optimization level to 
the linker wrapper (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158298/new/

https://reviews.llvm.org/D158298

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/amdgpu-openmp-toolchain.c


Index: clang/test/Driver/amdgpu-openmp-toolchain.c
===
--- clang/test/Driver/amdgpu-openmp-toolchain.c
+++ clang/test/Driver/amdgpu-openmp-toolchain.c
@@ -70,3 +70,7 @@
 // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a,gfx90a:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR
 // CHECK-TARGET-ID-ERROR: error: invalid offload arch combinations: 'gfx90a' 
and 'gfx90a:xnack+'
+
+// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a 
\
+// RUN:   -O3 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-OPT
+// CHECK-OPT: clang-linker-wrapper{{.*}}"--opt-level=O3"
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -8619,24 +8619,22 @@
 }
   }
 
-  if (D.isUsingLTO(/* IsOffload */ true)) {
-// Pass in the optimization level to use for LTO.
-if (const Arg *A = Args.getLastArg(options::OPT_O_Group)) {
-  StringRef OOpt;
-  if (A->getOption().matches(options::OPT_O4) ||
-  A->getOption().matches(options::OPT_Ofast))
-OOpt = "3";
-  else if (A->getOption().matches(options::OPT_O)) {
-OOpt = A->getValue();
-if (OOpt == "g")
-  OOpt = "1";
-else if (OOpt == "s" || OOpt == "z")
-  OOpt = "2";
-  } else if (A->getOption().matches(options::OPT_O0))
-OOpt = "0";
-  if (!OOpt.empty())
-CmdArgs.push_back(Args.MakeArgString(Twine("--opt-level=O") + OOpt));
-}
+  // Pass in the optimization level to use for LTO.
+  if (const Arg *A = Args.getLastArg(options::OPT_O_Group)) {
+StringRef OOpt;
+if (A->getOption().matches(options::OPT_O4) ||
+A->getOption().matches(options::OPT_Ofast))
+  OOpt = "3";
+else if (A->getOption().matches(options::OPT_O)) {
+  OOpt = A->getValue();
+  if (OOpt == "g")
+OOpt = "1";
+  else if (OOpt == "s" || OOpt == "z")
+OOpt = "2";
+} else if (A->getOption().matches(options::OPT_O0))
+  OOpt = "0";
+if (!OOpt.empty())
+  CmdArgs.push_back(Args.MakeArgString(Twine("--opt-level=O") + OOpt));
   }
 
   CmdArgs.push_back(


Index: clang/test/Driver/amdgpu-openmp-toolchain.c
===
--- clang/test/Driver/amdgpu-openmp-toolchain.c
+++ clang/test/Driver/amdgpu-openmp-toolchain.c
@@ -70,3 +70,7 @@
 // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a,gfx90a:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR
 // CHECK-TARGET-ID-ERROR: error: invalid offload arch combinations: 'gfx90a' and 'gfx90a:xnack+'
+
+// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a \
+// RUN:   -O3 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-OPT
+// CHECK-OPT: clang-linker-wrapper{{.*}}"--opt-level=O3"
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -8619,24 +8619,22 @@
 }
   }
 
-  if (D.isUsingLTO(/* IsOffload */ true)) {
-// Pass in the optimization level to use for LTO.
-if (const Arg *A = Args.getLastArg(options::OPT_O_Group)) {
-  StringRef OOpt;
-  if (A->getOption().matches(options::OPT_O4) ||
-  A->getOption().matches(options::OPT_Ofast))
-OOpt = "3";
-  else if (A->getOption().matches(options::OPT_O)) {
-OOpt = A->getValue();
-if (OOpt == "g")
-  OOpt = "1";
-else if (OOpt == "s" || OOpt == "z")
-  OOpt = "2";
-  } else if (A->getOption().matches(options::OPT_O0))
-OOpt = "0";
-  if (!OOpt.empty())
-CmdArgs.push_back(Args.MakeArgString(Twine("--opt-level=O") + OOpt));
-}
+  // Pass in the optimization level to use for LTO.
+  if (const Arg *A = Args.getLastArg(options::OPT_O_Group)) {
+StringRef OOpt;
+if (A->getOption().matches(options::OPT_O4) ||
+A->getOption().matches(options::OPT_Ofast))
+  OOpt = "3";
+else if (A->getOption().matches(options::OPT_O)) {
+  OOpt = A->getValue();
+  if (OOpt == "g")
+OOpt = "1";
+  else if (OOpt == "s" || OOpt == "z")
+OOpt = "2";
+} else if (A->getOption(

[PATCH] D158298: [OpenMP] Always pass the optimization level to the linker wrapper

2023-08-18 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: jdoerfert, tianshilei1992, JonChesterfield, ye-luo.
Herald added subscribers: kerbowa, guansong, tpr, yaxunl, jvesely.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, jplehr, sstefan1, MaskRay.
Herald added a project: clang.

The linker wrapper runs LTO internally, so it needs to know the
optimization level the user requested, if any. Previously this was only
done in `-foffload-lto` mode as we were assuming that this would enble
LTO. However, AMDGPU always performs LTO, and it's possible to run clang
on object files to link without passing this flag. So we should just
respect it always.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D158298

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/amdgpu-openmp-toolchain.c


Index: clang/test/Driver/amdgpu-openmp-toolchain.c
===
--- clang/test/Driver/amdgpu-openmp-toolchain.c
+++ clang/test/Driver/amdgpu-openmp-toolchain.c
@@ -70,3 +70,7 @@
 // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp 
--offload-arch=gfx90a,gfx90a:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR
 // CHECK-TARGET-ID-ERROR: error: invalid offload arch combinations: 'gfx90a' 
and 'gfx90a:xnack+'
+
+// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a 
\
+// RUN:   -O3 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-OPT
+// CHECK-OPT: clang-linker-wrapper{{.*}}"--opt-level=O3"
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -8617,24 +8617,22 @@
 }
   }
 
-  if (D.isUsingLTO(/* IsOffload */ true)) {
-// Pass in the optimization level to use for LTO.
-if (const Arg *A = Args.getLastArg(options::OPT_O_Group)) {
-  StringRef OOpt;
-  if (A->getOption().matches(options::OPT_O4) ||
-  A->getOption().matches(options::OPT_Ofast))
-OOpt = "3";
-  else if (A->getOption().matches(options::OPT_O)) {
-OOpt = A->getValue();
-if (OOpt == "g")
-  OOpt = "1";
-else if (OOpt == "s" || OOpt == "z")
-  OOpt = "2";
-  } else if (A->getOption().matches(options::OPT_O0))
-OOpt = "0";
-  if (!OOpt.empty())
-CmdArgs.push_back(Args.MakeArgString(Twine("--opt-level=O") + OOpt));
-}
+  // Pass in the optimization level to use for LTO.
+  if (const Arg *A = Args.getLastArg(options::OPT_O_Group)) {
+StringRef OOpt;
+if (A->getOption().matches(options::OPT_O4) ||
+A->getOption().matches(options::OPT_Ofast))
+  OOpt = "3";
+else if (A->getOption().matches(options::OPT_O)) {
+  OOpt = A->getValue();
+  if (OOpt == "g")
+OOpt = "1";
+  else if (OOpt == "s" || OOpt == "z")
+OOpt = "2";
+} else if (A->getOption().matches(options::OPT_O0))
+  OOpt = "0";
+if (!OOpt.empty())
+  CmdArgs.push_back(Args.MakeArgString(Twine("--opt-level=O") + OOpt));
   }
 
   CmdArgs.push_back(


Index: clang/test/Driver/amdgpu-openmp-toolchain.c
===
--- clang/test/Driver/amdgpu-openmp-toolchain.c
+++ clang/test/Driver/amdgpu-openmp-toolchain.c
@@ -70,3 +70,7 @@
 // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a,gfx90a:xnack+ \
 // RUN:   -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR
 // CHECK-TARGET-ID-ERROR: error: invalid offload arch combinations: 'gfx90a' and 'gfx90a:xnack+'
+
+// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a \
+// RUN:   -O3 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-OPT
+// CHECK-OPT: clang-linker-wrapper{{.*}}"--opt-level=O3"
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -8617,24 +8617,22 @@
 }
   }
 
-  if (D.isUsingLTO(/* IsOffload */ true)) {
-// Pass in the optimization level to use for LTO.
-if (const Arg *A = Args.getLastArg(options::OPT_O_Group)) {
-  StringRef OOpt;
-  if (A->getOption().matches(options::OPT_O4) ||
-  A->getOption().matches(options::OPT_Ofast))
-OOpt = "3";
-  else if (A->getOption().matches(options::OPT_O)) {
-OOpt = A->getValue();
-if (OOpt == "g")
-  OOpt = "1";
-else if (OOpt == "s" || OOpt == "z")
-  OOpt = "2";
-  } else if (A->getOption().matches(options::OPT_O0))
-OOpt = "0";
-  if (!OOpt.empty())
-CmdArgs.push_back(Args.MakeArgString(Twine("--opt-level=O") + OOpt));
-}
+  // Pass in the optimization level to use for LTO.
+  if (const Arg *A = Args.getLastArg(

[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-17 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Some nits. I'm assuming we're getting the code object in the backend now? We'll 
need to make sure that `-Wl,--amdhsa-code-object-version` is passed to the 
clang invocation inside of the `clang-linker-wrapper` to handle `-save-temps` 
mode.




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17053
+/// Note: "llvm.amdgcn.abi.version" is supposed to be emitted and intialized by
+///   the clang during compilation of user code.
 Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {





Comment at: clang/lib/Driver/ToolChain.cpp:1363
   for (auto *A : Args) {
+
 // Exclude flags which may only apply to the host toolchain.

Random whitespace.



Comment at: clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu:54
+#endif
\ No newline at end of file


Need newline



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:3016
+  if (getImplicitArgsSize() < utils::COV5_SIZE) {
+DP("Setting fields of ImplicitArgs for COV4\n");
+  } else {

Don't think this needs to be a debug message, same below



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h:36
 
-// The implicit arguments of AMDGPU kernels.
-struct AMDGPUImplicitArgsTy {
-  uint64_t OffsetX;
-  uint64_t OffsetY;
-  uint64_t OffsetZ;
-  uint64_t HostcallPtr;
-  uint64_t Unused0;
-  uint64_t Unused1;
-  uint64_t Unused2;
+enum IMPLICITARGS : uint32_t {
+  COV4_SIZE = 56,

I'm still not a fan of replacing the struct. The mnemonic of having a struct is 
much more user friendly.
```
ImplicitArgsTy Args{};
std::memset(&Args, sizeof(ImplicitArgsTy), 0);
...
```
If we don't use something, just make it some random bytes, e.g.
```
struct ImplicitArgsTy {
  uint64_t OffsetX;
  uint8_t Unused[64]; // 64 byte offset.
};
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D158131: HIP: Directly use f32 sqrt intrinsic

2023-08-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 accepted this revision.
jhuber6 added a comment.
This revision is now accepted and ready to land.

I'm hoping to write some brute force tests for the `fp32` math functions in 
`libc` soonish.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158131/new/

https://reviews.llvm.org/D158131

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157738: [OpenMP] Emit offloading entries for indirect target variables

2023-08-14 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 550140.
jhuber6 added a comment.

Address comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157738/new/

https://reviews.llvm.org/D157738

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/OpenMP/target_indirect_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

Index: llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
===
--- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -5525,9 +5525,10 @@
 
 void OpenMPIRBuilder::createOffloadEntry(Constant *ID, Constant *Addr,
  uint64_t Size, int32_t Flags,
- GlobalValue::LinkageTypes) {
+ GlobalValue::LinkageTypes,
+ StringRef Name) {
   if (!Config.isGPU()) {
-emitOffloadingEntry(ID, Addr->getName(), Size, Flags);
+emitOffloadingEntry(ID, Name.empty() ? Addr->getName() : Name, Size, Flags);
 return;
   }
   // TODO: Add support for global variables on the device after declare target
@@ -5687,13 +5688,20 @@
 
   // Hidden or internal symbols on the device are not externally visible.
   // We should not attempt to register them by creating an offloading
-  // entry.
+  // entry. Indirect variables are handled separately on the device.
   if (auto *GV = dyn_cast(CE->getAddress()))
-if (GV->hasLocalLinkage() || GV->hasHiddenVisibility())
+if ((GV->hasLocalLinkage() || GV->hasHiddenVisibility()) &&
+Flags != OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect)
   continue;
 
-  createOffloadEntry(CE->getAddress(), CE->getAddress(), CE->getVarSize(),
- Flags, CE->getLinkage());
+  // Indirect globals need to use a special name that doesn't match the name
+  // of the associated host global.
+  if (Flags == OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect)
+createOffloadEntry(CE->getAddress(), CE->getAddress(), CE->getVarSize(),
+   Flags, CE->getLinkage(), CE->getVarName());
+  else
+createOffloadEntry(CE->getAddress(), CE->getAddress(), CE->getVarSize(),
+   Flags, CE->getLinkage());
 
 } else {
   llvm_unreachable("Unsupported entry kind.");
@@ -6038,8 +6046,13 @@
   }
   return;
 }
-OffloadEntriesDeviceGlobalVar.try_emplace(VarName, OffloadingEntriesNum,
-  Addr, VarSize, Flags, Linkage);
+if (Flags == OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect)
+  OffloadEntriesDeviceGlobalVar.try_emplace(VarName, OffloadingEntriesNum,
+Addr, VarSize, Flags, Linkage,
+VarName.str());
+else
+  OffloadEntriesDeviceGlobalVar.try_emplace(
+  VarName, OffloadingEntriesNum, Addr, VarSize, Flags, Linkage, "");
 ++OffloadingEntriesNum;
   }
 }
Index: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
===
--- llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -326,6 +326,8 @@
 OMPTargetGlobalVarEntryEnter = 0x2,
 /// Mark the entry as having no declare target entry kind.
 OMPTargetGlobalVarEntryNone = 0x3,
+/// Mark the entry as a declare target indirect global.
+OMPTargetGlobalVarEntryIndirect = 0x4,
   };
 
   /// Kind of device clause for declare target variables
@@ -349,6 +351,7 @@
 /// Type of the global variable.
 int64_t VarSize;
 GlobalValue::LinkageTypes Linkage;
+const std::string VarName;
 
   public:
 OffloadEntryInfoDeviceGlobalVar()
@@ -359,13 +362,15 @@
 explicit OffloadEntryInfoDeviceGlobalVar(unsigned Order, Constant *Addr,
  int64_t VarSize,
  OMPTargetGlobalVarEntryKind Flags,
- GlobalValue::LinkageTypes Linkage)
+ GlobalValue::LinkageTypes Linkage,
+ const std::string &VarName)
 : OffloadEntryInfo(OffloadingEntryInfoDeviceGlobalVar, Order, Flags),
-  VarSize(VarSize), Linkage(Linkage) {
+  VarSize(VarSize), Linkage(Linkage), VarName(VarName) {
   setAddress(Addr);
 }
 
 int64_t getVarSize() const { return VarSize; }
+StringRef getVarName() const { return VarName; }
 void setVarSize(int64_t Size) { VarSize = Size; }
 GlobalValue::LinkageTypes getLinkage() const { return Linkag

[PATCH] D157738: [OpenMP] Emit offloading entries for indirect target variables

2023-08-14 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 marked an inline comment as done.
jhuber6 added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntime.cpp:1996-1997
+llvm::GlobalValue *GV) {
+  std::optional ActiveAttr =
+  OMPDeclareTargetDeclAttr::getActiveAttr(FD);
+

arsenm wrote:
> not a huge fan of std::optional
This is pretty far entrenched in the Clang handling for this attribute so I 
don't intend to change it here.



Comment at: clang/lib/CodeGen/CGOpenMPRuntime.cpp:2022
+  OMPBuilder.OffloadInfoManager.registerDeviceGlobalVarEntryInfo(
+  Name, Addr, CGM.getContext().getTypeSize(CGM.getContext().VoidPtrTy) / 8,
+  llvm::OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect,

arsenm wrote:
> isn't there a store size?
Yeah I can use that instead.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157738/new/

https://reviews.llvm.org/D157738

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157738: [OpenMP] Emit offloading entries for indirect target variables

2023-08-14 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D157738#4586465 , @JonChesterfield 
wrote:

>> calling device functions via their associated host pointer
>
> What does this mean? Defining a function foo such that the host and each 
> individual target each have their own machine code for it, such that &foo on 
> the host can be copied over to the target and then invoked to mean call the 
> function on the local target with the same name?
>
> If so, calling through the pointer &foo on the GPU doing a logarithmic search 
> through a table to choose a function address to branch to sounds like 
> something that will codegen into very slow code. Does it do that search on 
> every call?
>
> Is there an ambition to have &foo on the host and &foo on the target return 
> the same value, in the pointer equality sense?

That's exactly what it means, the mapping is only done for targets with 
`indirect` declared on them. The indirect calls themselves, I think @jdoerfert 
is implementing some specialization? He just asked me to implement this since 
it's related to copying of virtual classes to the device.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157738/new/

https://reviews.llvm.org/D157738

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157738: [OpenMP] Emit offloading entries for indirect target variables

2023-08-11 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: jdoerfert, tianshilei1992, ye-luo, ABataev, 
RaviNarayanaswamy.
Herald added subscribers: guansong, hiraditya, yaxunl.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, jplehr, sstefan1.
Herald added projects: clang, LLVM.

OpenMP 5.1 allows emission of the `indirect` clause on declare target
functions, see 
https://www.openmp.org/spec-html/5.1/openmpsu70.html#x98-1080002.14.7.
The intended use of this is to permit calling device functions via their
associated host pointer. In order to do this the first step will be
building a map associating these variables. Doing this will require the
same offloading entry handling we use for other kernels and globals.

We intentionally emit a new global on the device side. Although it's
possible to look up the device function's address directly, this would
require changing the visibility and would prevent us from making static
functions indirect. Also, the CUDA toolchain will optimize out unused
functions and using a global prevents that. The downside is that the
runtime will need to read the global and copy its value, but there
shouldn't be any other costs.

Note that this patch just performs the codegen, currently this new
offloading entry type is unused and will be ignored by the runtime.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D157738

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/OpenMP/target_indirect_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

Index: llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
===
--- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -5525,9 +5525,10 @@
 
 void OpenMPIRBuilder::createOffloadEntry(Constant *ID, Constant *Addr,
  uint64_t Size, int32_t Flags,
- GlobalValue::LinkageTypes) {
+ GlobalValue::LinkageTypes,
+ StringRef Name) {
   if (!Config.isGPU()) {
-emitOffloadingEntry(ID, Addr->getName(), Size, Flags);
+emitOffloadingEntry(ID, Name.empty() ? Addr->getName() : Name, Size, Flags);
 return;
   }
   // TODO: Add support for global variables on the device after declare target
@@ -5687,13 +5688,20 @@
 
   // Hidden or internal symbols on the device are not externally visible.
   // We should not attempt to register them by creating an offloading
-  // entry.
+  // entry. Indirect variables are handled separately on the device.
   if (auto *GV = dyn_cast(CE->getAddress()))
-if (GV->hasLocalLinkage() || GV->hasHiddenVisibility())
+if ((GV->hasLocalLinkage() || GV->hasHiddenVisibility()) &&
+Flags != OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect)
   continue;
 
-  createOffloadEntry(CE->getAddress(), CE->getAddress(), CE->getVarSize(),
- Flags, CE->getLinkage());
+  // Indirect globals need to use a special name that doesn't match the name
+  // of the associated host global.
+  if (Flags == OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect)
+createOffloadEntry(CE->getAddress(), CE->getAddress(), CE->getVarSize(),
+   Flags, CE->getLinkage(), CE->getVarName());
+  else
+createOffloadEntry(CE->getAddress(), CE->getAddress(), CE->getVarSize(),
+   Flags, CE->getLinkage());
 
 } else {
   llvm_unreachable("Unsupported entry kind.");
@@ -6038,8 +6046,13 @@
   }
   return;
 }
-OffloadEntriesDeviceGlobalVar.try_emplace(VarName, OffloadingEntriesNum,
-  Addr, VarSize, Flags, Linkage);
+if (Flags == OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect)
+  OffloadEntriesDeviceGlobalVar.try_emplace(VarName, OffloadingEntriesNum,
+Addr, VarSize, Flags, Linkage,
+VarName.str());
+else
+  OffloadEntriesDeviceGlobalVar.try_emplace(
+  VarName, OffloadingEntriesNum, Addr, VarSize, Flags, Linkage, "");
 ++OffloadingEntriesNum;
   }
 }
Index: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
===
--- llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -326,6 +326,8 @@
 OMPTargetGlobalVarEntryEnter = 0x2,
 /// Mark the entry as having no declare target entry kind.
 OMPTargetGlobalVarEntryNone = 0x3,
+/// Mark the entry as a declare target indirect global.
+OMPTargetGlobalVarE

[PATCH] D156816: [Clang] Make generic aliases to OpenCL address spaces

2023-08-09 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156816#4574190 , @arsenm wrote:

> Probably should just wrap uses in macros for now

In clang? Or just have users deal with `opencl_` on everything and rename it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156816/new/

https://reviews.llvm.org/D156816

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157438: [OpenMP] Ensure wrapper headers are included on both host and device

2023-08-08 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG61709bbae37a: [OpenMP] Ensure wrapper headers are included 
on both host and device (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157438/new/

https://reviews.llvm.org/D157438

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/gpu-libc-headers.c


Index: clang/test/Driver/gpu-libc-headers.c
===
--- clang/test/Driver/gpu-libc-headers.c
+++ clang/test/Driver/gpu-libc-headers.c
@@ -8,6 +8,7 @@
 // RUN: -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_70  \
 // RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-HEADERS
 // CHECK-HEADERS: "-cc1"{{.*}}"-internal-isystem" 
"{{.*}}include{{.*}}llvm_libc_wrappers"{{.*}}"-isysroot" "./"
+// CHECK-HEADERS: "-cc1"{{.*}}"-internal-isystem" 
"{{.*}}include{{.*}}llvm_libc_wrappers"{{.*}}"-isysroot" "./"
 
 // RUN:   %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nogpulib \
 // RUN: -nogpuinc %s 2>&1 | FileCheck %s 
--check-prefix=CHECK-HEADERS-DISABLED
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -1183,14 +1183,13 @@
   // with ones created by the 'libc' project if present.
   if (!Args.hasArg(options::OPT_nostdinc) &&
   !Args.hasArg(options::OPT_nogpuinc) &&
-  !Args.hasArg(options::OPT_nobuiltininc) &&
-  (getToolChain().getTriple().isNVPTX() ||
-   getToolChain().getTriple().isAMDGCN())) {
-
+  !Args.hasArg(options::OPT_nobuiltininc)) {
 // Without an offloading language we will include these headers directly.
 // Offloading languages will instead only use the declarations stored in
 // the resource directory at clang/lib/Headers/llvm_libc_wrappers.
-if (C.getActiveOffloadKinds() == Action::OFK_None) {
+if ((getToolChain().getTriple().isNVPTX() ||
+ getToolChain().getTriple().isAMDGCN()) &&
+C.getActiveOffloadKinds() == Action::OFK_None) {
   SmallString<128> P(llvm::sys::path::parent_path(D.InstalledDir));
   llvm::sys::path::append(P, "include");
   llvm::sys::path::append(P, "gpu-none-llvm");


Index: clang/test/Driver/gpu-libc-headers.c
===
--- clang/test/Driver/gpu-libc-headers.c
+++ clang/test/Driver/gpu-libc-headers.c
@@ -8,6 +8,7 @@
 // RUN: -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_70  \
 // RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-HEADERS
 // CHECK-HEADERS: "-cc1"{{.*}}"-internal-isystem" "{{.*}}include{{.*}}llvm_libc_wrappers"{{.*}}"-isysroot" "./"
+// CHECK-HEADERS: "-cc1"{{.*}}"-internal-isystem" "{{.*}}include{{.*}}llvm_libc_wrappers"{{.*}}"-isysroot" "./"
 
 // RUN:   %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nogpulib \
 // RUN: -nogpuinc %s 2>&1 | FileCheck %s --check-prefix=CHECK-HEADERS-DISABLED
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -1183,14 +1183,13 @@
   // with ones created by the 'libc' project if present.
   if (!Args.hasArg(options::OPT_nostdinc) &&
   !Args.hasArg(options::OPT_nogpuinc) &&
-  !Args.hasArg(options::OPT_nobuiltininc) &&
-  (getToolChain().getTriple().isNVPTX() ||
-   getToolChain().getTriple().isAMDGCN())) {
-
+  !Args.hasArg(options::OPT_nobuiltininc)) {
 // Without an offloading language we will include these headers directly.
 // Offloading languages will instead only use the declarations stored in
 // the resource directory at clang/lib/Headers/llvm_libc_wrappers.
-if (C.getActiveOffloadKinds() == Action::OFK_None) {
+if ((getToolChain().getTriple().isNVPTX() ||
+ getToolChain().getTriple().isAMDGCN()) &&
+C.getActiveOffloadKinds() == Action::OFK_None) {
   SmallString<128> P(llvm::sys::path::parent_path(D.InstalledDir));
   llvm::sys::path::append(P, "include");
   llvm::sys::path::append(P, "gpu-none-llvm");
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157438: [OpenMP] Ensure wrapper headers are included on both host and device

2023-08-08 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1190-1191
 // the resource directory at clang/lib/Headers/llvm_libc_wrappers.
-if (C.getActiveOffloadKinds() == Action::OFK_None) {
+if ((getToolChain().getTriple().isNVPTX() ||
+ getToolChain().getTriple().isAMDGCN()) &&
+C.getActiveOffloadKinds() == Action::OFK_None) {

yaxunl wrote:
> jhuber6 wrote:
> > arsenm wrote:
> > > can we do something better than this NVPTX||AMDGCN checks
> > This is more or less "Are we one of the GPUs `libc` supports". This is for 
> > cross-compiling so there's no existing infrastructure.
> maybe add a variable bool HasGPULibC as it is also used in other places below
I think this is the only use right now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157438/new/

https://reviews.llvm.org/D157438

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157438: [OpenMP] Ensure wrapper headers are included on both host and device

2023-08-08 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1190-1191
 // the resource directory at clang/lib/Headers/llvm_libc_wrappers.
-if (C.getActiveOffloadKinds() == Action::OFK_None) {
+if ((getToolChain().getTriple().isNVPTX() ||
+ getToolChain().getTriple().isAMDGCN()) &&
+C.getActiveOffloadKinds() == Action::OFK_None) {

arsenm wrote:
> can we do something better than this NVPTX||AMDGCN checks
This is more or less "Are we one of the GPUs `libc` supports". This is for 
cross-compiling so there's no existing infrastructure.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157438/new/

https://reviews.llvm.org/D157438

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157438: [OpenMP] Ensure wrapper headers are included on both host and device

2023-08-08 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: jdoerfert, tianshilei1992, tra, JonChesterfield, 
yaxunl, sivachandra.
Herald added a subscriber: guansong.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, jplehr, sstefan1, MaskRay.
Herald added a project: clang.

For the in-progress GPU `libc` project we are relying on overlay headers to
handle the interfacing between the `libc` project and the host `libc`.
We need this to be included on both the host and device so they agree
one what is present on the device, otherwise we will end up with random
errors. For whatever reason this was not being included on the host
although it previously worked. This patch ensures that it's included on
both.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D157438

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/gpu-libc-headers.c


Index: clang/test/Driver/gpu-libc-headers.c
===
--- clang/test/Driver/gpu-libc-headers.c
+++ clang/test/Driver/gpu-libc-headers.c
@@ -8,6 +8,7 @@
 // RUN: -fopenmp-targets=nvptx64-nvidia-cuda 
-Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_70  \
 // RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-HEADERS
 // CHECK-HEADERS: "-cc1"{{.*}}"-internal-isystem" 
"{{.*}}include{{.*}}llvm_libc_wrappers"{{.*}}"-isysroot" "./"
+// CHECK-HEADERS: "-cc1"{{.*}}"-internal-isystem" 
"{{.*}}include{{.*}}llvm_libc_wrappers"{{.*}}"-isysroot" "./"
 
 // RUN:   %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nogpulib \
 // RUN: -nogpuinc %s 2>&1 | FileCheck %s 
--check-prefix=CHECK-HEADERS-DISABLED
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -1183,14 +1183,13 @@
   // with ones created by the 'libc' project if present.
   if (!Args.hasArg(options::OPT_nostdinc) &&
   !Args.hasArg(options::OPT_nogpuinc) &&
-  !Args.hasArg(options::OPT_nobuiltininc) &&
-  (getToolChain().getTriple().isNVPTX() ||
-   getToolChain().getTriple().isAMDGCN())) {
-
+  !Args.hasArg(options::OPT_nobuiltininc)) {
 // Without an offloading language we will include these headers directly.
 // Offloading languages will instead only use the declarations stored in
 // the resource directory at clang/lib/Headers/llvm_libc_wrappers.
-if (C.getActiveOffloadKinds() == Action::OFK_None) {
+if ((getToolChain().getTriple().isNVPTX() ||
+ getToolChain().getTriple().isAMDGCN()) &&
+C.getActiveOffloadKinds() == Action::OFK_None) {
   SmallString<128> P(llvm::sys::path::parent_path(D.InstalledDir));
   llvm::sys::path::append(P, "include");
   llvm::sys::path::append(P, "gpu-none-llvm");


Index: clang/test/Driver/gpu-libc-headers.c
===
--- clang/test/Driver/gpu-libc-headers.c
+++ clang/test/Driver/gpu-libc-headers.c
@@ -8,6 +8,7 @@
 // RUN: -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_70  \
 // RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-HEADERS
 // CHECK-HEADERS: "-cc1"{{.*}}"-internal-isystem" "{{.*}}include{{.*}}llvm_libc_wrappers"{{.*}}"-isysroot" "./"
+// CHECK-HEADERS: "-cc1"{{.*}}"-internal-isystem" "{{.*}}include{{.*}}llvm_libc_wrappers"{{.*}}"-isysroot" "./"
 
 // RUN:   %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nogpulib \
 // RUN: -nogpuinc %s 2>&1 | FileCheck %s --check-prefix=CHECK-HEADERS-DISABLED
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -1183,14 +1183,13 @@
   // with ones created by the 'libc' project if present.
   if (!Args.hasArg(options::OPT_nostdinc) &&
   !Args.hasArg(options::OPT_nogpuinc) &&
-  !Args.hasArg(options::OPT_nobuiltininc) &&
-  (getToolChain().getTriple().isNVPTX() ||
-   getToolChain().getTriple().isAMDGCN())) {
-
+  !Args.hasArg(options::OPT_nobuiltininc)) {
 // Without an offloading language we will include these headers directly.
 // Offloading languages will instead only use the declarations stored in
 // the resource directory at clang/lib/Headers/llvm_libc_wrappers.
-if (C.getActiveOffloadKinds() == Action::OFK_None) {
+if ((getToolChain().getTriple().isNVPTX() ||
+ getToolChain().getTriple().isAMDGCN()) &&
+C.getActiveOffloadKinds() == Action::OFK_None) {
   SmallString<128> P(llvm::sys::path::parent_path(D.InstalledDir));
   llvm::sys::path::append(P, "include");
   llvm::sys::path::append(P, "gpu-none-llvm");
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llv

[PATCH] D156014: [Clang][NVPTX] Permit use of the alias attribute for NVPTX targets

2023-08-07 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156014#4567363 , @steven_wu wrote:

> This breaks macOS bot: 
> https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/36900/testReport/junit/Clang/SemaCUDA/alias_cu/

I should've fixed that already. Is it still broken?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156014/new/

https://reviews.llvm.org/D156014

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156014: [Clang][NVPTX] Permit use of the alias attribute for NVPTX targets

2023-08-07 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG0ba9aec38faa: [Clang][NVPTX] Permit use of the alias 
attribute for NVPTX targets (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156014/new/

https://reviews.llvm.org/D156014

Files:
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/lib/Sema/SemaDeclAttr.cpp
  clang/test/CodeGenCUDA/alias.cu


Index: clang/test/CodeGenCUDA/alias.cu
===
--- clang/test/CodeGenCUDA/alias.cu
+++ clang/test/CodeGenCUDA/alias.cu
@@ -4,17 +4,26 @@
 
 // RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
 // RUN:   -o - %s | FileCheck %s
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn -emit-llvm \
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm 
-target-sdk-version=10.1 \
+// RUN:   -o - %s | FileCheck %s
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -o - %s | FileCheck %s
 
 #include "Inputs/cuda.h"
 
-// Check that we don't generate an alias from "foo" to the mangled name for
-// ns::foo() -- nvptx doesn't support aliases.
-
-namespace ns {
 extern "C" {
-// CHECK-NOT: @foo = internal alias
-__device__ __attribute__((used)) static int foo() { return 0; }
-}
+__device__ int foo() { return 1; }
 }
+
+[[gnu::alias("foo")]] __device__ int alias();
+
+// CHECK: @_Z5aliasv = alias i32 (), ptr @foo
+//
+//  CHECK: define dso_local i32 @foo() #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: entry:
+//  CHECK:   ret i32 1
+// CHECK-NEXT: }
+
+// RUN: not %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm 
-target-sdk-version=9.0 \
+// RUN:   -o - %s 2>&1 | FileCheck %s --check-prefix=NO_SUPPORT
+// NO_SUPPORT: CUDA older than 10.0 does not support .alias
Index: clang/lib/Sema/SemaDeclAttr.cpp
===
--- clang/lib/Sema/SemaDeclAttr.cpp
+++ clang/lib/Sema/SemaDeclAttr.cpp
@@ -23,6 +23,7 @@
 #include "clang/AST/RecursiveASTVisitor.h"
 #include "clang/AST/Type.h"
 #include "clang/Basic/CharInfo.h"
+#include "clang/Basic/Cuda.h"
 #include "clang/Basic/DarwinSDKInfo.h"
 #include "clang/Basic/HLSLRuntime.h"
 #include "clang/Basic/LangOptions.h"
@@ -1992,8 +1993,12 @@
 S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_darwin);
 return;
   }
+
   if (S.Context.getTargetInfo().getTriple().isNVPTX()) {
-S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx);
+CudaVersion Version =
+ToCudaVersion(S.Context.getTargetInfo().getSDKVersion());
+if (Version != CudaVersion::UNKNOWN && Version < CudaVersion::CUDA_100)
+  S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx);
   }
 
   // Aliases should be on declarations, not definitions.
Index: clang/include/clang/Basic/DiagnosticSemaKinds.td
===
--- clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8658,8 +8658,8 @@
 def err_variadic_device_fn : Error<
   "CUDA device code does not support variadic functions">;
 def err_va_arg_in_device : Error<
-  "CUDA device code does not support va_arg">;
-def err_alias_not_supported_on_nvptx : Error<"CUDA does not support aliases">;
+"CUDA device code does not support va_arg">;
+def err_alias_not_supported_on_nvptx : Error<"CUDA older than 10.0 does not 
support .alias">;
 def err_cuda_unattributed_constexpr_cannot_overload_device : Error<
   "constexpr function %0 without __host__ or __device__ attributes cannot "
   "overload __device__ function with same signature.  Add a __host__ "


Index: clang/test/CodeGenCUDA/alias.cu
===
--- clang/test/CodeGenCUDA/alias.cu
+++ clang/test/CodeGenCUDA/alias.cu
@@ -4,17 +4,26 @@
 
 // RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
 // RUN:   -o - %s | FileCheck %s
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn -emit-llvm \
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm -target-sdk-version=10.1 \
+// RUN:   -o - %s | FileCheck %s
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -o - %s | FileCheck %s
 
 #include "Inputs/cuda.h"
 
-// Check that we don't generate an alias from "foo" to the mangled name for
-// ns::foo() -- nvptx doesn't support aliases.
-
-namespace ns {
 extern "C" {
-// CHECK-NOT: @foo = internal alias
-__device__ __attribute__((used)) static int foo() { return 0; }
-}
+__device__ int foo() { return 1; }
 }
+
+[[gnu::alias("foo")]] __device__ int alias();
+
+// CHECK: @_Z5aliasv = alias i32 (), ptr @foo
+//
+//  CHECK: define dso_local i32 @foo() #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: entry:
+//  CHECK:   ret i32 1
+// CHECK-NEXT: }
+
+// RU

[PATCH] D156014: [Clang][NVPTX] Permit use of the alias attribute for NVPTX targets

2023-08-07 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 547871.
jhuber6 added a comment.

Update to check the SDK version. Permit this if there is not passed in SDK 
version so that freestanding targets can still target CUDA and assume it's 
supported.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156014/new/

https://reviews.llvm.org/D156014

Files:
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/lib/Sema/SemaDeclAttr.cpp
  clang/test/CodeGenCUDA/alias.cu


Index: clang/test/CodeGenCUDA/alias.cu
===
--- clang/test/CodeGenCUDA/alias.cu
+++ clang/test/CodeGenCUDA/alias.cu
@@ -4,17 +4,26 @@
 
 // RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
 // RUN:   -o - %s | FileCheck %s
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn -emit-llvm \
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm 
-target-sdk-version=10.1 \
+// RUN:   -o - %s | FileCheck %s
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -o - %s | FileCheck %s
 
 #include "Inputs/cuda.h"
 
-// Check that we don't generate an alias from "foo" to the mangled name for
-// ns::foo() -- nvptx doesn't support aliases.
-
-namespace ns {
 extern "C" {
-// CHECK-NOT: @foo = internal alias
-__device__ __attribute__((used)) static int foo() { return 0; }
-}
+__device__ int foo() { return 1; }
 }
+
+[[gnu::alias("foo")]] __device__ int alias();
+
+// CHECK: @_Z5aliasv = alias i32 (), ptr @foo
+//
+//  CHECK: define dso_local i32 @foo() #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: entry:
+//  CHECK:   ret i32 1
+// CHECK-NEXT: }
+
+// RUN: not %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm 
-target-sdk-version=9.0 \
+// RUN:   -o - %s 2>&1 | FileCheck %s --check-prefix=NO_SUPPORT
+// NO_SUPPORT: CUDA older than 10.0 does not support .alias
Index: clang/lib/Sema/SemaDeclAttr.cpp
===
--- clang/lib/Sema/SemaDeclAttr.cpp
+++ clang/lib/Sema/SemaDeclAttr.cpp
@@ -23,6 +23,7 @@
 #include "clang/AST/RecursiveASTVisitor.h"
 #include "clang/AST/Type.h"
 #include "clang/Basic/CharInfo.h"
+#include "clang/Basic/Cuda.h"
 #include "clang/Basic/DarwinSDKInfo.h"
 #include "clang/Basic/HLSLRuntime.h"
 #include "clang/Basic/LangOptions.h"
@@ -1992,8 +1993,12 @@
 S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_darwin);
 return;
   }
+
   if (S.Context.getTargetInfo().getTriple().isNVPTX()) {
-S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx);
+CudaVersion Version =
+ToCudaVersion(S.Context.getTargetInfo().getSDKVersion());
+if (Version != CudaVersion::UNKNOWN && Version < CudaVersion::CUDA_100)
+  S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx);
   }
 
   // Aliases should be on declarations, not definitions.
Index: clang/include/clang/Basic/DiagnosticSemaKinds.td
===
--- clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8648,8 +8648,8 @@
 def err_variadic_device_fn : Error<
   "CUDA device code does not support variadic functions">;
 def err_va_arg_in_device : Error<
-  "CUDA device code does not support va_arg">;
-def err_alias_not_supported_on_nvptx : Error<"CUDA does not support aliases">;
+"CUDA device code does not support va_arg">;
+def err_alias_not_supported_on_nvptx : Error<"CUDA older than 10.0 does not 
support .alias">;
 def err_cuda_unattributed_constexpr_cannot_overload_device : Error<
   "constexpr function %0 without __host__ or __device__ attributes cannot "
   "overload __device__ function with same signature.  Add a __host__ "


Index: clang/test/CodeGenCUDA/alias.cu
===
--- clang/test/CodeGenCUDA/alias.cu
+++ clang/test/CodeGenCUDA/alias.cu
@@ -4,17 +4,26 @@
 
 // RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
 // RUN:   -o - %s | FileCheck %s
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn -emit-llvm \
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm -target-sdk-version=10.1 \
+// RUN:   -o - %s | FileCheck %s
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -o - %s | FileCheck %s
 
 #include "Inputs/cuda.h"
 
-// Check that we don't generate an alias from "foo" to the mangled name for
-// ns::foo() -- nvptx doesn't support aliases.
-
-namespace ns {
 extern "C" {
-// CHECK-NOT: @foo = internal alias
-__device__ __attribute__((used)) static int foo() { return 0; }
-}
+__device__ int foo() { return 1; }
 }
+
+[[gnu::alias("foo")]] __device__ int alias();
+
+// CHECK: @_Z5aliasv = alias i32 (), ptr @foo
+//
+//  CHECK: define dso_local i32 @foo() #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: entry:
+//  CHECK:   ret i32 1
+// CHECK-NEXT: }
+
+// RUN: not %clang_cc1 

[PATCH] D156014: [Clang][NVPTX] Permit use of the alias attribute for NVPTX targets

2023-08-07 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

ping


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156014/new/

https://reviews.llvm.org/D156014

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-07 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17138
+///and use its value for COV_4 or COV_5 approach. It is used for
+///compiling device libraries in ABI-agnostic way.
+///





Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17187-17188
+Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
+  } else {
+if (Cov == clang::TargetOptions::COV_5) {
+  // Indexing the implicit kernarg segment.

saiislam wrote:
> jhuber6 wrote:
> > nit.
> There are a couple of common lines after the inner if-else, in the outer else 
> section.
You should be able to factor out
```
LD = CGF.Builder.CreateLoad(
Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
```
from both by making each assign the `Result` to a value.



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2550-2551
+Error Err = retrieveAllMemoryPools();
+if (Err)
+  return Plugin::error("Unable to retieve all memmory pools");
+

This and below isn't correct. You can't discard an `llvm::Error` value like 
this without either doing `consumeError(std::move(Err))` or 
`toString(std::move(Err))`. However, you don't need to consume these in the 
first place, they already contain the error message from the callee and should 
just be forwarded.



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1752
+// if (auto Err = preAllocateDeviceMemoryPool())
+//   return Err;
+

saiislam wrote:
> jhuber6 wrote:
> > Leftoever?
> No, it is not a left over.
> One of the fields in cov5 implicitikernarg is heap_v1 ptr. It should point to 
> a 128KB zero-initialized block of coarse-grained memory on each device before 
> launching the kernel. This code was working a while ago, but right now it is 
> failing most likely due to some latest change in devicertl memory handling 
> mechanism.
> I need to debug it with this patch, otherwise it will cause all target region 
> code calling device-malloc to fail.
> I will try to fix it before the next revision.
Do we really need that? We only use a fraction of the existing implicit 
arguments. My understanding is that most of these are more for runtime handling 
for HIP and OpenCL while we would most likely want our own solution. I'm 
assuming that the 128KB is not required for anything we use?



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h:36
 
-// The implicit arguments of AMDGPU kernels.
-struct AMDGPUImplicitArgsTy {
-  uint64_t OffsetX;
-  uint64_t OffsetY;
-  uint64_t OffsetZ;
-  uint64_t HostcallPtr;
-  uint64_t Unused0;
-  uint64_t Unused1;
-  uint64_t Unused2;
+enum IMPLICITARGS : uint32_t {
+  COV4_SIZE = 56,

saiislam wrote:
> jhuber6 wrote:
> > arsenm wrote:
> > > This is getting duplicated a few places, should it move to a support 
> > > header?
> > > 
> > > I don't love the existing APIs for this, I think a struct definition 
> > > makes more sense
> > The other user here is my custom loader, @JonChesterfield has talked about 
> > wanting a common HSA helper header for awhile now.
> > 
> > I agree that the struct definition is much better. Being able to simply 
> > allocate this size and then zero fill it is much cleaner.
> Defining a struct for whole 256 byte of implicitargs in cov5 was becoming a 
> little difficult due to different sizes of various fields (2, 4, 6, 8, 48, 72 
> bytes) along with multiple reserved fields in between. It made sense for cov4 
> because it only had 7 fields of 8 bytes each, where we needed only 4th field 
> in OpenMP runtime (for hostcall_buffer).
> 
> Offset based lookups like the following allows handling/exposing only 
> required fields across generations of ABI.
If we don't use it, just put it as `unused`. It's really hard to read as-is and 
it makes it more difficult to just zero fill.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156928: [Clang][AMDGPU] Fix handling of -mcode-object-version=none arg

2023-08-07 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156928#4562239 , @JonChesterfield 
wrote:

> Or, the front end could define those objects directly, without importing IR 
> files that define the objects with the content clang used to choose the 
> object file. E.g. instead of the argument daz=off (spelled differently) 
> finding a file called daz.off.ll that defines variable called daz with a 
> value 0, that argument could define that variable. I think @jhuber6 has a 
> partial patch trying to do that.
>
> If we were more ambitious, we could use intrinsics that are folded reliably 
> at O0 instead of magic variables that hopefully get constant folded. That 
> would kill a bunch of O0 bugs.
>
> In general though, splicing magic variables in the front end seems unlikely 
> to be performance critical relative to splicing them in at the start of the 
> backend.

I think @saiislam is working on a patch that will handle that. We'll have 
`clang` emit some global that OpenMP uses.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156928/new/

https://reviews.llvm.org/D156928

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-04 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D139730#4561573 , @arsenm wrote:

> In D139730#4561540 , @jhuber6 wrote:
>
>> Could you explain briefly what the approach here is? I'm confused as to 
>> what's actually changed and how we're handling this difference. I thought if 
>> this was just the definition of some builtin function we could just rely on 
>> the backend to figure it out. Why do we need to know the code object version 
>> inside the device RTL?
>
> The build is called in the device rtl, so the device RTL needs to contain 
> both implementations. The "backend figuring it out" is dead code elimination

Okay, do we expect to re-use this interface anywhere? If it's just for OpenMP 
then we should probably copy the approach taken for `__omp_rtl_debug_kind`, 
which is a global created on the GPU by `CGOpenMPRuntimeGPU`'s constructor and 
does more or less the same thing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-04 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h:36
 
-// The implicit arguments of AMDGPU kernels.
-struct AMDGPUImplicitArgsTy {
-  uint64_t OffsetX;
-  uint64_t OffsetY;
-  uint64_t OffsetZ;
-  uint64_t HostcallPtr;
-  uint64_t Unused0;
-  uint64_t Unused1;
-  uint64_t Unused2;
+enum IMPLICITARGS : uint32_t {
+  COV4_SIZE = 56,

arsenm wrote:
> This is getting duplicated a few places, should it move to a support header?
> 
> I don't love the existing APIs for this, I think a struct definition makes 
> more sense
The other user here is my custom loader, @JonChesterfield has talked about 
wanting a common HSA helper header for awhile now.

I agree that the struct definition is much better. Being able to simply 
allocate this size and then zero fill it is much cleaner.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5

2023-08-04 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Could you explain briefly what the approach here is? I'm confused as to what's 
actually changed and how we're handling this difference. I thought if this was 
just the definition of some builtin function we could just rely on the backend 
to figure it out. Why do we need to know the code object version inside the 
device RTL?




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17140
+
+  if (Cov == clang::TargetOptions::COV_None) {
+auto *ABIVersionC = CGF.CGM.GetOrCreateLLVMGlobal(

Could you explain the function of this in a comment? Are we emitting generic 
code if unspecified?



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17187-17188
+Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
+  } else {
+if (Cov == clang::TargetOptions::COV_5) {
+  // Indexing the implicit kernarg segment.

nit.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17194
+  // CGBuiltin.cpp ~ line 17052 ~ Value*EmitAMDGPUWorkGroupSize ~ COV:
+  // " << Cov ; llvm::errs().resetColor();
+} else {

Leftover debugging?



Comment at: clang/lib/Driver/ToolChain.cpp:1360
+if (A->getOption().matches(options::OPT_mcode_object_version_EQ))
+  DAL->append(A);
+

Shouldn't we be able to put this under the `OPT_m_group` below?



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1752
+// if (auto Err = preAllocateDeviceMemoryPool())
+//   return Err;
+

Leftoever?



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2542
+  /// Get the address of pointer to the preallocated device memory pool.
+  void **getPreAllocatedDeviceMemoryPool() {
+return &PreAllocatedDeviceMemoryPool;

Why do we need this? The current method shouldn't need to change if all we're 
doing is allocating memory of greater size.



Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:3036
 
+  if (getImplicitArgsSize() < utils::COV5_SIZE) {
+DP("Setting fields of ImplicitArgs for COV4\n");

So we're required to emit some new arguments? I don't have any idea 
what'schanged between this COV4 and COV5 stuff.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156936: [Clang] Increase default architecture from sm_35 to sm_52

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGab202aa7004a: [Clang] Increase default architecture from  
sm_35 to sm_52 (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156936/new/

https://reviews.llvm.org/D156936

Files:
  clang/include/clang/Basic/Cuda.h
  clang/test/Driver/cuda-cross-compiling.c


Index: clang/test/Driver/cuda-cross-compiling.c
===
--- clang/test/Driver/cuda-cross-compiling.c
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -65,9 +65,9 @@
 // RUN: %clang --target=nvptx64-nvidia-cuda -### 
--cuda-path=%S/Inputs/CUDA/usr/local/cuda %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=DEFAULT %s
 
-//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} 
"-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" 
"[[PTX:.+]].s"
-// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_35" "--output-file" 
"[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_35" {{.*}} 
"[[CUBIN]].cubin"
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} 
"-target-cpu" "sm_52" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" 
"[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_52" "--output-file" 
"[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_52" {{.*}} 
"[[CUBIN]].cubin"
 
 //
 // Test to ensure that we enable handling global constructors in a freestanding
Index: clang/include/clang/Basic/Cuda.h
===
--- clang/include/clang/Basic/Cuda.h
+++ clang/include/clang/Basic/Cuda.h
@@ -117,7 +117,7 @@
// public one.
   LAST,
 
-  CudaDefault = CudaArch::SM_35,
+  CudaDefault = CudaArch::SM_52,
   HIPDefault = CudaArch::GFX803,
 };
 


Index: clang/test/Driver/cuda-cross-compiling.c
===
--- clang/test/Driver/cuda-cross-compiling.c
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -65,9 +65,9 @@
 // RUN: %clang --target=nvptx64-nvidia-cuda -### --cuda-path=%S/Inputs/CUDA/usr/local/cuda %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=DEFAULT %s
 
-//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
-// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_35" {{.*}} "[[CUBIN]].cubin"
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_52" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_52" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_52" {{.*}} "[[CUBIN]].cubin"
 
 //
 // Test to ensure that we enable handling global constructors in a freestanding
Index: clang/include/clang/Basic/Cuda.h
===
--- clang/include/clang/Basic/Cuda.h
+++ clang/include/clang/Basic/Cuda.h
@@ -117,7 +117,7 @@
// public one.
   LAST,
 
-  CudaDefault = CudaArch::SM_35,
+  CudaDefault = CudaArch::SM_52,
   HIPDefault = CudaArch::GFX803,
 };
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156936: [Clang] Increase default architecture from sm_35 to sm_52

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: tra, jdoerfert, yaxunl, jlebar.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

We previously defaulted to `sm_35` for the purpose of unspecified
architecture. This was removed in new CUDA versions so we should bump
this up.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D156936

Files:
  clang/include/clang/Basic/Cuda.h
  clang/test/Driver/cuda-cross-compiling.c


Index: clang/test/Driver/cuda-cross-compiling.c
===
--- clang/test/Driver/cuda-cross-compiling.c
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -65,9 +65,9 @@
 // RUN: %clang --target=nvptx64-nvidia-cuda -### 
--cuda-path=%S/Inputs/CUDA/usr/local/cuda %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=DEFAULT %s
 
-//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} 
"-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" 
"[[PTX:.+]].s"
-// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_35" "--output-file" 
"[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_35" {{.*}} 
"[[CUBIN]].cubin"
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} 
"-target-cpu" "sm_52" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" 
"[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_52" "--output-file" 
"[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_52" {{.*}} 
"[[CUBIN]].cubin"
 
 //
 // Test to ensure that we enable handling global constructors in a freestanding
Index: clang/include/clang/Basic/Cuda.h
===
--- clang/include/clang/Basic/Cuda.h
+++ clang/include/clang/Basic/Cuda.h
@@ -117,7 +117,7 @@
// public one.
   LAST,
 
-  CudaDefault = CudaArch::SM_35,
+  CudaDefault = CudaArch::SM_52,
   HIPDefault = CudaArch::GFX803,
 };
 


Index: clang/test/Driver/cuda-cross-compiling.c
===
--- clang/test/Driver/cuda-cross-compiling.c
+++ clang/test/Driver/cuda-cross-compiling.c
@@ -65,9 +65,9 @@
 // RUN: %clang --target=nvptx64-nvidia-cuda -### --cuda-path=%S/Inputs/CUDA/usr/local/cuda %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=DEFAULT %s
 
-//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
-// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_35" {{.*}} "[[CUBIN]].cubin"
+//  DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_52" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_52" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_52" {{.*}} "[[CUBIN]].cubin"
 
 //
 // Test to ensure that we enable handling global constructors in a freestanding
Index: clang/include/clang/Basic/Cuda.h
===
--- clang/include/clang/Basic/Cuda.h
+++ clang/include/clang/Basic/Cuda.h
@@ -117,7 +117,7 @@
// public one.
   LAST,
 
-  CudaDefault = CudaArch::SM_35,
+  CudaDefault = CudaArch::SM_52,
   HIPDefault = CudaArch::GFX803,
 };
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156930: [Clang] Fix Offloading related tests after D156363

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGbc080221b3a2: [Clang] Fix Offloading related tests after 
D156363 (authored by jhuber6).

Changed prior to commit:
  https://reviews.llvm.org/D156930?vs=546564&id=546578#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156930/new/

https://reviews.llvm.org/D156930

Files:
  clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_35.bc
  clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc
  clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_35.bc
  clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc
  clang/test/Driver/amdgpu-hip-system-arch.c
  clang/test/Driver/cuda-bad-arch.cu
  clang/test/Driver/hip-autolink.hip
  clang/test/Driver/hip-binding.hip
  clang/test/Driver/hip-cuid-hash.hip
  clang/test/Driver/hip-cuid.hip
  clang/test/Driver/hip-default-gpu-arch.hip
  clang/test/Driver/hip-device-compile.hip
  clang/test/Driver/hip-host-cpu-features.hip
  clang/test/Driver/hip-launch-api.hip
  clang/test/Driver/hip-link-bc-to-bc.hip
  clang/test/Driver/hip-link-bundle-archive.hip
  clang/test/Driver/hip-no-device-libs.hip
  clang/test/Driver/hip-options.hip
  clang/test/Driver/hip-output-file-name.hip
  clang/test/Driver/hip-printf.hip
  clang/test/Driver/hip-save-temps.hip
  clang/test/Driver/hip-std.hip
  clang/test/Driver/hip-syntax-only.hip
  clang/test/Driver/hip-toolchain-dwarf.hip
  clang/test/Driver/hip-toolchain-features.hip
  clang/test/Driver/hip-toolchain-mllvm.hip
  clang/test/Driver/hip-toolchain-opt.hip
  clang/test/Driver/lto.cu
  clang/test/Driver/openmp-offload-gpu.c
  clang/test/Driver/openmp-offload-infer.c

Index: clang/test/Driver/openmp-offload-infer.c
===
--- clang/test/Driver/openmp-offload-infer.c
+++ clang/test/Driver/openmp-offload-infer.c
@@ -2,8 +2,8 @@
 // REQUIRES: nvptx-registered-target
 // REQUIRES: amdgpu-registered-target
 
-// RUN:   not %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp \
-// RUN:  --offload-arch=sm_52 --offload-arch=gfx803 \
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp \
+// RUN:  -nogpulib --offload-arch=sm_52 --offload-arch=gfx803 \
 // RUN:  --libomptarget-amdgpu-bc-path=%S/Inputs/hip_dev_lib/libomptarget-amdgpu-gfx803.bc \
 // RUN:  --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-nvptx-test.bc %s 2>&1 \
 // RUN:   | FileCheck %s
@@ -39,9 +39,7 @@
 // CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.*]]"
 // CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
 
-// RUN:   not %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp=libomp \
-// RUN: --offload-arch=sm_70 --offload-arch=gfx908 --offload-arch=native \
-// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp \
+// RUN:   not %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp \
 // RUN: --offload-arch=sm_70 --offload-arch=gfx908 --offload-arch=skylake \
 // RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-FAILED
 
Index: clang/test/Driver/openmp-offload-gpu.c
===
--- clang/test/Driver/openmp-offload-gpu.c
+++ clang/test/Driver/openmp-offload-gpu.c
@@ -10,33 +10,33 @@
 /// ###
 
 /// Check -Xopenmp-target uses one of the archs provided when several archs are used.
-// RUN:   not %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \
-// RUN:  -Xopenmp-target -march=sm_35 -Xopenmp-target -march=sm_60 %s 2>&1 \
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \
+// RUN:  -Xopenmp-target -march=sm_52 -Xopenmp-target -march=sm_60 %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-ARCHS %s
 
 // CHK-FOPENMP-TARGET-ARCHS: ptxas{{.*}}" "--gpu-name" "sm_60"
 
 /// ###
 
-/// Check -Xopenmp-target -march=sm_35 works as expected when two triples are present.
-// RUN:   not %clang -### -fopenmp=libomp \
+/// Check -Xopenmp-target -march=sm_52 works as expected when two triples are present.
+// RUN:   %clang -### -fopenmp=libomp \
 // RUN:  -fopenmp-targets=powerpc64le-ibm-linux-gnu,nvptx64-nvidia-cuda \
-// RUN:  -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_35 %s 2>&1 \
+// RUN:  -nogpulib -nogpuinc -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-COMPILATION %s
 
-// CHK-FOPENMP-TARGET-COMPILATION: ptx

[PATCH] D156928: [Clang][AMDGPU] Fix handling of -mcode-object-version=none arg

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156928#4555121 , @yaxunl wrote:

> `-mcode-object-version=none` was intentionally designed to work with `clang 
> -cc1` only, since it does not work with clang driver if users link with 
> device library. Device library can still use it by  using it with `-Xclang`.

If the intended use is the deviceRTL then that should be sufficient.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156928/new/

https://reviews.llvm.org/D156928

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156928: [Clang][AMDGPU] Fix handling of -mcode-object-version=none arg

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/include/clang/Basic/TargetOptions.h:90
+COV_Default = 400,
+COV_MAX = 500
   };

Typically we just put a `COV_LAST` to indicate that it's over the accepted 
enumerations.



Comment at: clang/lib/Driver/ToolChain.cpp:1364
 // at all, target and host share a toolchain.
 if (A->getOption().matches(options::OPT_m_Group)) {
   if (SameTripleAsHost)

Is this flag not in the `m` group? It should be caught here right?



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1058
 unsigned CodeObjVer = getAMDGPUCodeObjectVersion(D, Args);
+if(CodeObjVer != 0) {
 CmdArgs.insert(CmdArgs.begin() + 1,

Use clang-format.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1066
+if (!IsCC1As) {
+  std::string CodeObjVerStr = (CodeObjVer ? Twine(CodeObjVer) : 
"none").str();
   CmdArgs.insert(CmdArgs.begin() + 1,

arsenm wrote:
> don't need to go through std::string? stick with Twine everywhere?
You shouldn't assign to a Twine, but in general I think we should probably put 
this ternary in-line with the other stuff to avoid the temporary.

The handling here is a little confusing, we do
```
Args.getLastArg(options::OPT_mcode_object_version_EQ);
```
Which expects a number, if it's not present we get an empty string which 
default converts to zero which we then convert into "none"?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156928/new/

https://reviews.llvm.org/D156928

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156930: [Clang] Fix Offloading related tests after D156363

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: MaskRay, yaxunl, JonChesterfield, tra, jdoerfert, 
ronlieb, jplehr.
Herald added subscribers: mattd, asavonic, ormris, kerbowa, steven_wu, 
hiraditya, jvesely.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, sstefan1.
Herald added a project: clang.

This patch fixes failing tests after checking the return code from the
driver. This is mostly due to the ROCm libraries not being present
during most compilations. Passing `-nogpuinc` should allow us to compile
without it for tests that require it. Additionally, some old tests set
the architecture of Nvidia tests to `sm_35` which is officially
unsupported in CUDA 12+ so it prints an error. We just increase in this
case.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D156930

Files:
  clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_35.bc
  clang/test/Driver/Inputs/libomptarget/libomptarget-nvptx-sm_52.bc
  clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_35.bc
  clang/test/Driver/Inputs/libomptarget/subdir/libomptarget-nvptx-sm_52.bc
  clang/test/Driver/amdgpu-hip-system-arch.c
  clang/test/Driver/cuda-bad-arch.cu
  clang/test/Driver/hip-autolink.hip
  clang/test/Driver/hip-binding.hip
  clang/test/Driver/hip-cuid-hash.hip
  clang/test/Driver/hip-cuid.hip
  clang/test/Driver/hip-default-gpu-arch.hip
  clang/test/Driver/hip-device-compile.hip
  clang/test/Driver/hip-host-cpu-features.hip
  clang/test/Driver/hip-launch-api.hip
  clang/test/Driver/hip-link-bc-to-bc.hip
  clang/test/Driver/hip-link-bundle-archive.hip
  clang/test/Driver/hip-no-device-libs.hip
  clang/test/Driver/hip-options.hip
  clang/test/Driver/hip-output-file-name.hip
  clang/test/Driver/hip-printf.hip
  clang/test/Driver/hip-save-temps.hip
  clang/test/Driver/hip-std.hip
  clang/test/Driver/hip-syntax-only.hip
  clang/test/Driver/hip-toolchain-dwarf.hip
  clang/test/Driver/hip-toolchain-features.hip
  clang/test/Driver/hip-toolchain-mllvm.hip
  clang/test/Driver/hip-toolchain-opt.hip
  clang/test/Driver/lto.cu
  clang/test/Driver/openmp-offload-gpu.c
  clang/test/Driver/openmp-offload-infer.c

Index: clang/test/Driver/openmp-offload-infer.c
===
--- clang/test/Driver/openmp-offload-infer.c
+++ clang/test/Driver/openmp-offload-infer.c
@@ -2,8 +2,8 @@
 // REQUIRES: nvptx-registered-target
 // REQUIRES: amdgpu-registered-target
 
-// RUN:   not %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp \
-// RUN:  --offload-arch=sm_52 --offload-arch=gfx803 \
+// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp \
+// RUN:  -nogpulib --offload-arch=sm_52 --offload-arch=gfx803 \
 // RUN:  --libomptarget-amdgpu-bc-path=%S/Inputs/hip_dev_lib/libomptarget-amdgpu-gfx803.bc \
 // RUN:  --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-nvptx-test.bc %s 2>&1 \
 // RUN:   | FileCheck %s
@@ -39,9 +39,7 @@
 // CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.*]]"
 // CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
 
-// RUN:   not %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp=libomp \
-// RUN: --offload-arch=sm_70 --offload-arch=gfx908 --offload-arch=native \
-// RUN:   %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp \
+// RUN:   not %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp \
 // RUN: --offload-arch=sm_70 --offload-arch=gfx908 --offload-arch=skylake \
 // RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-FAILED
 
Index: clang/test/Driver/openmp-offload-gpu.c
===
--- clang/test/Driver/openmp-offload-gpu.c
+++ clang/test/Driver/openmp-offload-gpu.c
@@ -10,33 +10,33 @@
 /// ###
 
 /// Check -Xopenmp-target uses one of the archs provided when several archs are used.
-// RUN:   not %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \
-// RUN:  -Xopenmp-target -march=sm_35 -Xopenmp-target -march=sm_60 %s 2>&1 \
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \
+// RUN:  -Xopenmp-target -march=sm_52 -Xopenmp-target -march=sm_60 %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-ARCHS %s
 
 // CHK-FOPENMP-TARGET-ARCHS: ptxas{{.*}}" "--gpu-name" "sm_60"
 
 /// ###
 
-/// Check -Xopenmp-target -march=sm_35 works as expected when two triples are present.
-// RUN:   not %clang -### -fopenmp=libomp \
+/// Check -Xopenmp-target -march=sm_52 works as expected when two triples are present.
+// RU

[PATCH] D156363: [Driver] -###: exit with code 1 if hasErrorOccurred

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156363#4554913 , @yaxunl wrote:

> Thanks. I will wait for your patch. You can leave the tricky ones to me if 
> you would like. e.g. the rocm-detect.hip

You can go ahead and fix that one then.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156363/new/

https://reviews.llvm.org/D156363

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156363: [Driver] -###: exit with code 1 if hasErrorOccurred

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156363#4554790 , @ro wrote:

> In D156363#4553043 , @ro wrote:
>
>> It seems the latest commit of this patch has (re-)introduced two failures on 
>> the Solaris/amd64 buildbot 
>> :
>>
>>   FAIL: Clang::clang_f_opts.c
>>   FAIL: Clang::lto.c
>>
>> I cannot really make sense of that.
>
> I think I found it: running the matching '*.script' files under `bash -x` 
> shows the tests ending with:
>
> - for `clang_f_lto.c`:
>
>   + : 'RUN: at line 117'
>   + /var/llvm/local-amd64-debug-stage2/tools/clang/stage2-bins/bin/clang -### 
> -flto -forder-file-instrumentation 
> /vol/llvm/src/llvm-project/local/clang/test/Driver/clang_f_opts.c
>   + /var/llvm/local-amd64-debug-stage2/tools/clang/stage2-bins/bin/FileCheck 
> -check-prefix=CHECK-ORDERFILE-INSTR-LTO 
> /vol/llvm/src/llvm-project/local/clang/test/Driver/clang_f_opts.c
>
> - for `lto.c`, it's similar:
>
>   + : 'RUN: at line 19'
>   + /var/llvm/local-amd64-debug-stage2/tools/clang/stage2-bins/bin/clang 
> /vol/llvm/src/llvm-project/local/clang/test/Driver/lto.c -flto -save-temps 
> -###
>   ro@niers 79 > 
> /var/llvm/local-amd64-debug-stage2/tools/clang/stage2-bins/bin/clan
>
> Manually re-running `clang` gives
>
>   clang: error: 'amd64-pc-solaris2.11': unable to pass LLVM bit-code files to 
> linker
>
> in both cases.

Probably because we're not specifying the `--target=` I'll add that in my fix 
for AMDGPU I'm working on and see if it solves the problem.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156363/new/

https://reviews.llvm.org/D156363

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156363: [Driver] -###: exit with code 1 if hasErrorOccurred

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156363#4554687 , @yaxunl wrote:

> In D156363#4554435 , @jhuber6 wrote:
>
>>   Clang :: Driver/amdgpu-hip-system-arch.c
>>   Clang :: Driver/cuda-bad-arch.cu
>>   Clang :: Driver/hip-autolink.hip
>>   Clang :: Driver/hip-binding.hip
>>   Clang :: Driver/hip-cuid-hash.hip
>>   Clang :: Driver/hip-cuid.hip
>>   Clang :: Driver/hip-default-gpu-arch.hip
>>   Clang :: Driver/hip-device-compile.hip
>>   Clang :: Driver/hip-host-cpu-features.hip
>>   Clang :: Driver/hip-launch-api.hip
>>   Clang :: Driver/hip-link-bc-to-bc.hip
>>   Clang :: Driver/hip-link-bundle-archive.hip
>>   Clang :: Driver/hip-no-device-libs.hip
>>   Clang :: Driver/hip-options.hip
>>   Clang :: Driver/hip-output-file-name.hip
>>   Clang :: Driver/hip-printf.hip
>>   Clang :: Driver/hip-save-temps.hip
>>   Clang :: Driver/hip-std.hip
>>   Clang :: Driver/hip-syntax-only.hip
>>   Clang :: Driver/hip-toolchain-dwarf.hip
>>   Clang :: Driver/hip-toolchain-features.hip
>>   Clang :: Driver/hip-toolchain-mllvm.hip
>>   Clang :: Driver/hip-toolchain-opt.hip
>>   Clang :: Driver/lto.cu
>>   Clang :: Driver/openmp-offload-gpu.c
>>   Clang :: Driver/openmp-offload-infer.c
>>   Clang :: Driver/rocm-detect.hip
>>
>> These are the tests I'm seeing fail locally.
>
> These tests fail because the lit tests expect clang driver to run without 
> ROCm.
>
> I will update these tests so that they pass on systems with or w/o ROCm.

I've got a patch half done that mostly adds `-nogpuinc` to all these run lines.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156363/new/

https://reviews.llvm.org/D156363

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156363: [Driver] -###: exit with code 1 if hasErrorOccurred

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Pretty sure most of this is `hip` returning an error if it can't find `ROCm`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156363/new/

https://reviews.llvm.org/D156363

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156363: [Driver] -###: exit with code 1 if hasErrorOccurred

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

  Clang :: Driver/amdgpu-hip-system-arch.c
  Clang :: Driver/cuda-bad-arch.cu
  Clang :: Driver/hip-autolink.hip
  Clang :: Driver/hip-binding.hip
  Clang :: Driver/hip-cuid-hash.hip
  Clang :: Driver/hip-cuid.hip
  Clang :: Driver/hip-default-gpu-arch.hip
  Clang :: Driver/hip-device-compile.hip
  Clang :: Driver/hip-host-cpu-features.hip
  Clang :: Driver/hip-launch-api.hip
  Clang :: Driver/hip-link-bc-to-bc.hip
  Clang :: Driver/hip-link-bundle-archive.hip
  Clang :: Driver/hip-no-device-libs.hip
  Clang :: Driver/hip-options.hip
  Clang :: Driver/hip-output-file-name.hip
  Clang :: Driver/hip-printf.hip
  Clang :: Driver/hip-save-temps.hip
  Clang :: Driver/hip-std.hip
  Clang :: Driver/hip-syntax-only.hip
  Clang :: Driver/hip-toolchain-dwarf.hip
  Clang :: Driver/hip-toolchain-features.hip
  Clang :: Driver/hip-toolchain-mllvm.hip
  Clang :: Driver/hip-toolchain-opt.hip
  Clang :: Driver/lto.cu
  Clang :: Driver/openmp-offload-gpu.c
  Clang :: Driver/openmp-offload-infer.c
  Clang :: Driver/rocm-detect.hip

These are the tests I'm seeing fail locally.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156363/new/

https://reviews.llvm.org/D156363

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156886: [CUDA][HIP] Reorganize options for documentation

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 accepted this revision.
jhuber6 added a comment.
This revision is now accepted and ready to land.

I think this is good, there's been some confusion around which ones goes to 
what.




Comment at: clang/include/clang/Driver/Options.td:1021
+def offload_host_device : Flag<["--"], "offload-host-device">, 
Flags<[FlangOption]>,
+  HelpText<"Only compile for the offloading host.">;
+

Can you fix this copy paste error while you're here?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156886/new/

https://reviews.llvm.org/D156886

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156363: [Driver] -###: exit with code 1 if hasErrorOccurred

2023-08-02 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Also seems to have impacted one of the AMDGPU bots but not the other, not sure 
why https://lab.llvm.org/staging/#/builders/247/builds/4145.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156363/new/

https://reviews.llvm.org/D156363

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156816: [Clang] Make generic aliases to OpenCL address spaces

2023-08-01 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156816#4551500 , @jdoerfert wrote:

> Macros seems to be good enough. If we really need clang attributes, we need 
> new docs, and naming convention etc.

Yeah an alternative would be a new set of attributes, then we make the old ones 
OpenCL language specific. One problem is that this will probably be copy 
pasting everything a second time with no difference, but it might be fine. This 
is just the easy version. Also `addrspace_` probably isn't the best name since 
there's other non-GPU targets that use address spaces. I just think that this 
GPU generic stuff should be moved out of OpenCL in whatever way is most 
convenient.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156816/new/

https://reviews.llvm.org/D156816

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156816: [Clang] Make generic aliases to OpenCL address spaces

2023-08-01 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156816#4551409 , @Anastasia wrote:

> Why not to just use target address space and define it to some macro with 
> desirable spelling?
>
> I don't think renaming OpenCL address space to something else makes sense. It 
> might make more sense to just introduced different model of address spaces 
> completely. But if you use OpenCL ones then it makes sense to have adequate 
> naming so its documentation and etc can be located.

My issue is that these address spaces aren't really OpenCL specific, they 
describe a larger concept than the OpenCL language itself and we'd like to use 
that without needing to invoke the `opencl` name, since it's unrelated in this 
context.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156816/new/

https://reviews.llvm.org/D156816

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156816: [Clang] Make generic aliases to OpenCL address spaces

2023-08-01 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156816#4551338 , @yaxunl wrote:

>> FFI isn't the reason you'd use these, it's for generic access to the actual 
>> backend. E.g. an `addrspace(3)` global is local memory, if it's external 
>> it's dynamic. Having these named is better than doing it via the numerical 
>> address space. I'd like to use these in the C++ / OpenMP codes instead of 
>> the numeric ones but I don't like needing to use `opencl` in the name. 
>> Similarly to how we have the OpenCL atomics that should be usable outside of 
>> OpenCL.
>
> I agree these attributes are useful in other languages, but "global" and 
> "local" may need a more generic name suitable for all offloading languages. 
> To me, "device" can be a good alternative to "global". even "shared" seems 
> clearer than "local".

Global is common in https://llvm.org/docs/AMDGPUUsage.html#address-spaces and 
https://llvm.org/docs/NVPTXUsage.html#address-spaces. The main problem is 
`local` vs `shared` and `private` vs `local`. Unsure which one we should prefer 
in this case. Generally I feel a lot of this OpenCL stuff should've been named 
commonly at the start considering you can use most of them outside of the 
actual OpenCL language just fine.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156816/new/

https://reviews.llvm.org/D156816

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156816: [Clang] Make generic aliases to OpenCL address spaces

2023-08-01 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D156816#4551299 , @arsenm wrote:

> I don't really see the point of doing this. These introduce ambiguous 
> terminology. The reason you need the attributes is basically for FFI to 
> opencl code, so might as well make the specific meaning clearer with the 
> opencl bit

FFI isn't the reason you'd use these, it's for generic access to the actual 
backend. E.g. an `addrspace(3)` global is local memory, if it's external it's 
dynamic. Having these named is better than doing it via the numerical address 
space. I'd like to use these in the C++ / OpenMP codes instead of the numeric 
error codes but I don't like needing to use `opencl` in the name. Similarly to 
how we have the OpenCL atomics that should be usable outside of OpenCL.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156816/new/

https://reviews.llvm.org/D156816

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156816: [Clang] Make generic aliases to OpenCL address spaces

2023-08-01 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 546149.
jhuber6 added a comment.

Fix typo


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156816/new/

https://reviews.llvm.org/D156816

Files:
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/AttrDocs.td


Index: clang/include/clang/Basic/AttrDocs.td
===
--- clang/include/clang/Basic/AttrDocs.td
+++ clang/include/clang/Basic/AttrDocs.td
@@ -3818,7 +3818,7 @@
 
 def OpenCLAddressSpaceGenericDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__generic, generic, [[clang::opencl_generic]]";
+  let Heading = "__generic, generic, [[clang::addrspace_generic]], 
[[clang::opencl_generic]]";
   let Content = [{
 The generic address space attribute is only available with OpenCL v2.0 and 
later.
 It can be used with pointer types. Variables in global and local scope and
@@ -3831,7 +3831,7 @@
 
 def OpenCLAddressSpaceConstantDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__constant, constant, [[clang::opencl_constant]]";
+  let Heading = "__constant, constant, [[clang::addrspace_constant]], 
[[clang::opencl_constant]]";
   let Content = [{
 The constant address space attribute signals that an object is located in
 a constant (non-modifiable) memory region. It is available to all work items.
@@ -3843,7 +3843,7 @@
 
 def OpenCLAddressSpaceGlobalDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__global, global, [[clang::opencl_global]]";
+  let Heading = "__global, global, [[clang::addrspace_global]], 
[[clang::opencl_global]]";
   let Content = [{
 The global address space attribute specifies that an object is allocated in
 global memory, which is accessible by all work items. The content stored in 
this
@@ -3881,7 +3881,7 @@
 
 def OpenCLAddressSpaceLocalDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__local, local, [[clang::opencl_local]]";
+  let Heading = "__local, local, [[clang::addrspace_local]], 
[[clang::opencl_local]]";
   let Content = [{
 The local address space specifies that an object is allocated in the local 
(work
 group) memory area, which is accessible to all work items in the same work
@@ -3894,7 +3894,7 @@
 
 def OpenCLAddressSpacePrivateDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__private, private, [[clang::opencl_private]]";
+  let Heading = "__private, private, [[clang::addrspace_private]], 
[[clang::opencl_private]]";
   let Content = [{
 The private address space specifies that an object is allocated in the private
 (work item) memory. Other work items cannot access the same memory area and its
Index: clang/include/clang/Basic/Attr.td
===
--- clang/include/clang/Basic/Attr.td
+++ clang/include/clang/Basic/Attr.td
@@ -1344,13 +1344,13 @@
 
 def OpenCLPrivateAddressSpace : TypeAttr {
   let Spellings = [CustomKeyword<"__private">, CustomKeyword<"private">,
-   Clang<"opencl_private">];
+   Clang<"opencl_private">, Clang<"addrspace_private">];
   let Documentation = [OpenCLAddressSpacePrivateDocs];
 }
 
 def OpenCLGlobalAddressSpace : TypeAttr {
   let Spellings = [CustomKeyword<"__global">, CustomKeyword<"global">,
-   Clang<"opencl_global">];
+   Clang<"opencl_global">, Clang<"addrspace_global">];
   let Documentation = [OpenCLAddressSpaceGlobalDocs];
 }
 
@@ -1366,19 +1366,19 @@
 
 def OpenCLLocalAddressSpace : TypeAttr {
   let Spellings = [CustomKeyword<"__local">, CustomKeyword<"local">,
-   Clang<"opencl_local">];
+   Clang<"opencl_local", Clang<"addrspace_local">>];
   let Documentation = [OpenCLAddressSpaceLocalDocs];
 }
 
 def OpenCLConstantAddressSpace : TypeAttr {
   let Spellings = [CustomKeyword<"__constant">, CustomKeyword<"constant">,
-   Clang<"opencl_constant">];
+   Clang<"opencl_constant", Clang<"addrspace_constant">>];
   let Documentation = [OpenCLAddressSpaceConstantDocs];
 }
 
 def OpenCLGenericAddressSpace : TypeAttr {
   let Spellings = [CustomKeyword<"__generic">, CustomKeyword<"generic">,
-   Clang<"opencl_generic">];
+   Clang<"opencl_generic", Clang<"addrspace_generic">>];
   let Documentation = [OpenCLAddressSpaceGenericDocs];
 }
 


Index: clang/include/clang/Basic/AttrDocs.td
===
--- clang/include/clang/Basic/AttrDocs.td
+++ clang/include/clang/Basic/AttrDocs.td
@@ -3818,7 +3818,7 @@
 
 def OpenCLAddressSpaceGenericDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__generic, generic, [[clang::opencl_generic]]";
+  let Heading = "__generic, generic, [[clang::addrspace_generic]], [[clang::opencl_generic]]";
   let C

[PATCH] D156816: [Clang] Make generic aliases to OpenCL address spaces

2023-08-01 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: ebevhan, arsenm, JonChesterfield, jdoerfert, 
tianshilei1992, tra, yaxunl, rjmccall.
Herald added subscribers: jeroen.dobbelaere, Naghasan, ldrumm, arichardson, 
Anastasia.
Herald added a reviewer: aaron.ballman.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, wangpc, wdng.
Herald added a project: clang.

We provide these OpenCL attributes for specifying address spaces. These
more or less map directly to the versions described by the backend
documentation. However, when using these for more generic reasons, the
`opencl` keyword doesn't help since this is more tied to the backend.
This patch adds an alias to just call it `addrspace_` instead of
`opencl_`.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D156816

Files:
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/AttrDocs.td


Index: clang/include/clang/Basic/AttrDocs.td
===
--- clang/include/clang/Basic/AttrDocs.td
+++ clang/include/clang/Basic/AttrDocs.td
@@ -3818,7 +3818,7 @@
 
 def OpenCLAddressSpaceGenericDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__generic, generic, [[clang::opencl_generic]]";
+  let Heading = "__generic, generic, [[clang::addrspace_generic]], 
[[clang::opencl_generic]]";
   let Content = [{
 The generic address space attribute is only available with OpenCL v2.0 and 
later.
 It can be used with pointer types. Variables in global and local scope and
@@ -3831,7 +3831,7 @@
 
 def OpenCLAddressSpaceConstantDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__constant, constant, [[clang::opencl_constant]]";
+  let Heading = "__constant, constant, [[clang::addrspace_constant]], 
[[clang::opencl_constant]]";
   let Content = [{
 The constant address space attribute signals that an object is located in
 a constant (non-modifiable) memory region. It is available to all work items.
@@ -3843,7 +3843,7 @@
 
 def OpenCLAddressSpaceGlobalDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__global, global, [[clang::opencl_global]]";
+  let Heading = "__global, global, [[clang::addrspace_global]], 
[[clang::opencl_global]]";
   let Content = [{
 The global address space attribute specifies that an object is allocated in
 global memory, which is accessible by all work items. The content stored in 
this
@@ -3881,7 +3881,7 @@
 
 def OpenCLAddressSpaceLocalDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__local, local, [[clang::opencl_local]]";
+  let Heading = "__local, local, [[clang::addrspace_local]], 
[[clang::opencl_local]]";
   let Content = [{
 The local address space specifies that an object is allocated in the local 
(work
 group) memory area, which is accessible to all work items in the same work
@@ -3894,7 +3894,7 @@
 
 def OpenCLAddressSpacePrivateDocs : Documentation {
   let Category = DocOpenCLAddressSpaces;
-  let Heading = "__private, private, [[clang::opencl_private]]";
+  let Heading = "__private, private, [[clang::addrspace_private]], 
[[clang::opencl_private]]";
   let Content = [{
 The private address space specifies that an object is allocated in the private
 (work item) memory. Other work items cannot access the same memory area and its
Index: clang/include/clang/Basic/Attr.td
===
--- clang/include/clang/Basic/Attr.td
+++ clang/include/clang/Basic/Attr.td
@@ -1344,13 +1344,13 @@
 
 def OpenCLPrivateAddressSpace : TypeAttr {
   let Spellings = [CustomKeyword<"__private">, CustomKeyword<"private">,
-   Clang<"opencl_private">];
-  let Documentation = [OpenCLAddressSpacePrivateDocs];
+   Clang<"opencl_private">, Clang<"addrspace_private">];
+  let Documentation = [kpenCLAddressSpacePrivateDocs];
 }
 
 def OpenCLGlobalAddressSpace : TypeAttr {
   let Spellings = [CustomKeyword<"__global">, CustomKeyword<"global">,
-   Clang<"opencl_global">];
+   Clang<"opencl_global">, Clang<"addrspace_global">];
   let Documentation = [OpenCLAddressSpaceGlobalDocs];
 }
 
@@ -1366,19 +1366,19 @@
 
 def OpenCLLocalAddressSpace : TypeAttr {
   let Spellings = [CustomKeyword<"__local">, CustomKeyword<"local">,
-   Clang<"opencl_local">];
+   Clang<"opencl_local", Clang<"addrspace_local">>];
   let Documentation = [OpenCLAddressSpaceLocalDocs];
 }
 
 def OpenCLConstantAddressSpace : TypeAttr {
   let Spellings = [CustomKeyword<"__constant">, CustomKeyword<"constant">,
-   Clang<"opencl_constant">];
+   Clang<"opencl_constant", Clang<"addrspace_constant">>];
   let Documentation = [OpenCLAddressSpaceConstantDocs];
 }
 
 def OpenCLGenericAddressSpace : TypeAttr {
   let Spellings = [CustomKeyword<"__generic">, Custo

[PATCH] D156368: [OpenMP] Do not always emit unused extern variables

2023-07-28 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG141c4e7a9403: [OpenMP] Do not always emit unused extern 
variables (authored by jhuber6).
Herald added a project: OpenMP.
Herald added a subscriber: openmp-commits.

Changed prior to commit:
  https://reviews.llvm.org/D156368?vs=545168&id=545211#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156368/new/

https://reviews.llvm.org/D156368

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/OpenMP/declare_target_codegen.cpp
  openmp/libomptarget/test/offloading/extern.c


Index: openmp/libomptarget/test/offloading/extern.c
===
--- /dev/null
+++ openmp/libomptarget/test/offloading/extern.c
@@ -0,0 +1,27 @@
+// RUN: %libomptarget-compile-generic -DVAR -c -o %t.o
+// RUN: %libomptarget-compile-generic %t.o && %libomptarget-run-generic | 
%fcheck-generic
+
+#ifdef VAR
+int x = 1;
+#else
+#include 
+#include 
+extern int x;
+
+int main() {
+  int value = 0;
+#pragma omp target map(from : value)
+  value = x;
+  assert(value == 1);
+
+  x = 999;
+#pragma omp target update to(x)
+
+#pragma omp target map(from : value)
+  value = x;
+  assert(value == 999);
+
+  // CHECK: PASS
+  printf ("PASS\n");
+}
+#endif
Index: clang/test/OpenMP/declare_target_codegen.cpp
===
--- clang/test/OpenMP/declare_target_codegen.cpp
+++ clang/test/OpenMP/declare_target_codegen.cpp
@@ -29,7 +29,6 @@
 // CHECK-DAG: @flag = protected global i8 undef,
 // CHECK-DAG: @dx = {{protected | }}global i32 0,
 // CHECK-DAG: @dy = {{protected | }}global i32 0,
-// CHECK-DAG: @aaa = external global i32,
 // CHECK-DAG: @bbb = {{protected | }}global i32 0,
 // CHECK-DAG: weak constant %struct.__tgt_offload_entry { ptr @bbb,
 // CHECK-DAG: @ccc = external global i32,
@@ -80,7 +79,7 @@
 extern int aaa;
 int bbb = 0;
 extern int ccc;
-int ddd = 0;
+int ddd = ccc;
 #pragma omp end declare target
 
 #pragma omp declare target
@@ -260,8 +259,6 @@
 
 // CHECK-NOT: define {{.*}}{{baz1|baz4|maini1|Base|virtual_}}
 
-// CHECK-DAG: !{i32 1, !"aaa", i32 0, i32 {{[0-9]+}}}
-// CHECK-DAG: !{i32 1, !"ccc", i32 0, i32 {{[0-9]+}}}
 // CHECK-DAG: !{{{.+}}virtual_foo
 
 #ifdef OMP5
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3605,6 +3605,13 @@
 // Emit declaration of the must-be-emitted declare target variable.
 if (std::optional Res =
 OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD)) {
+
+  // If this variable has external storage and doesn't require special
+  // link handling we defer to its canonical definition.
+  if (VD->hasExternalStorage() &&
+  Res != OMPDeclareTargetDeclAttr::MT_Link)
+return;
+
   bool UnifiedMemoryEnabled =
   getOpenMPRuntime().hasRequiresUnifiedSharedMemory();
   if ((*Res == OMPDeclareTargetDeclAttr::MT_To ||
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -10166,6 +10166,13 @@
 
   std::optional Res =
   OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD);
+
+  // If this is an 'extern' declaration we defer to the canonical definition 
and
+  // do not emit an offloading entry.
+  if (Res && *Res != OMPDeclareTargetDeclAttr::MT_Link &&
+  VD->hasExternalStorage())
+return;
+
   if (!Res) {
 if (CGM.getLangOpts().OpenMPIsTargetDevice) {
   // Register non-target variables being emitted in device code (debug info


Index: openmp/libomptarget/test/offloading/extern.c
===
--- /dev/null
+++ openmp/libomptarget/test/offloading/extern.c
@@ -0,0 +1,27 @@
+// RUN: %libomptarget-compile-generic -DVAR -c -o %t.o
+// RUN: %libomptarget-compile-generic %t.o && %libomptarget-run-generic | %fcheck-generic
+
+#ifdef VAR
+int x = 1;
+#else
+#include 
+#include 
+extern int x;
+
+int main() {
+  int value = 0;
+#pragma omp target map(from : value)
+  value = x;
+  assert(value == 1);
+
+  x = 999;
+#pragma omp target update to(x)
+
+#pragma omp target map(from : value)
+  value = x;
+  assert(value == 999);
+
+  // CHECK: PASS
+  printf ("PASS\n");
+}
+#endif
Index: clang/test/OpenMP/declare_target_codegen.cpp
===
--- clang/test/OpenMP/declare_target_codegen.cpp
+++ clang/test/OpenMP/declare_target_codegen.cpp
@@ -29,7 +29,6 @@
 // CHECK-DAG: @flag = protected global i8 undef,
 // CHECK-DAG: @dx = {{protected | }}global i32 0,

[PATCH] D156368: [OpenMP] Do not always emit unused extern variables

2023-07-28 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/test/OpenMP/declare_target_codegen.cpp:264
-// CHECK-DAG: !{i32 1, !"aaa", i32 0, i32 {{[0-9]+}}}
-// CHECK-DAG: !{i32 1, !"ccc", i32 0, i32 {{[0-9]+}}}
 // CHECK-DAG: !{{{.+}}virtual_foo

tianshilei1992 wrote:
> Since `ccc` is used here, it is not supposed to be removed right?
This is the declare target metadta, which this patch no longer emits for 
external storage variables. Since we assume that whoever defined `ccc` will 
handle it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156368/new/

https://reviews.llvm.org/D156368

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156368: [OpenMP] Do not always emit unused extern variables

2023-07-28 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 545168.
jhuber6 added a comment.

Add OpenMP runtime test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156368/new/

https://reviews.llvm.org/D156368

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/OpenMP/declare_target_codegen.cpp


Index: clang/test/OpenMP/declare_target_codegen.cpp
===
--- clang/test/OpenMP/declare_target_codegen.cpp
+++ clang/test/OpenMP/declare_target_codegen.cpp
@@ -29,7 +29,6 @@
 // CHECK-DAG: @flag = protected global i8 undef,
 // CHECK-DAG: @dx = {{protected | }}global i32 0,
 // CHECK-DAG: @dy = {{protected | }}global i32 0,
-// CHECK-DAG: @aaa = external global i32,
 // CHECK-DAG: @bbb = {{protected | }}global i32 0,
 // CHECK-DAG: weak constant %struct.__tgt_offload_entry { ptr @bbb,
 // CHECK-DAG: @ccc = external global i32,
@@ -80,7 +79,7 @@
 extern int aaa;
 int bbb = 0;
 extern int ccc;
-int ddd = 0;
+int ddd = ccc;
 #pragma omp end declare target
 
 #pragma omp declare target
@@ -260,8 +259,6 @@
 
 // CHECK-NOT: define {{.*}}{{baz1|baz4|maini1|Base|virtual_}}
 
-// CHECK-DAG: !{i32 1, !"aaa", i32 0, i32 {{[0-9]+}}}
-// CHECK-DAG: !{i32 1, !"ccc", i32 0, i32 {{[0-9]+}}}
 // CHECK-DAG: !{{{.+}}virtual_foo
 
 #ifdef OMP5
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3605,6 +3605,13 @@
 // Emit declaration of the must-be-emitted declare target variable.
 if (std::optional Res =
 OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD)) {
+
+  // If this variable has external storage and doesn't require special
+  // link handling we defer to its canonical definition.
+  if (VD->hasExternalStorage() &&
+  Res != OMPDeclareTargetDeclAttr::MT_Link)
+return;
+
   bool UnifiedMemoryEnabled =
   getOpenMPRuntime().hasRequiresUnifiedSharedMemory();
   if ((*Res == OMPDeclareTargetDeclAttr::MT_To ||
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -10166,6 +10166,13 @@
 
   std::optional Res =
   OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD);
+
+  // If this is an 'extern' declaration we defer to the canonical definition 
and
+  // do not emit an offloading entry.
+  if (Res && *Res != OMPDeclareTargetDeclAttr::MT_Link &&
+  VD->hasExternalStorage())
+return;
+
   if (!Res) {
 if (CGM.getLangOpts().OpenMPIsTargetDevice) {
   // Register non-target variables being emitted in device code (debug info


Index: clang/test/OpenMP/declare_target_codegen.cpp
===
--- clang/test/OpenMP/declare_target_codegen.cpp
+++ clang/test/OpenMP/declare_target_codegen.cpp
@@ -29,7 +29,6 @@
 // CHECK-DAG: @flag = protected global i8 undef,
 // CHECK-DAG: @dx = {{protected | }}global i32 0,
 // CHECK-DAG: @dy = {{protected | }}global i32 0,
-// CHECK-DAG: @aaa = external global i32,
 // CHECK-DAG: @bbb = {{protected | }}global i32 0,
 // CHECK-DAG: weak constant %struct.__tgt_offload_entry { ptr @bbb,
 // CHECK-DAG: @ccc = external global i32,
@@ -80,7 +79,7 @@
 extern int aaa;
 int bbb = 0;
 extern int ccc;
-int ddd = 0;
+int ddd = ccc;
 #pragma omp end declare target
 
 #pragma omp declare target
@@ -260,8 +259,6 @@
 
 // CHECK-NOT: define {{.*}}{{baz1|baz4|maini1|Base|virtual_}}
 
-// CHECK-DAG: !{i32 1, !"aaa", i32 0, i32 {{[0-9]+}}}
-// CHECK-DAG: !{i32 1, !"ccc", i32 0, i32 {{[0-9]+}}}
 // CHECK-DAG: !{{{.+}}virtual_foo
 
 #ifdef OMP5
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3605,6 +3605,13 @@
 // Emit declaration of the must-be-emitted declare target variable.
 if (std::optional Res =
 OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD)) {
+
+  // If this variable has external storage and doesn't require special
+  // link handling we defer to its canonical definition.
+  if (VD->hasExternalStorage() &&
+  Res != OMPDeclareTargetDeclAttr::MT_Link)
+return;
+
   bool UnifiedMemoryEnabled =
   getOpenMPRuntime().hasRequiresUnifiedSharedMemory();
   if ((*Res == OMPDeclareTargetDeclAttr::MT_To ||
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -10166,6 +10166,13 @@
 
   std::optional Res =
   OMPDe

[PATCH] D156363: [Driver] -###: exit with code 1 if hasErrorOccurred

2023-07-27 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 accepted this revision.
jhuber6 added a comment.

I'd wager a lot of the tests that return non-zero aren't even intentional, so 
it's probably good to enforce this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156363/new/

https://reviews.llvm.org/D156363

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156426: [HIP] link HIP runtime library without --hip-link

2023-07-27 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 accepted this revision.
jhuber6 added a comment.
This revision is now accepted and ready to land.

LG, thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156426/new/

https://reviews.llvm.org/D156426

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156426: [HIP] link HIP runtime library without --hip-link

2023-07-27 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/include/clang/Driver/Driver.h:712
+  /// Whether there are HIP input files.
+  bool hasHIPInputs() const { return HasHIPInputs; }
+

Shouldn't we have access to the compilation? I figured we could check 
`C.getActiveOffloadKinds()` or w/e it's called.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156426/new/

https://reviews.llvm.org/D156426

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156426: [HIP] link HIP runtime library without --hip-link

2023-07-27 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

So this is equivalent to `nvcc` implicitly calling `-lcudart`? I've had 
thoughts about the `clang-linker-wrapper` adding known runtime flags to the 
link job if it's not found. E.g. if we find a CUDA image we pass `-lcudart`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156426/new/

https://reviews.llvm.org/D156426

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156368: [OpenMP] Do not always emit unused extern variables

2023-07-26 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: JonChesterfield, jdoerfert, tianshilei1992, ye-luo, 
RaviNarayanaswamy, ABataev.
Herald added subscribers: sunshaoce, guansong, yaxunl.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, jplehr, sstefan1.
Herald added a project: clang.

Currently, the precense of the OpenMP target declare metadata requires
that we always codegen a global declaration. This is undesirable in the
case that we could defer or omit this declaration as is common with
unused extern variables. This is important as it allows us, in the
runtime, to rely on static linking semantics to omit unused symbols so
they are not included when the user links it in.

This patch changes the check for always emitting these variables.
Because of this we also need to extend this logic to the generation of
the offloading entries. This has the result of derring the offload entry
generation to the canonical definitoin. So we are effectively assuming
whoever owns the storage for this variable will perform that operation.
This makes an exception for `link` attributes as those require their own
special handling.

Let me know if this is sound in the implementation, I do not have the
largest view of the standards here.

Fixes: https://github.com/llvm/llvm-project/issues/64133


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D156368

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/OpenMP/declare_target_codegen.cpp


Index: clang/test/OpenMP/declare_target_codegen.cpp
===
--- clang/test/OpenMP/declare_target_codegen.cpp
+++ clang/test/OpenMP/declare_target_codegen.cpp
@@ -29,7 +29,6 @@
 // CHECK-DAG: @flag = protected global i8 undef,
 // CHECK-DAG: @dx = {{protected | }}global i32 0,
 // CHECK-DAG: @dy = {{protected | }}global i32 0,
-// CHECK-DAG: @aaa = external global i32,
 // CHECK-DAG: @bbb = {{protected | }}global i32 0,
 // CHECK-DAG: weak constant %struct.__tgt_offload_entry { ptr @bbb,
 // CHECK-DAG: @ccc = external global i32,
@@ -80,7 +79,7 @@
 extern int aaa;
 int bbb = 0;
 extern int ccc;
-int ddd = 0;
+int ddd = ccc;
 #pragma omp end declare target
 
 #pragma omp declare target
@@ -260,8 +259,6 @@
 
 // CHECK-NOT: define {{.*}}{{baz1|baz4|maini1|Base|virtual_}}
 
-// CHECK-DAG: !{i32 1, !"aaa", i32 0, i32 {{[0-9]+}}}
-// CHECK-DAG: !{i32 1, !"ccc", i32 0, i32 {{[0-9]+}}}
 // CHECK-DAG: !{{{.+}}virtual_foo
 
 #ifdef OMP5
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3605,6 +3605,13 @@
 // Emit declaration of the must-be-emitted declare target variable.
 if (std::optional Res =
 OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD)) {
+
+  // If this variable has external storage and doesn't require special
+  // link handling we defer to its canonical definition.
+  if (VD->hasExternalStorage() &&
+  Res != OMPDeclareTargetDeclAttr::MT_Link)
+return;
+
   bool UnifiedMemoryEnabled =
   getOpenMPRuntime().hasRequiresUnifiedSharedMemory();
   if ((*Res == OMPDeclareTargetDeclAttr::MT_To ||
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -10166,6 +10166,13 @@
 
   std::optional Res =
   OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD);
+
+  // If this is an 'extern' declaration we defer to the canonical definition 
and
+  // do not emit an offloading entry.
+  if (Res && *Res != OMPDeclareTargetDeclAttr::MT_Link &&
+  VD->hasExternalStorage())
+return;
+
   if (!Res) {
 if (CGM.getLangOpts().OpenMPIsTargetDevice) {
   // Register non-target variables being emitted in device code (debug info


Index: clang/test/OpenMP/declare_target_codegen.cpp
===
--- clang/test/OpenMP/declare_target_codegen.cpp
+++ clang/test/OpenMP/declare_target_codegen.cpp
@@ -29,7 +29,6 @@
 // CHECK-DAG: @flag = protected global i8 undef,
 // CHECK-DAG: @dx = {{protected | }}global i32 0,
 // CHECK-DAG: @dy = {{protected | }}global i32 0,
-// CHECK-DAG: @aaa = external global i32,
 // CHECK-DAG: @bbb = {{protected | }}global i32 0,
 // CHECK-DAG: weak constant %struct.__tgt_offload_entry { ptr @bbb,
 // CHECK-DAG: @ccc = external global i32,
@@ -80,7 +79,7 @@
 extern int aaa;
 int bbb = 0;
 extern int ccc;
-int ddd = 0;
+int ddd = ccc;
 #pragma omp end declare target
 
 #pragma omp declare target
@@ -260,8 +259,6 @@
 
 // CHECK-NOT: define {{.*}}{{baz1|baz4|maini1|Base|virtual_}}
 
-// CHECK-DAG: !{i32 1, !"aaa", i32 0, i32 {{[0-9]+}}}
-// 

[PATCH] D156366: HIP: Use __builtin_sqrt instead of routing through ocml sqrt for f64

2023-07-26 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 accepted this revision.
jhuber6 added a comment.
This revision is now accepted and ready to land.

Thanks


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156366/new/

https://reviews.llvm.org/D156366

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156014: [Clang][NVPTX] Permit use of the alias attribute for NVPTX targets

2023-07-24 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/Sema/SemaDeclAttr.cpp:1995
   }
-  if (S.Context.getTargetInfo().getTriple().isNVPTX()) {
-S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx);

jhuber6 wrote:
> tra wrote:
> > tra wrote:
> > > Allowing or not `noreturn` depends on the CUDA version we're building 
> > > with (or rather on the PTX version we need for .noreturn instruction).
> > > 
> > > We would still need to issue the diagnostics if we're using CUDA older 
> > > than 10.1.
> > > 
> > Make it `.alias` and `CUDA older than 10.0`.
> Do we do any similar diagnostics checks on the CUDA version? I thought that 
> was more of a clang driver thing and we'd just let the backend handle the 
> failure, since we can emit LLVM-IR that can be compiled irrespective of the 
> CUDA version used to make it.
I checked and I don't think we pass in any CUDA version information to the 
`-cc1` compiler. In this case if the user didn't have sufficient utilities it 
would simply fail in the backend or in PTX. We have semi-helpful messages there 
and it would be a good indicator to update CUDA. Is this fine given that?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156014/new/

https://reviews.llvm.org/D156014

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156014: [Clang][NVPTX] Permit use of the alias attribute for NVPTX targets

2023-07-21 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/Sema/SemaDeclAttr.cpp:1995
   }
-  if (S.Context.getTargetInfo().getTriple().isNVPTX()) {
-S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx);

tra wrote:
> tra wrote:
> > Allowing or not `noreturn` depends on the CUDA version we're building with 
> > (or rather on the PTX version we need for .noreturn instruction).
> > 
> > We would still need to issue the diagnostics if we're using CUDA older than 
> > 10.1.
> > 
> Make it `.alias` and `CUDA older than 10.0`.
Do we do any similar diagnostics checks on the CUDA version? I thought that was 
more of a clang driver thing and we'd just let the backend handle the failure, 
since we can emit LLVM-IR that can be compiled irrespective of the CUDA version 
used to make it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156014/new/

https://reviews.llvm.org/D156014

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156014: [Clang][NVPTX] Permit use of the alias attribute for NVPTX targets

2023-07-21 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: tra, arsenm, jlebar, kushanam, aaron.ballman, yaxunl, 
jdoerfert.
Herald added subscribers: mattd, gchakrabarti, asavonic, jeroen.dobbelaere.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, wangpc, wdng.
Herald added a project: clang.

The patch in D155211  added basic support for 
the `.alias` keyword in
PTX. This means we should be able to permit use of this in clang.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D156014

Files:
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/lib/Sema/SemaDeclAttr.cpp
  clang/test/SemaCUDA/alias.cu


Index: clang/test/SemaCUDA/alias.cu
===
--- clang/test/SemaCUDA/alias.cu
+++ /dev/null
@@ -1,11 +0,0 @@
-// RUN: %clang_cc1 -triple nvptx-unknown-cuda -fsyntax-only -fcuda-is-device 
-verify -DEXPECT_ERR %s
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
-
-// The alias attribute is not allowed in CUDA device code.
-void bar();
-__attribute__((alias("bar"))) void foo();
-#ifdef EXPECT_ERR
-// expected-error@-2 {{CUDA does not support aliases}}
-#else
-// expected-no-diagnostics
-#endif
Index: clang/lib/Sema/SemaDeclAttr.cpp
===
--- clang/lib/Sema/SemaDeclAttr.cpp
+++ clang/lib/Sema/SemaDeclAttr.cpp
@@ -1992,9 +1992,6 @@
 S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_darwin);
 return;
   }
-  if (S.Context.getTargetInfo().getTriple().isNVPTX()) {
-S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx);
-  }
 
   // Aliases should be on declarations, not definitions.
   if (const auto *FD = dyn_cast(D)) {
Index: clang/include/clang/Basic/DiagnosticSemaKinds.td
===
--- clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8646,7 +8646,6 @@
   "CUDA device code does not support variadic functions">;
 def err_va_arg_in_device : Error<
   "CUDA device code does not support va_arg">;
-def err_alias_not_supported_on_nvptx : Error<"CUDA does not support aliases">;
 def err_cuda_unattributed_constexpr_cannot_overload_device : Error<
   "constexpr function %0 without __host__ or __device__ attributes cannot "
   "overload __device__ function with same signature.  Add a __host__ "


Index: clang/test/SemaCUDA/alias.cu
===
--- clang/test/SemaCUDA/alias.cu
+++ /dev/null
@@ -1,11 +0,0 @@
-// RUN: %clang_cc1 -triple nvptx-unknown-cuda -fsyntax-only -fcuda-is-device -verify -DEXPECT_ERR %s
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
-
-// The alias attribute is not allowed in CUDA device code.
-void bar();
-__attribute__((alias("bar"))) void foo();
-#ifdef EXPECT_ERR
-// expected-error@-2 {{CUDA does not support aliases}}
-#else
-// expected-no-diagnostics
-#endif
Index: clang/lib/Sema/SemaDeclAttr.cpp
===
--- clang/lib/Sema/SemaDeclAttr.cpp
+++ clang/lib/Sema/SemaDeclAttr.cpp
@@ -1992,9 +1992,6 @@
 S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_darwin);
 return;
   }
-  if (S.Context.getTargetInfo().getTriple().isNVPTX()) {
-S.Diag(AL.getLoc(), diag::err_alias_not_supported_on_nvptx);
-  }
 
   // Aliases should be on declarations, not definitions.
   if (const auto *FD = dyn_cast(D)) {
Index: clang/include/clang/Basic/DiagnosticSemaKinds.td
===
--- clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8646,7 +8646,6 @@
   "CUDA device code does not support variadic functions">;
 def err_va_arg_in_device : Error<
   "CUDA device code does not support va_arg">;
-def err_alias_not_supported_on_nvptx : Error<"CUDA does not support aliases">;
 def err_cuda_unattributed_constexpr_cannot_overload_device : Error<
   "constexpr function %0 without __host__ or __device__ attributes cannot "
   "overload __device__ function with same signature.  Add a __host__ "
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155211: [NVPTX] Add initial support for '.alias' in PTX

2023-07-21 Thread Joseph Huber via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGf4381d464457: [NVPTX] Add initial support for 
'.alias' in PTX (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155211/new/

https://reviews.llvm.org/D155211

Files:
  llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
  llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
  llvm/test/CodeGen/NVPTX/alias-errors.ll
  llvm/test/CodeGen/NVPTX/alias.ll

Index: llvm/test/CodeGen/NVPTX/alias.ll
===
--- llvm/test/CodeGen/NVPTX/alias.ll
+++ llvm/test/CodeGen/NVPTX/alias.ll
@@ -1,7 +1,27 @@
-; RUN: not --crash llc < %s -march=nvptx -mcpu=sm_20 2>&1 | FileCheck %s
-
-; Check that llc dies gracefully when given an alias.
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_30 -mattr=+ptx63 | FileCheck %s
 
 define i32 @a() { ret i32 0 }
-; CHECK: ERROR: Module has aliases
 @b = internal alias i32 (), ptr @a
+@c = internal alias i32 (), ptr @a
+
+define void @foo(i32 %0, ptr %1) { ret void }
+@bar = alias i32 (), ptr @foo
+
+; CHECK: .visible .func  (.param .b32 func_retval0) a()
+
+;  CHECK: .visible .func foo(
+; CHECK-NEXT: .param .b32 foo_param_0,
+; CHECK-NEXT: .param .b64 foo_param_1
+; CHECK-NEXT: )
+
+;  CHECK: .visible .func  (.param .b32 func_retval0) b();
+; CHECK-NEXT: .alias b, a;
+
+;  CHECK: .visible .func  (.param .b32 func_retval0) c();
+; CHECK-NEXT: .alias c, a;
+
+;  CHECK: .visible .func bar(
+; CHECK-NEXT: .param .b32 foo_param_0,
+; CHECK-NEXT: .param .b64 foo_param_1
+; CHECK-NEXT: );
+; CHECK-NEXT: .alias bar, foo;
Index: llvm/test/CodeGen/NVPTX/alias-errors.ll
===
--- /dev/null
+++ llvm/test/CodeGen/NVPTX/alias-errors.ll
@@ -0,0 +1,9 @@
+; RUN: not --crash llc < %s -march=nvptx64 -mcpu=sm_30 -mattr=+ptx43 2>&1 | FileCheck %s --check-prefix=ATTR
+; RUN: not --crash llc < %s -march=nvptx64 -mcpu=sm_20 -mattr=+ptx63 2>&1 | FileCheck %s --check-prefix=ATTR
+; RUN: not --crash llc < %s -march=nvptx64 -mcpu=sm_30 -mattr=+ptx63 2>&1 | FileCheck %s --check-prefix=ALIAS
+
+; ATTR: .alias requires PTX version >= 6.3 and sm_30
+
+; ALIAS: NVPTX aliasee must be a non-kernel function
+@a = global i32 42, align 8
+@b = internal alias i32, ptr @a
Index: llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
===
--- llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
+++ llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
@@ -174,6 +174,7 @@
   void printModuleLevelGV(const GlobalVariable *GVar, raw_ostream &O,
   bool processDemoted, const NVPTXSubtarget &STI);
   void emitGlobals(const Module &M);
+  void emitGlobalAlias(const Module &M, const GlobalAlias &GA);
   void emitHeader(Module &M, raw_ostream &O, const NVPTXSubtarget &STI);
   void emitKernelFunctionDirectives(const Function &F, raw_ostream &O) const;
   void emitVirtualRegister(unsigned int vr, raw_ostream &);
Index: llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
===
--- llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+++ llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
@@ -473,6 +473,7 @@
   CurrentFnSym->print(O, MAI);
 
   emitFunctionParamList(F, O);
+  O << "\n";
 
   if (isKernelFunction(*F))
 emitKernelFunctionDirectives(*F, O);
@@ -623,6 +624,7 @@
   getSymbol(F)->print(O, MAI);
   O << "\n";
   emitFunctionParamList(F, O);
+  O << "\n";
   if (shouldEmitPTXNoReturn(F, TM))
 O << ".noreturn";
   O << ";\n";
@@ -790,10 +792,12 @@
 }
 
 bool NVPTXAsmPrinter::doInitialization(Module &M) {
-  if (M.alias_size()) {
-report_fatal_error("Module has aliases, which NVPTX does not support.");
-return true; // error
-  }
+  const NVPTXTargetMachine &NTM = static_cast(TM);
+  const NVPTXSubtarget &STI =
+  *static_cast(NTM.getSubtargetImpl());
+  if (M.alias_size() && (STI.getPTXVersion() < 63 || STI.getSmVersion() < 30))
+report_fatal_error(".alias requires PTX version >= 6.3 and sm_30");
+
   if (!isEmptyXXStructor(M.getNamedGlobal("llvm.global_ctors")) &&
   !LowerCtorDtor) {
 report_fatal_error(
@@ -850,6 +854,32 @@
   OutStreamer->emitRawText(OS2.str());
 }
 
+void NVPTXAsmPrinter::emitGlobalAlias(const Module &M, const GlobalAlias &GA) {
+  SmallString<128> Str;
+  raw_svector_ostream OS(Str);
+
+  MCSymbol *Name = getSymbol(&GA);
+  const Function *F = dyn_cast(GA.getAliasee());
+  if (!F || isKernelFunction(*F))
+report_fatal_error("NVPTX aliasee must be a non-kernel function");
+
+  if (GA.hasLinkOnceLinkage() || GA.hasWeakLinkage() ||
+  GA.hasAvailableExternallyLinkage() || GA.hasCommonLinkage())
+report_fatal_error("NVPTX aliasee must not be '.weak'");
+
+  OS << "\n";
+  emitLinkageDirective(F, OS);
+  OS << ".func ";
+  printReturnValStr(F, OS);
+  OS << Name->getName();
+  emitFunctionParamList(F, OS)

[PATCH] D155727: [OpenMP][Docs] Add some things to the OpenMP support

2023-07-19 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGa37d74722254: [OpenMP][Docs] Add some things to the OpenMP 
support (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155727/new/

https://reviews.llvm.org/D155727

Files:
  clang/docs/OpenMPSupport.rst


Index: clang/docs/OpenMPSupport.rst
===
--- clang/docs/OpenMPSupport.rst
+++ clang/docs/OpenMPSupport.rst
@@ -186,7 +186,7 @@
 
+--+--+--+---+
 | device   | clause: in_reduction  
   | :part:`worked on`| r308768 
  |
 
+--+--+--+---+
-| device   | omp_get_device_num()  
   | :part:`worked on`| D54342  
  |
+| device   | omp_get_device_num()  
   | :good:`done` | D54342,D128347  
  |
 
+--+--+--+---+
 | device   | structure mapping of references   
   | :none:`unclaimed`| 
  |
 
+--+--+--+---+
@@ -202,7 +202,7 @@
 
+--+--+--+---+
 | device   | clause: unified_address   
   | :part:`partial`  | 
  |
 
+--+--+--+---+
-| device   | clause: reverse_offload   
   | :none:`unclaimed parts`  | D52780  
  |
+| device   | clause: reverse_offload   
   | :part:`partial`  | D52780,D155003  
  |
 
+--+--+--+---+
 | device   | clause: atomic_default_mem_order  
   | :good:`done` | D53513  
  |
 
+--+--+--+---+
@@ -287,7 +287,7 @@
 
+--+--+--+---+
 | device   | omp_get_mapped_ptr routine
   | :good:`done` | D141545 
  |
 
+--+--+--+---+
-| device   | new async target memory copy routines 
   | :none:`unclaimed`| 
  |
+| device   | new async target memory copy routines 
   | :good:`done` | D136103 
  |
 
+--+--+--+---+
 | 

[PATCH] D155727: [OpenMP][Docs] Add some things to the OpenMP support

2023-07-19 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: jdoerfert, tianshilei1992, JonChesterfield, kevinsala, 
jplehr.
Herald added subscribers: sunshaoce, guansong, yaxunl.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, sstefan1.
Herald added a project: clang.

This patch adds some information that we have support for in the OpenMP
clang support page.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155727

Files:
  clang/docs/OpenMPSupport.rst


Index: clang/docs/OpenMPSupport.rst
===
--- clang/docs/OpenMPSupport.rst
+++ clang/docs/OpenMPSupport.rst
@@ -186,7 +186,7 @@
 
+--+--+--+---+
 | device   | clause: in_reduction  
   | :part:`worked on`| r308768 
  |
 
+--+--+--+---+
-| device   | omp_get_device_num()  
   | :part:`worked on`| D54342  
  |
+| device   | omp_get_device_num()  
   | :good:`done` | D54342,D128347  
  |
 
+--+--+--+---+
 | device   | structure mapping of references   
   | :none:`unclaimed`| 
  |
 
+--+--+--+---+
@@ -202,7 +202,7 @@
 
+--+--+--+---+
 | device   | clause: unified_address   
   | :part:`partial`  | 
  |
 
+--+--+--+---+
-| device   | clause: reverse_offload   
   | :none:`unclaimed parts`  | D52780  
  |
+| device   | clause: reverse_offload   
   | :part:`partial`  | D52780,D155003  
  |
 
+--+--+--+---+
 | device   | clause: atomic_default_mem_order  
   | :good:`done` | D53513  
  |
 
+--+--+--+---+
@@ -287,7 +287,7 @@
 
+--+--+--+---+
 | device   | omp_get_mapped_ptr routine
   | :good:`done` | D141545 
  |
 
+--+--+--+---+
-| device   | new async target memory copy routines 
   | :none:`unclaimed`| 
  |
+| device   | new async target memory copy routines 
   | :good:`done` | D136103 
  |
 
+--+

[PATCH] D155606: [Clang] Only emit CUDA version warnings when creating the CUDA toolchain

2023-07-18 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd2ac0069a21b: [Clang] Only emit CUDA version warnings when 
creating the CUDA toolchain (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155606/new/

https://reviews.llvm.org/D155606

Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/test/Driver/cuda-version-check.cu


Index: clang/test/Driver/cuda-version-check.cu
===
--- clang/test/Driver/cuda-version-check.cu
+++ clang/test/Driver/cuda-version-check.cu
@@ -73,3 +73,11 @@
 
 // UNKNOWN_VERSION: CUDA version is newer than the latest{{.*}} supported 
version
 // UNKNOWN_VERSION_CXX-NOT: unknown CUDA version
+
+// Check to make sure we do not emit these warnings for OpenMP or 
cross-compilation.
+// RUN: %clang --target=x86_64-linux -v -### -fopenmp -nogpulib 
--offload-arch=sm_60 --cuda-path=%S/Inputs/CUDA-new/usr/local/cuda 2>&1 -x c %s 
| \
+// RUN:FileCheck %s --check-prefix=VERSION
+// RUN: %clang --target=nvptx64-nvidia-cuda -v -### -nogpulib -march=sm_60 
--cuda-path=%S/Inputs/CUDA-new/usr/local/cuda 2>&1 -x c %s | \
+// RUN:FileCheck %s --check-prefix=VERSION
+// VERSION-NOT: CUDA version is newer than the latest{{.*}} supported version
+
Index: clang/lib/Driver/ToolChains/Cuda.cpp
===
--- clang/lib/Driver/ToolChains/Cuda.cpp
+++ clang/lib/Driver/ToolChains/Cuda.cpp
@@ -704,10 +704,8 @@
const ArgList &Args, bool Freestanding = false)
 : ToolChain(D, Triple, Args), CudaInstallation(D, HostTriple, Args),
   Freestanding(Freestanding) {
-  if (CudaInstallation.isValid()) {
-CudaInstallation.WarnIfUnsupportedVersion();
+  if (CudaInstallation.isValid())
 getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
-  }
   // Lookup binaries into the driver directory, this is used to
   // discover the 'nvptx-arch' executable.
   getProgramPaths().push_back(getDriver().Dir);
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -810,6 +810,12 @@
 if (!CudaTC) {
   CudaTC = std::make_unique(
   *this, *CudaTriple, *HostTC, C.getInputArgs());
+
+  // Emit a warning if the detected CUDA version is too new.
+  CudaInstallationDetector &CudaInstallation =
+  static_cast(*CudaTC).CudaInstallation;
+  if (CudaInstallation.isValid())
+CudaInstallation.WarnIfUnsupportedVersion();
 }
 C.addOffloadDeviceToolChain(CudaTC.get(), OFK);
   } else if (IsHIP) {


Index: clang/test/Driver/cuda-version-check.cu
===
--- clang/test/Driver/cuda-version-check.cu
+++ clang/test/Driver/cuda-version-check.cu
@@ -73,3 +73,11 @@
 
 // UNKNOWN_VERSION: CUDA version is newer than the latest{{.*}} supported version
 // UNKNOWN_VERSION_CXX-NOT: unknown CUDA version
+
+// Check to make sure we do not emit these warnings for OpenMP or cross-compilation.
+// RUN: %clang --target=x86_64-linux -v -### -fopenmp -nogpulib --offload-arch=sm_60 --cuda-path=%S/Inputs/CUDA-new/usr/local/cuda 2>&1 -x c %s | \
+// RUN:FileCheck %s --check-prefix=VERSION
+// RUN: %clang --target=nvptx64-nvidia-cuda -v -### -nogpulib -march=sm_60 --cuda-path=%S/Inputs/CUDA-new/usr/local/cuda 2>&1 -x c %s | \
+// RUN:FileCheck %s --check-prefix=VERSION
+// VERSION-NOT: CUDA version is newer than the latest{{.*}} supported version
+
Index: clang/lib/Driver/ToolChains/Cuda.cpp
===
--- clang/lib/Driver/ToolChains/Cuda.cpp
+++ clang/lib/Driver/ToolChains/Cuda.cpp
@@ -704,10 +704,8 @@
const ArgList &Args, bool Freestanding = false)
 : ToolChain(D, Triple, Args), CudaInstallation(D, HostTriple, Args),
   Freestanding(Freestanding) {
-  if (CudaInstallation.isValid()) {
-CudaInstallation.WarnIfUnsupportedVersion();
+  if (CudaInstallation.isValid())
 getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
-  }
   // Lookup binaries into the driver directory, this is used to
   // discover the 'nvptx-arch' executable.
   getProgramPaths().push_back(getDriver().Dir);
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -810,6 +810,12 @@
 if (!CudaTC) {
   CudaTC = std::make_unique(
   *this, *CudaTriple, *HostTC, C.getInputArgs());
+
+  // Emit a warning if the detected CUDA version is too new.
+  CudaInstallationDetector &CudaInstallation =
+  static_cast(*CudaTC).CudaInstallatio

[PATCH] D155606: [Clang] Only emit CUDA version warnings when creating the CUDA toolchain

2023-07-18 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 541594.
jhuber6 added a comment.

Add test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155606/new/

https://reviews.llvm.org/D155606

Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp
  clang/test/Driver/cuda-version-check.cu


Index: clang/test/Driver/cuda-version-check.cu
===
--- clang/test/Driver/cuda-version-check.cu
+++ clang/test/Driver/cuda-version-check.cu
@@ -73,3 +73,11 @@
 
 // UNKNOWN_VERSION: CUDA version is newer than the latest{{.*}} supported 
version
 // UNKNOWN_VERSION_CXX-NOT: unknown CUDA version
+
+// Check to make sure we do not emit these warnings for OpenMP or 
cross-compilation.
+// RUN: %clang --target=x86_64-linux -v -### -fopenmp -nogpulib 
--offload-arch=sm_60 --cuda-path=%S/Inputs/CUDA-new/usr/local/cuda 2>&1 -x c %s 
| \
+// RUN:FileCheck %s --check-prefix=VERSION
+// RUN: %clang --target=nvptx64-nvidia-cuda -v -### -nogpulib -march=sm_60 
--cuda-path=%S/Inputs/CUDA-new/usr/local/cuda 2>&1 -x c %s | \
+// RUN:FileCheck %s --check-prefix=VERSION
+// VERSION-NOT: CUDA version is newer than the latest{{.*}} supported version
+
Index: clang/lib/Driver/ToolChains/Cuda.cpp
===
--- clang/lib/Driver/ToolChains/Cuda.cpp
+++ clang/lib/Driver/ToolChains/Cuda.cpp
@@ -704,10 +704,8 @@
const ArgList &Args, bool Freestanding = false)
 : ToolChain(D, Triple, Args), CudaInstallation(D, HostTriple, Args),
   Freestanding(Freestanding) {
-  if (CudaInstallation.isValid()) {
-CudaInstallation.WarnIfUnsupportedVersion();
+  if (CudaInstallation.isValid())
 getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
-  }
   // Lookup binaries into the driver directory, this is used to
   // discover the 'nvptx-arch' executable.
   getProgramPaths().push_back(getDriver().Dir);
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -810,6 +810,12 @@
 if (!CudaTC) {
   CudaTC = std::make_unique(
   *this, *CudaTriple, *HostTC, C.getInputArgs());
+
+  // Emit a warning if the detected CUDA version is too new.
+  CudaInstallationDetector &CudaInstallation =
+  static_cast(*CudaTC).CudaInstallation;
+  if (CudaInstallation.isValid())
+CudaInstallation.WarnIfUnsupportedVersion();
 }
 C.addOffloadDeviceToolChain(CudaTC.get(), OFK);
   } else if (IsHIP) {


Index: clang/test/Driver/cuda-version-check.cu
===
--- clang/test/Driver/cuda-version-check.cu
+++ clang/test/Driver/cuda-version-check.cu
@@ -73,3 +73,11 @@
 
 // UNKNOWN_VERSION: CUDA version is newer than the latest{{.*}} supported version
 // UNKNOWN_VERSION_CXX-NOT: unknown CUDA version
+
+// Check to make sure we do not emit these warnings for OpenMP or cross-compilation.
+// RUN: %clang --target=x86_64-linux -v -### -fopenmp -nogpulib --offload-arch=sm_60 --cuda-path=%S/Inputs/CUDA-new/usr/local/cuda 2>&1 -x c %s | \
+// RUN:FileCheck %s --check-prefix=VERSION
+// RUN: %clang --target=nvptx64-nvidia-cuda -v -### -nogpulib -march=sm_60 --cuda-path=%S/Inputs/CUDA-new/usr/local/cuda 2>&1 -x c %s | \
+// RUN:FileCheck %s --check-prefix=VERSION
+// VERSION-NOT: CUDA version is newer than the latest{{.*}} supported version
+
Index: clang/lib/Driver/ToolChains/Cuda.cpp
===
--- clang/lib/Driver/ToolChains/Cuda.cpp
+++ clang/lib/Driver/ToolChains/Cuda.cpp
@@ -704,10 +704,8 @@
const ArgList &Args, bool Freestanding = false)
 : ToolChain(D, Triple, Args), CudaInstallation(D, HostTriple, Args),
   Freestanding(Freestanding) {
-  if (CudaInstallation.isValid()) {
-CudaInstallation.WarnIfUnsupportedVersion();
+  if (CudaInstallation.isValid())
 getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
-  }
   // Lookup binaries into the driver directory, this is used to
   // discover the 'nvptx-arch' executable.
   getProgramPaths().push_back(getDriver().Dir);
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -810,6 +810,12 @@
 if (!CudaTC) {
   CudaTC = std::make_unique(
   *this, *CudaTriple, *HostTC, C.getInputArgs());
+
+  // Emit a warning if the detected CUDA version is too new.
+  CudaInstallationDetector &CudaInstallation =
+  static_cast(*CudaTC).CudaInstallation;
+  if (CudaInstallation.isValid())
+CudaInstallation.WarnIfUnsupportedVersion();
 }
 C.addOffloadDeviceToolChain(CudaTC.get(), OFK);
   } else if (IsHIP) {

[PATCH] D155606: [Clang] Only emit CUDA version warnings when creating the CUDA toolchain

2023-07-18 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 541562.
jhuber6 added a comment.

Fix wrong comment


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155606/new/

https://reviews.llvm.org/D155606

Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp


Index: clang/lib/Driver/ToolChains/Cuda.cpp
===
--- clang/lib/Driver/ToolChains/Cuda.cpp
+++ clang/lib/Driver/ToolChains/Cuda.cpp
@@ -704,10 +704,8 @@
const ArgList &Args, bool Freestanding = false)
 : ToolChain(D, Triple, Args), CudaInstallation(D, HostTriple, Args),
   Freestanding(Freestanding) {
-  if (CudaInstallation.isValid()) {
-CudaInstallation.WarnIfUnsupportedVersion();
+  if (CudaInstallation.isValid())
 getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
-  }
   // Lookup binaries into the driver directory, this is used to
   // discover the 'nvptx-arch' executable.
   getProgramPaths().push_back(getDriver().Dir);
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -810,6 +810,12 @@
 if (!CudaTC) {
   CudaTC = std::make_unique(
   *this, *CudaTriple, *HostTC, C.getInputArgs());
+
+  // Emit a warning if the detected CUDA version is too new.
+  CudaInstallationDetector &CudaInstallation =
+  static_cast(*CudaTC).CudaInstallation;
+  if (CudaInstallation.isValid())
+CudaInstallation.WarnIfUnsupportedVersion();
 }
 C.addOffloadDeviceToolChain(CudaTC.get(), OFK);
   } else if (IsHIP) {


Index: clang/lib/Driver/ToolChains/Cuda.cpp
===
--- clang/lib/Driver/ToolChains/Cuda.cpp
+++ clang/lib/Driver/ToolChains/Cuda.cpp
@@ -704,10 +704,8 @@
const ArgList &Args, bool Freestanding = false)
 : ToolChain(D, Triple, Args), CudaInstallation(D, HostTriple, Args),
   Freestanding(Freestanding) {
-  if (CudaInstallation.isValid()) {
-CudaInstallation.WarnIfUnsupportedVersion();
+  if (CudaInstallation.isValid())
 getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
-  }
   // Lookup binaries into the driver directory, this is used to
   // discover the 'nvptx-arch' executable.
   getProgramPaths().push_back(getDriver().Dir);
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -810,6 +810,12 @@
 if (!CudaTC) {
   CudaTC = std::make_unique(
   *this, *CudaTriple, *HostTC, C.getInputArgs());
+
+  // Emit a warning if the detected CUDA version is too new.
+  CudaInstallationDetector &CudaInstallation =
+  static_cast(*CudaTC).CudaInstallation;
+  if (CudaInstallation.isValid())
+CudaInstallation.WarnIfUnsupportedVersion();
 }
 C.addOffloadDeviceToolChain(CudaTC.get(), OFK);
   } else if (IsHIP) {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155606: [Clang] Only emit CUDA version warnings when creating the CUDA toolchain

2023-07-18 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: tra, yaxunl, jdoerfert, tianshilei1992, 
JonChesterfield.
Herald added a subscriber: mattd.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, wangpc, jplehr, sstefan1, MaskRay.
Herald added a project: clang.

This warning primarily applies to users of the CUDA langues as there may
be new features we rely on. The other two users of the toolchain are
OpenMP via `-fopenmp --offload-arch=sm_70` and a cross-compiled build
via `--target=nvptx64-nvida-cuda -march=sm_70`. Both of these do not
rely directly on things that would change significantly between CUDA
versions, and the way they are built can sometims make this warning
print many times.

This patch changees the behaiour to only check for the version when
building for CUDA offloading specifically, the other two will not have
this check.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155606

Files:
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Cuda.cpp


Index: clang/lib/Driver/ToolChains/Cuda.cpp
===
--- clang/lib/Driver/ToolChains/Cuda.cpp
+++ clang/lib/Driver/ToolChains/Cuda.cpp
@@ -704,10 +704,8 @@
const ArgList &Args, bool Freestanding = false)
 : ToolChain(D, Triple, Args), CudaInstallation(D, HostTriple, Args),
   Freestanding(Freestanding) {
-  if (CudaInstallation.isValid()) {
-CudaInstallation.WarnIfUnsupportedVersion();
+  if (CudaInstallation.isValid())
 getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
-  }
   // Lookup binaries into the driver directory, this is used to
   // discover the 'nvptx-arch' executable.
   getProgramPaths().push_back(getDriver().Dir);
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -810,6 +810,12 @@
 if (!CudaTC) {
   CudaTC = std::make_unique(
   *this, *CudaTriple, *HostTC, C.getInputArgs());
+
+  // Emit a warning if the CUDA installation is too old.
+  CudaInstallationDetector &CudaInstallation =
+  static_cast(*CudaTC).CudaInstallation;
+  if (CudaInstallation.isValid())
+CudaInstallation.WarnIfUnsupportedVersion();
 }
 C.addOffloadDeviceToolChain(CudaTC.get(), OFK);
   } else if (IsHIP) {


Index: clang/lib/Driver/ToolChains/Cuda.cpp
===
--- clang/lib/Driver/ToolChains/Cuda.cpp
+++ clang/lib/Driver/ToolChains/Cuda.cpp
@@ -704,10 +704,8 @@
const ArgList &Args, bool Freestanding = false)
 : ToolChain(D, Triple, Args), CudaInstallation(D, HostTriple, Args),
   Freestanding(Freestanding) {
-  if (CudaInstallation.isValid()) {
-CudaInstallation.WarnIfUnsupportedVersion();
+  if (CudaInstallation.isValid())
 getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
-  }
   // Lookup binaries into the driver directory, this is used to
   // discover the 'nvptx-arch' executable.
   getProgramPaths().push_back(getDriver().Dir);
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -810,6 +810,12 @@
 if (!CudaTC) {
   CudaTC = std::make_unique(
   *this, *CudaTriple, *HostTC, C.getInputArgs());
+
+  // Emit a warning if the CUDA installation is too old.
+  CudaInstallationDetector &CudaInstallation =
+  static_cast(*CudaTC).CudaInstallation;
+  if (CudaInstallation.isValid())
+CudaInstallation.WarnIfUnsupportedVersion();
 }
 C.addOffloadDeviceToolChain(CudaTC.get(), OFK);
   } else if (IsHIP) {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155211: [NVPTX] Add initial support for '.alias' in PTX

2023-07-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 540062.
jhuber6 added a comment.

Remove changes in `clang` that I forgot to remove to keep this restricted to 
the codegen. Afterwards we can remove the sema in clang and test it there.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155211/new/

https://reviews.llvm.org/D155211

Files:
  llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
  llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
  llvm/test/CodeGen/NVPTX/alias-errors.ll
  llvm/test/CodeGen/NVPTX/alias.ll

Index: llvm/test/CodeGen/NVPTX/alias.ll
===
--- llvm/test/CodeGen/NVPTX/alias.ll
+++ llvm/test/CodeGen/NVPTX/alias.ll
@@ -1,7 +1,27 @@
-; RUN: not --crash llc < %s -march=nvptx -mcpu=sm_20 2>&1 | FileCheck %s
-
-; Check that llc dies gracefully when given an alias.
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_30 -mattr=+ptx63 | FileCheck %s
 
 define i32 @a() { ret i32 0 }
-; CHECK: ERROR: Module has aliases
 @b = internal alias i32 (), ptr @a
+@c = internal alias i32 (), ptr @a
+
+define void @foo(i32 %0, ptr %1) { ret void }
+@bar = alias i32 (), ptr @foo
+
+; CHECK: .visible .func  (.param .b32 func_retval0) a()
+
+;  CHECK: .visible .func foo(
+; CHECK-NEXT: .param .b32 foo_param_0,
+; CHECK-NEXT: .param .b64 foo_param_1
+; CHECK-NEXT: )
+
+;  CHECK: .visible .func  (.param .b32 func_retval0) b();
+; CHECK-NEXT: .alias b, a;
+
+;  CHECK: .visible .func  (.param .b32 func_retval0) c();
+; CHECK-NEXT: .alias c, a;
+
+;  CHECK: .visible .func bar(
+; CHECK-NEXT: .param .b32 foo_param_0,
+; CHECK-NEXT: .param .b64 foo_param_1
+; CHECK-NEXT: );
+; CHECK-NEXT: .alias bar, foo;
Index: llvm/test/CodeGen/NVPTX/alias-errors.ll
===
--- /dev/null
+++ llvm/test/CodeGen/NVPTX/alias-errors.ll
@@ -0,0 +1,9 @@
+; RUN: not --crash llc < %s -march=nvptx64 -mcpu=sm_30 -mattr=+ptx43 2>&1 | FileCheck %s --check-prefix=ATTR
+; RUN: not --crash llc < %s -march=nvptx64 -mcpu=sm_20 -mattr=+ptx63 2>&1 | FileCheck %s --check-prefix=ATTR
+; RUN: not --crash llc < %s -march=nvptx64 -mcpu=sm_30 -mattr=+ptx63 2>&1 | FileCheck %s --check-prefix=ALIAS
+
+; ATTR: .alias requires PTX version >= 6.3 and sm_30
+
+; ALIAS: NVPTX aliasee must be a non-kernel function
+@a = global i32 42, align 8
+@b = internal alias i32, ptr @a
Index: llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
===
--- llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
+++ llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
@@ -174,6 +174,7 @@
   void printModuleLevelGV(const GlobalVariable *GVar, raw_ostream &O,
   bool processDemoted, const NVPTXSubtarget &STI);
   void emitGlobals(const Module &M);
+  void emitGlobalAlias(const Module &M, const GlobalAlias &GA);
   void emitHeader(Module &M, raw_ostream &O, const NVPTXSubtarget &STI);
   void emitKernelFunctionDirectives(const Function &F, raw_ostream &O) const;
   void emitVirtualRegister(unsigned int vr, raw_ostream &);
Index: llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
===
--- llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+++ llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
@@ -473,6 +473,7 @@
   CurrentFnSym->print(O, MAI);
 
   emitFunctionParamList(F, O);
+  O << "\n";
 
   if (isKernelFunction(*F))
 emitKernelFunctionDirectives(*F, O);
@@ -623,6 +624,7 @@
   getSymbol(F)->print(O, MAI);
   O << "\n";
   emitFunctionParamList(F, O);
+  O << "\n";
   if (shouldEmitPTXNoReturn(F, TM))
 O << ".noreturn";
   O << ";\n";
@@ -790,10 +792,12 @@
 }
 
 bool NVPTXAsmPrinter::doInitialization(Module &M) {
-  if (M.alias_size()) {
-report_fatal_error("Module has aliases, which NVPTX does not support.");
-return true; // error
-  }
+  const NVPTXTargetMachine &NTM = static_cast(TM);
+  const NVPTXSubtarget &STI =
+  *static_cast(NTM.getSubtargetImpl());
+  if (M.alias_size() && (STI.getPTXVersion() < 63 || STI.getSmVersion() < 30))
+report_fatal_error(".alias requires PTX version >= 6.3 and sm_30");
+
   if (!isEmptyXXStructor(M.getNamedGlobal("llvm.global_ctors")) &&
   !LowerCtorDtor) {
 report_fatal_error(
@@ -850,6 +854,32 @@
   OutStreamer->emitRawText(OS2.str());
 }
 
+void NVPTXAsmPrinter::emitGlobalAlias(const Module &M, const GlobalAlias &GA) {
+  SmallString<128> Str;
+  raw_svector_ostream OS(Str);
+
+  MCSymbol *Name = getSymbol(&GA);
+  const Function *F = dyn_cast(GA.getAliasee());
+  if (!F || isKernelFunction(*F))
+report_fatal_error("NVPTX aliasee must be a non-kernel function");
+
+  if (GA.hasLinkOnceLinkage() || GA.hasWeakLinkage() ||
+  GA.hasAvailableExternallyLinkage() || GA.hasCommonLinkage())
+report_fatal_error("NVPTX aliasee must not be '.weak'");
+
+  OS << "\n";
+  emitLinkageDirective(F, OS);
+  OS << ".func ";
+  printReturnValStr(F, OS);
+  OS << N

[PATCH] D155211: [NVPTX] Add initial support for '.alias' in PTX

2023-07-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: tra, arsenm, jlebar, kushanam.
Herald added subscribers: mattd, gchakrabarti, asavonic, jeroen.dobbelaere, 
hiraditya.
Herald added a reviewer: aaron.ballman.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, wangpc, wdng, jholewinski.
Herald added projects: clang, LLVM.

This patch adds initial support for using aliases when targeting PTX. We
perform a pretty strict conversion from the globals referenced to the
expected output.

These cannot currently be used due to a bug in the `nvlink`
implementation that causes aliases to pruned functions to crash the
linker.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155211

Files:
  clang/lib/Sema/SemaDeclAttr.cpp
  llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
  llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
  llvm/test/CodeGen/NVPTX/alias-errors.ll
  llvm/test/CodeGen/NVPTX/alias.ll

Index: llvm/test/CodeGen/NVPTX/alias.ll
===
--- llvm/test/CodeGen/NVPTX/alias.ll
+++ llvm/test/CodeGen/NVPTX/alias.ll
@@ -1,7 +1,27 @@
-; RUN: not --crash llc < %s -march=nvptx -mcpu=sm_20 2>&1 | FileCheck %s
-
-; Check that llc dies gracefully when given an alias.
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_30 -mattr=+ptx63 | FileCheck %s
 
 define i32 @a() { ret i32 0 }
-; CHECK: ERROR: Module has aliases
 @b = internal alias i32 (), ptr @a
+@c = internal alias i32 (), ptr @a
+
+define void @foo(i32 %0, ptr %1) { ret void }
+@bar = alias i32 (), ptr @foo
+
+; CHECK: .visible .func  (.param .b32 func_retval0) a()
+
+;  CHECK: .visible .func foo(
+; CHECK-NEXT: .param .b32 foo_param_0,
+; CHECK-NEXT: .param .b64 foo_param_1
+; CHECK-NEXT: )
+
+;  CHECK: .visible .func  (.param .b32 func_retval0) b();
+; CHECK-NEXT: .alias b, a;
+
+;  CHECK: .visible .func  (.param .b32 func_retval0) c();
+; CHECK-NEXT: .alias c, a;
+
+;  CHECK: .visible .func bar(
+; CHECK-NEXT: .param .b32 foo_param_0,
+; CHECK-NEXT: .param .b64 foo_param_1
+; CHECK-NEXT: );
+; CHECK-NEXT: .alias bar, foo;
Index: llvm/test/CodeGen/NVPTX/alias-errors.ll
===
--- /dev/null
+++ llvm/test/CodeGen/NVPTX/alias-errors.ll
@@ -0,0 +1,9 @@
+; RUN: not --crash llc < %s -march=nvptx64 -mcpu=sm_30 -mattr=+ptx43 2>&1 | FileCheck %s --check-prefix=ATTR
+; RUN: not --crash llc < %s -march=nvptx64 -mcpu=sm_20 -mattr=+ptx63 2>&1 | FileCheck %s --check-prefix=ATTR
+; RUN: not --crash llc < %s -march=nvptx64 -mcpu=sm_30 -mattr=+ptx63 2>&1 | FileCheck %s --check-prefix=ALIAS
+
+; ATTR: .alias requires PTX version >= 6.3 and sm_30
+
+; ALIAS: NVPTX aliasee must be a non-kernel function
+@a = global i32 42, align 8
+@b = internal alias i32, ptr @a
Index: llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
===
--- llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
+++ llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
@@ -174,6 +174,7 @@
   void printModuleLevelGV(const GlobalVariable *GVar, raw_ostream &O,
   bool processDemoted, const NVPTXSubtarget &STI);
   void emitGlobals(const Module &M);
+  void emitGlobalAlias(const Module &M, const GlobalAlias &GA);
   void emitHeader(Module &M, raw_ostream &O, const NVPTXSubtarget &STI);
   void emitKernelFunctionDirectives(const Function &F, raw_ostream &O) const;
   void emitVirtualRegister(unsigned int vr, raw_ostream &);
Index: llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
===
--- llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+++ llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
@@ -473,6 +473,7 @@
   CurrentFnSym->print(O, MAI);
 
   emitFunctionParamList(F, O);
+  O << "\n";
 
   if (isKernelFunction(*F))
 emitKernelFunctionDirectives(*F, O);
@@ -623,6 +624,7 @@
   getSymbol(F)->print(O, MAI);
   O << "\n";
   emitFunctionParamList(F, O);
+  O << "\n";
   if (shouldEmitPTXNoReturn(F, TM))
 O << ".noreturn";
   O << ";\n";
@@ -790,10 +792,12 @@
 }
 
 bool NVPTXAsmPrinter::doInitialization(Module &M) {
-  if (M.alias_size()) {
-report_fatal_error("Module has aliases, which NVPTX does not support.");
-return true; // error
-  }
+  const NVPTXTargetMachine &NTM = static_cast(TM);
+  const NVPTXSubtarget &STI =
+  *static_cast(NTM.getSubtargetImpl());
+  if (M.alias_size() && (STI.getPTXVersion() < 63 || STI.getSmVersion() < 30))
+report_fatal_error(".alias requires PTX version >= 6.3 and sm_30");
+
   if (!isEmptyXXStructor(M.getNamedGlobal("llvm.global_ctors")) &&
   !LowerCtorDtor) {
 report_fatal_error(
@@ -850,6 +854,32 @@
   OutStreamer->emitRawText(OS2.str());
 }
 
+void NVPTXAsmPrinter::emitGlobalAlias(const Module &M, const GlobalAlias &GA) {
+  SmallString<128> Str;
+  raw_svector_ostream OS(Str);
+
+  MCSymbol *Name = getSymbol(&GA);
+  const F

[PATCH] D154850: [libc] Remove GPU string functions incompatible with C++

2023-07-10 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGb454e7aa7ceb: [libc] Remove GPU string functions 
incompatible with C++ (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154850/new/

https://reviews.llvm.org/D154850

Files:
  clang/lib/Headers/llvm_libc_wrappers/string.h
  libc/config/gpu/entrypoints.txt
  libc/docs/gpu/support.rst


Index: libc/docs/gpu/support.rst
===
--- libc/docs/gpu/support.rst
+++ libc/docs/gpu/support.rst
@@ -47,7 +47,7 @@
 bcmp   |check|
 bzero  |check|
 memccpy|check|
-memchr |check|
+memchr 
 memcmp |check|
 memcpy |check|
 memmove|check|
@@ -57,7 +57,7 @@
 stpcpy |check|
 stpncpy|check|
 strcat |check|
-strchr |check|
+strchr 
 strcmp |check|
 strcpy |check|
 strcspn|check|
@@ -68,10 +68,10 @@
 strncmp|check|
 strncpy|check|
 strnlen|check|
-strpbrk|check|
-strrchr|check|
+strpbrk
+strrchr
 strspn |check|
-strstr |check|
+strstr 
 strtok |check|
 strtok_r   |check|
 strdup
Index: libc/config/gpu/entrypoints.txt
===
--- libc/config/gpu/entrypoints.txt
+++ libc/config/gpu/entrypoints.txt
@@ -21,7 +21,6 @@
 libc.src.string.bcmp
 libc.src.string.bzero
 libc.src.string.memccpy
-libc.src.string.memchr
 libc.src.string.memcmp
 libc.src.string.memcpy
 libc.src.string.memmem
@@ -32,10 +31,7 @@
 libc.src.string.stpcpy
 libc.src.string.stpncpy
 libc.src.string.strcasecmp
-libc.src.string.strcasestr
 libc.src.string.strcat
-libc.src.string.strchr
-libc.src.string.strchrnul
 libc.src.string.strcmp
 libc.src.string.strcpy
 libc.src.string.strcspn
@@ -47,10 +43,7 @@
 libc.src.string.strncmp
 libc.src.string.strncpy
 libc.src.string.strnlen
-libc.src.string.strpbrk
-libc.src.string.strrchr
 libc.src.string.strspn
-libc.src.string.strstr
 libc.src.string.strtok
 libc.src.string.strtok_r
 
Index: clang/lib/Headers/llvm_libc_wrappers/string.h
===
--- clang/lib/Headers/llvm_libc_wrappers/string.h
+++ clang/lib/Headers/llvm_libc_wrappers/string.h
@@ -13,22 +13,11 @@
 #error "This file is for GPU offloading compilation only"
 #endif
 
-// The GNU headers provide non C-standard headers when in C++ mode. Manually
-// undefine it here so that the definitions agree with the C standard for our
-// purposes.
-#ifdef __cplusplus
-extern "C" {
-#pragma push_macro("__cplusplus")
-#undef __cplusplus
-#endif
-
+// FIXME: The GNU headers provide C++ standard compliant headers when in C++
+// mode and the LLVM libc does not. We cannot enable memchr, strchr, strchrnul,
+// strpbrk, strrchr, strstr, or strcasestr until this is addressed.
 #include_next 
 
-#pragma pop_macro("__cplusplus")
-#ifdef __cplusplus
-}
-#endif
-
 #if __has_include()
 
 #if defined(__HIP__) || defined(__CUDA__)


Index: libc/docs/gpu/support.rst
===
--- libc/docs/gpu/support.rst
+++ libc/docs/gpu/support.rst
@@ -47,7 +47,7 @@
 bcmp   |check|
 bzero  |check|
 memccpy|check|
-memchr |check|
+memchr 
 memcmp |check|
 memcpy |check|
 memmove|check|
@@ -57,7 +57,7 @@
 stpcpy |check|
 stpncpy|check|
 strcat |check|
-strchr |check|
+strchr 
 strcmp |check|
 strcpy |check|
 strcspn|check|
@@ -68,10 +68,10 @@
 strncmp|check|
 strncpy|check|
 strnlen|check|
-strpbrk|check|
-strrchr|check|
+strpbrk
+strrchr
 strspn |check|
-strstr |check|
+strstr 
 strtok |check|
 strtok_r   |check|
 strdup
Index: libc/config/gpu/entrypoints.txt
===
--- libc/config/gpu/entrypoints.txt
+++ libc/config/gpu/entrypoints.txt
@@ -21,7 +21,6 @@
 libc.src.string.bcmp
 libc.src.string.bzero
 libc.src.string.memccpy
-libc.src.string.memchr
 libc.src.string.memcmp
 libc.src.string.memcpy
 libc.src.string.memmem
@@ -32,10 +31,7 @@
 libc.src.string.stpcpy
 libc.src.string.stpncpy
 libc.src.string.strcasecmp
-libc.src.string.strcasestr
 libc.src.string.strcat
-libc.src.string.strchr
-libc.src.string.strchrnul
 libc.src.string.strcmp
 libc.src.string.strcpy
 libc.src.string.strcspn
@@ -47,10 +43,7 @@
 libc.src.string.strncmp
 libc.src.string.strncpy
 libc.src.string.strnlen
-libc.src.stri

[PATCH] D154850: [libc] Remove GPU string functions incompatible with C++

2023-07-10 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: jdoerfert, JonChesterfield, sivachandra, lntue, 
michaelrj, ronlieb.
Herald added projects: libc-project, All.
Herald added a subscriber: libc-commits.
jhuber6 requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

These functions have definitions differing between C and C++. GNU
respects the C++ definitions while the LLVM libc does not. This causes
many bugs and the current hack creates other issues. Rather than hack
around this I'd rather temporarily disable these than regress with the
integration into other offloading languages. We lose test support for
them but we should be able to re-enable these once the `libc` headers
provide these correctly.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D154850

Files:
  clang/lib/Headers/llvm_libc_wrappers/string.h
  libc/config/gpu/entrypoints.txt
  libc/docs/gpu/support.rst


Index: libc/docs/gpu/support.rst
===
--- libc/docs/gpu/support.rst
+++ libc/docs/gpu/support.rst
@@ -47,7 +47,7 @@
 bcmp   |check|
 bzero  |check|
 memccpy|check|
-memchr |check|
+memchr 
 memcmp |check|
 memcpy |check|
 memmove|check|
@@ -57,7 +57,7 @@
 stpcpy |check|
 stpncpy|check|
 strcat |check|
-strchr |check|
+strchr 
 strcmp |check|
 strcpy |check|
 strcspn|check|
@@ -68,10 +68,10 @@
 strncmp|check|
 strncpy|check|
 strnlen|check|
-strpbrk|check|
-strrchr|check|
+strpbrk
+strrchr
 strspn |check|
-strstr |check|
+strstr 
 strtok |check|
 strtok_r   |check|
 strdup
Index: libc/config/gpu/entrypoints.txt
===
--- libc/config/gpu/entrypoints.txt
+++ libc/config/gpu/entrypoints.txt
@@ -21,7 +21,6 @@
 libc.src.string.bcmp
 libc.src.string.bzero
 libc.src.string.memccpy
-libc.src.string.memchr
 libc.src.string.memcmp
 libc.src.string.memcpy
 libc.src.string.memmem
@@ -32,10 +31,7 @@
 libc.src.string.stpcpy
 libc.src.string.stpncpy
 libc.src.string.strcasecmp
-libc.src.string.strcasestr
 libc.src.string.strcat
-libc.src.string.strchr
-libc.src.string.strchrnul
 libc.src.string.strcmp
 libc.src.string.strcpy
 libc.src.string.strcspn
@@ -47,10 +43,7 @@
 libc.src.string.strncmp
 libc.src.string.strncpy
 libc.src.string.strnlen
-libc.src.string.strpbrk
-libc.src.string.strrchr
 libc.src.string.strspn
-libc.src.string.strstr
 libc.src.string.strtok
 libc.src.string.strtok_r
 
Index: clang/lib/Headers/llvm_libc_wrappers/string.h
===
--- clang/lib/Headers/llvm_libc_wrappers/string.h
+++ clang/lib/Headers/llvm_libc_wrappers/string.h
@@ -13,22 +13,11 @@
 #error "This file is for GPU offloading compilation only"
 #endif
 
-// The GNU headers provide non C-standard headers when in C++ mode. Manually
-// undefine it here so that the definitions agree with the C standard for our
-// purposes.
-#ifdef __cplusplus
-extern "C" {
-#pragma push_macro("__cplusplus")
-#undef __cplusplus
-#endif
-
+// FIXME: The GNU headers provide C++ standard compliant headers when in C++
+// mode and the LLVM libc does not. We cannot enable memchr, strchr, strchrnul,
+// strpbrk, strrchr, strstr, or strcasestr until this is addressed.
 #include_next 
 
-#pragma pop_macro("__cplusplus")
-#ifdef __cplusplus
-}
-#endif
-
 #if __has_include()
 
 #if defined(__HIP__) || defined(__CUDA__)


Index: libc/docs/gpu/support.rst
===
--- libc/docs/gpu/support.rst
+++ libc/docs/gpu/support.rst
@@ -47,7 +47,7 @@
 bcmp   |check|
 bzero  |check|
 memccpy|check|
-memchr |check|
+memchr 
 memcmp |check|
 memcpy |check|
 memmove|check|
@@ -57,7 +57,7 @@
 stpcpy |check|
 stpncpy|check|
 strcat |check|
-strchr |check|
+strchr 
 strcmp |check|
 strcpy |check|
 strcspn|check|
@@ -68,10 +68,10 @@
 strncmp|check|
 strncpy|check|
 strnlen|check|
-strpbrk|check|
-strrchr|check|
+strpbrk
+strrchr
 strspn |check|
-strstr |check|
+strstr 
 strtok |check|
 strtok_r   |check|
 strdup
Index: libc/config/gpu/entrypoints.txt
===
--- libc/config/gpu/entrypoints.txt
+++ libc/config/gpu/entrypoints.txt
@@ -21,7 +21,6 @@
 libc.src.string.bcmp
 libc.src.string.bzero
 libc.src.string.memccpy
-libc.src.string.memchr
 libc.src.string.memcmp
 libc.src.string.memcpy
 lib

[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-07-10 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D153725#4485039 , @arsenm wrote:

> And the libomptarget build is in fact doing that, but it shouldn't have to. 
> What it's doing actually seems really unreasonable. It's only building the 
> locally found targets when it should be building all targetable devices. The 
> inconvenience there is that's too many devices, so as a build time hack you 
> should be able to opt-in to a restricted subset. Even better would be if we 
> would only build a copy for a reasonable subset of targets (i.e. one per 
> generation where there's actually some semblance of compatibility). Or could 
> just capitulate and rely on the hacks device libs does

The `libomptarget` build uses it to determine if it should build the tests 
mostly, we don't want to configure tests for a system that cannot support them. 
The `libc` tests however requires it to set the architecture for its test 
configuration since we can't support multiple test architectures at the same 
time, it required too much work so I shelved that. We more or less just say "If 
you've got HSA / CUDA we expect to run tests".


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-07-10 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D153725#4484966 , @arsenm wrote:

> In D153725#4484754 , 
> @JonChesterfield wrote:
>
>> - if you open the driver too many times at once it fails to open, so running 
>> a parallel build that uses this tool doesn't work on fast machines
>
> Why would this happen? Seems like a bug to fix?

Jon is probably referring to a recurring problem we've noticed with the `libc` 
tests on HSA that they will sometimes fail when running with multiple threads, 
see 
https://lab.llvm.org/staging/#/builders/247/builds/2599/steps/10/logs/stdio. 
Haven't been able to track down whether or not that's a bug in the 
implementation or interface somewhere.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154591: [OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags

2023-07-10 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Yeah that happens sometimes, if it's working on your system it's safe to assume 
it's fine. Worst case scenario you can always revert, it's not a big deal.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154591/new/

https://reviews.llvm.org/D154591

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-07-07 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 accepted this revision.
jhuber6 added a comment.
This revision is now accepted and ready to land.

LG


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-07-07 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:48-52
+  if (!printGPUsByHSA())
+return 0;
+#endif
 
+  return printGPUsByHSA();

Are we missing something here ? They look the same.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-07-07 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:50
+#else
+  return printGPUsByHSA();
+#endif

arsenm wrote:
> The HIP path should work on linux too. I generally think we should build as 
> much code as possible on all hosts, so how about
> ```
> #ifndef _WIN32
>   if (tryHSA())
> return 0;
> #endif
> 
> tryHIP()
> ```
> 
> 
> 
That'd be fine, I'm in favor of sticking to HSA since it's a smaller runtime 
that's more reasonable to build standalone without the whole ROCm stack.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154036: [libc] Add support for creating wrapper headers for offloading in clang

2023-07-06 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGa4a26374aa11: [libc] Add support for creating wrapper 
headers for offloading in clang (authored by jhuber6).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154036/new/

https://reviews.llvm.org/D154036

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Headers/CMakeLists.txt
  clang/lib/Headers/llvm_libc_wrappers/ctype.h
  clang/lib/Headers/llvm_libc_wrappers/inttypes.h
  clang/lib/Headers/llvm_libc_wrappers/llvm-libc-decls/README.txt
  clang/lib/Headers/llvm_libc_wrappers/stdio.h
  clang/lib/Headers/llvm_libc_wrappers/stdlib.h
  clang/lib/Headers/llvm_libc_wrappers/string.h
  clang/test/Driver/gpu-libc-headers.c
  libc/cmake/modules/LLVMLibCHeaderRules.cmake
  libc/include/CMakeLists.txt
  libc/utils/HdrGen/Generator.cpp
  libc/utils/HdrGen/Generator.h
  libc/utils/HdrGen/Main.cpp

Index: libc/utils/HdrGen/Main.cpp
===
--- libc/utils/HdrGen/Main.cpp
+++ libc/utils/HdrGen/Main.cpp
@@ -32,6 +32,9 @@
 llvm::cl::list ReplacementValues(
 "args", llvm::cl::desc("Command separated = pairs."),
 llvm::cl::value_desc("[,name=value]"));
+llvm::cl::opt ExportDecls(
+"export-decls",
+llvm::cl::desc("Output a new header containing only the entrypoints."));
 
 void ParseArgValuePairs(std::unordered_map &Map) {
   for (std::string &R : ReplacementValues) {
@@ -48,7 +51,10 @@
   std::unordered_map ArgMap;
   ParseArgValuePairs(ArgMap);
   Generator G(HeaderDefFile, EntrypointNamesOption, StandardHeader, ArgMap);
-  G.generate(OS, Records);
+  if (ExportDecls)
+G.generateDecls(OS, Records);
+  else
+G.generate(OS, Records);
 
   return false;
 }
Index: libc/utils/HdrGen/Generator.h
===
--- libc/utils/HdrGen/Generator.h
+++ libc/utils/HdrGen/Generator.h
@@ -52,6 +52,7 @@
 ArgMap(Map) {}
 
   void generate(llvm::raw_ostream &OS, llvm::RecordKeeper &Records);
+  void generateDecls(llvm::raw_ostream &OS, llvm::RecordKeeper &Records);
 };
 
 } // namespace llvm_libc
Index: libc/utils/HdrGen/Generator.cpp
===
--- libc/utils/HdrGen/Generator.cpp
+++ libc/utils/HdrGen/Generator.cpp
@@ -10,6 +10,7 @@
 
 #include "IncludeFileCommand.h"
 #include "PublicAPICommand.h"
+#include "utils/LibcTableGenUtil/APIIndexer.h"
 
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/MemoryBuffer.h"
@@ -116,4 +117,78 @@
   }
 }
 
+void Generator::generateDecls(llvm::raw_ostream &OS,
+  llvm::RecordKeeper &Records) {
+
+  OS << "//===-- C standard declarations for " << StdHeader << " "
+ << std::string(80 - (42 + StdHeader.size()), '-') << "===//\n"
+ << "//\n"
+ << "// Part of the LLVM Project, under the Apache License v2.0 with LLVM "
+"Exceptions.\n"
+ << "// See https://llvm.org/LICENSE.txt for license information.\n"
+ << "// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception\n"
+ << "//\n"
+ << "//"
+"===---"
+"---===//\n\n";
+
+  std::string HeaderGuard(StdHeader.size(), '\0');
+  llvm::transform(StdHeader, HeaderGuard.begin(), [](const char C) -> char {
+return !isalnum(C) ? '_' : llvm::toUpper(C);
+  });
+  OS << "#ifndef __LLVM_LIBC_DECLARATIONS_" << HeaderGuard << "\n"
+ << "#define __LLVM_LIBC_DECLARATIONS_" << HeaderGuard << "\n\n";
+
+  OS << "#ifndef __LIBC_ATTRS\n"
+ << "#define __LIBC_ATTRS\n"
+ << "#endif\n\n";
+
+  OS << "#ifdef __cplusplus\n"
+ << "extern \"C\" {\n"
+ << "#endif\n\n";
+
+  APIIndexer G(StdHeader, Records);
+  for (auto &Name : EntrypointNameList) {
+// Filter out functions not exported by this header.
+if (G.FunctionSpecMap.find(Name) == G.FunctionSpecMap.end())
+  continue;
+
+llvm::Record *FunctionSpec = G.FunctionSpecMap[Name];
+llvm::Record *RetValSpec = FunctionSpec->getValueAsDef("Return");
+llvm::Record *ReturnType = RetValSpec->getValueAsDef("ReturnType");
+
+OS << G.getTypeAsString(ReturnType) << " " << Name << "(";
+
+auto ArgsList = FunctionSpec->getValueAsListOfDefs("Args");
+for (size_t i = 0; i < ArgsList.size(); ++i) {
+  llvm::Record *ArgType = ArgsList[i]->getValueAsDef("ArgType");
+  OS << G.getTypeAsString(ArgType);
+  if (i < ArgsList.size() - 1)
+OS << ", ";
+}
+
+OS << ") __LIBC_ATTRS;\n\n";
+  }
+
+  // Make another pass over entrypoints to emit object declarations.
+  for (const auto &Name : EntrypointNameList) {
+if (G.ObjectSpecMap.find(Name) == G.ObjectSpecMap.end())
+  continue;
+llvm::Record *ObjectSpec = G.ObjectSpecMap[Name];
+auto Type = ObjectSpec->getValueAsString("Type");
+OS << "extern " << Type << " 

[PATCH] D154591: [OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags

2023-07-06 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 accepted this revision.
jhuber6 added a comment.
This revision is now accepted and ready to land.

Noisy change but shouldn't really affect much. Names area little overloaded 
here so it's good to be more specific. Thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154591/new/

https://reviews.llvm.org/D154591

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154036: [libc] Add support for creating wrapper headers for offloading in clang

2023-07-06 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 537787.
jhuber6 added a comment.

Changing this to only apply to OpenMP for now. It breaks CUDA / HIP builds
because they already have forward declarations of things like `malloc` or
`memcpy` on the GPU that conflict. We'll need to clean those up later.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154036/new/

https://reviews.llvm.org/D154036

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Headers/CMakeLists.txt
  clang/lib/Headers/llvm_libc_wrappers/ctype.h
  clang/lib/Headers/llvm_libc_wrappers/inttypes.h
  clang/lib/Headers/llvm_libc_wrappers/llvm-libc-decls/README.txt
  clang/lib/Headers/llvm_libc_wrappers/stdio.h
  clang/lib/Headers/llvm_libc_wrappers/stdlib.h
  clang/lib/Headers/llvm_libc_wrappers/string.h
  clang/test/Driver/gpu-libc-headers.c
  libc/cmake/modules/LLVMLibCHeaderRules.cmake
  libc/include/CMakeLists.txt
  libc/utils/HdrGen/Generator.cpp
  libc/utils/HdrGen/Generator.h
  libc/utils/HdrGen/Main.cpp

Index: libc/utils/HdrGen/Main.cpp
===
--- libc/utils/HdrGen/Main.cpp
+++ libc/utils/HdrGen/Main.cpp
@@ -32,6 +32,9 @@
 llvm::cl::list ReplacementValues(
 "args", llvm::cl::desc("Command separated = pairs."),
 llvm::cl::value_desc("[,name=value]"));
+llvm::cl::opt ExportDecls(
+"export-decls",
+llvm::cl::desc("Output a new header containing only the entrypoints."));
 
 void ParseArgValuePairs(std::unordered_map &Map) {
   for (std::string &R : ReplacementValues) {
@@ -48,7 +51,10 @@
   std::unordered_map ArgMap;
   ParseArgValuePairs(ArgMap);
   Generator G(HeaderDefFile, EntrypointNamesOption, StandardHeader, ArgMap);
-  G.generate(OS, Records);
+  if (ExportDecls)
+G.generateDecls(OS, Records);
+  else
+G.generate(OS, Records);
 
   return false;
 }
Index: libc/utils/HdrGen/Generator.h
===
--- libc/utils/HdrGen/Generator.h
+++ libc/utils/HdrGen/Generator.h
@@ -52,6 +52,7 @@
 ArgMap(Map) {}
 
   void generate(llvm::raw_ostream &OS, llvm::RecordKeeper &Records);
+  void generateDecls(llvm::raw_ostream &OS, llvm::RecordKeeper &Records);
 };
 
 } // namespace llvm_libc
Index: libc/utils/HdrGen/Generator.cpp
===
--- libc/utils/HdrGen/Generator.cpp
+++ libc/utils/HdrGen/Generator.cpp
@@ -10,6 +10,7 @@
 
 #include "IncludeFileCommand.h"
 #include "PublicAPICommand.h"
+#include "utils/LibcTableGenUtil/APIIndexer.h"
 
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/MemoryBuffer.h"
@@ -116,4 +117,78 @@
   }
 }
 
+void Generator::generateDecls(llvm::raw_ostream &OS,
+  llvm::RecordKeeper &Records) {
+
+  OS << "//===-- C standard declarations for " << StdHeader << " "
+ << std::string(80 - (42 + StdHeader.size()), '-') << "===//\n"
+ << "//\n"
+ << "// Part of the LLVM Project, under the Apache License v2.0 with LLVM "
+"Exceptions.\n"
+ << "// See https://llvm.org/LICENSE.txt for license information.\n"
+ << "// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception\n"
+ << "//\n"
+ << "//"
+"===---"
+"---===//\n\n";
+
+  std::string HeaderGuard(StdHeader.size(), '\0');
+  llvm::transform(StdHeader, HeaderGuard.begin(), [](const char C) -> char {
+return !isalnum(C) ? '_' : llvm::toUpper(C);
+  });
+  OS << "#ifndef __LLVM_LIBC_DECLARATIONS_" << HeaderGuard << "\n"
+ << "#define __LLVM_LIBC_DECLARATIONS_" << HeaderGuard << "\n\n";
+
+  OS << "#ifndef __LIBC_ATTRS\n"
+ << "#define __LIBC_ATTRS\n"
+ << "#endif\n\n";
+
+  OS << "#ifdef __cplusplus\n"
+ << "extern \"C\" {\n"
+ << "#endif\n\n";
+
+  APIIndexer G(StdHeader, Records);
+  for (auto &Name : EntrypointNameList) {
+// Filter out functions not exported by this header.
+if (G.FunctionSpecMap.find(Name) == G.FunctionSpecMap.end())
+  continue;
+
+llvm::Record *FunctionSpec = G.FunctionSpecMap[Name];
+llvm::Record *RetValSpec = FunctionSpec->getValueAsDef("Return");
+llvm::Record *ReturnType = RetValSpec->getValueAsDef("ReturnType");
+
+OS << G.getTypeAsString(ReturnType) << " " << Name << "(";
+
+auto ArgsList = FunctionSpec->getValueAsListOfDefs("Args");
+for (size_t i = 0; i < ArgsList.size(); ++i) {
+  llvm::Record *ArgType = ArgsList[i]->getValueAsDef("ArgType");
+  OS << G.getTypeAsString(ArgType);
+  if (i < ArgsList.size() - 1)
+OS << ", ";
+}
+
+OS << ") __LIBC_ATTRS;\n\n";
+  }
+
+  // Make another pass over entrypoints to emit object declarations.
+  for (const auto &Name : EntrypointNameList) {
+if (G.ObjectSpecMap.find(Name) == G.ObjectSpecMap.end())
+  continue;
+llvm::Record *ObjectSpec = G.ObjectSpecMap[Name];
+auto Type = ObjectSpec->getValueAsString("Type");

[PATCH] D154036: [libc] Add support for creating wrapper headers for offloading in clang

2023-07-06 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: libc/include/CMakeLists.txt:8
+if(LIBC_TARGET_ARCHITECTURE_IS_GPU)
+  include(GetClangResourceDir)
+endif()

sivachandra wrote:
> Where does this come from?
It's a global CMake module that LLVM provides in 
`cmake/Modules/GetClangResourceDir.cmake`. I only expect the GPU build to be 
done in-tree.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154036/new/

https://reviews.llvm.org/D154036

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154036: [libc] Add support for creating wrapper headers for offloading in clang

2023-07-05 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 537535.
jhuber6 added a comment.

Fix guard on the headers for offloading languages


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154036/new/

https://reviews.llvm.org/D154036

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Headers/CMakeLists.txt
  clang/lib/Headers/llvm_libc_wrappers/ctype.h
  clang/lib/Headers/llvm_libc_wrappers/llvm-libc-decls/README.txt
  clang/lib/Headers/llvm_libc_wrappers/stdio.h
  clang/lib/Headers/llvm_libc_wrappers/stdlib.h
  clang/lib/Headers/llvm_libc_wrappers/string.h
  clang/test/Driver/gpu-libc-headers.c
  libc/cmake/modules/LLVMLibCHeaderRules.cmake
  libc/include/CMakeLists.txt
  libc/utils/HdrGen/Generator.cpp
  libc/utils/HdrGen/Generator.h
  libc/utils/HdrGen/Main.cpp

Index: libc/utils/HdrGen/Main.cpp
===
--- libc/utils/HdrGen/Main.cpp
+++ libc/utils/HdrGen/Main.cpp
@@ -32,6 +32,9 @@
 llvm::cl::list ReplacementValues(
 "args", llvm::cl::desc("Command separated = pairs."),
 llvm::cl::value_desc("[,name=value]"));
+llvm::cl::opt ExportDecls(
+"export-decls",
+llvm::cl::desc("Output a new header containing only the entrypoints."));
 
 void ParseArgValuePairs(std::unordered_map &Map) {
   for (std::string &R : ReplacementValues) {
@@ -48,7 +51,10 @@
   std::unordered_map ArgMap;
   ParseArgValuePairs(ArgMap);
   Generator G(HeaderDefFile, EntrypointNamesOption, StandardHeader, ArgMap);
-  G.generate(OS, Records);
+  if (ExportDecls)
+G.generateDecls(OS, Records);
+  else
+G.generate(OS, Records);
 
   return false;
 }
Index: libc/utils/HdrGen/Generator.h
===
--- libc/utils/HdrGen/Generator.h
+++ libc/utils/HdrGen/Generator.h
@@ -52,6 +52,7 @@
 ArgMap(Map) {}
 
   void generate(llvm::raw_ostream &OS, llvm::RecordKeeper &Records);
+  void generateDecls(llvm::raw_ostream &OS, llvm::RecordKeeper &Records);
 };
 
 } // namespace llvm_libc
Index: libc/utils/HdrGen/Generator.cpp
===
--- libc/utils/HdrGen/Generator.cpp
+++ libc/utils/HdrGen/Generator.cpp
@@ -10,6 +10,7 @@
 
 #include "IncludeFileCommand.h"
 #include "PublicAPICommand.h"
+#include "utils/LibcTableGenUtil/APIIndexer.h"
 
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/MemoryBuffer.h"
@@ -116,4 +117,78 @@
   }
 }
 
+void Generator::generateDecls(llvm::raw_ostream &OS,
+  llvm::RecordKeeper &Records) {
+
+  OS << "//===-- C standard declarations for " << StdHeader << " "
+ << std::string(80 - (42 + StdHeader.size()), '-') << "===//\n"
+ << "//\n"
+ << "// Part of the LLVM Project, under the Apache License v2.0 with LLVM "
+"Exceptions.\n"
+ << "// See https://llvm.org/LICENSE.txt for license information.\n"
+ << "// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception\n"
+ << "//\n"
+ << "//"
+"===---"
+"---===//\n\n";
+
+  std::string HeaderGuard(StdHeader.size(), '\0');
+  llvm::transform(StdHeader, HeaderGuard.begin(), [](const char C) -> char {
+return !isalnum(C) ? '_' : llvm::toUpper(C);
+  });
+  OS << "#ifndef __LLVM_LIBC_DECLARATIONS_" << HeaderGuard << "\n"
+ << "#define __LLVM_LIBC_DECLARATIONS_" << HeaderGuard << "\n\n";
+
+  OS << "#ifndef __LIBC_ATTRS\n"
+ << "#define __LIBC_ATTRS\n"
+ << "#endif\n\n";
+
+  OS << "#ifdef __cplusplus\n"
+ << "extern \"C\" {\n"
+ << "#endif\n\n";
+
+  APIIndexer G(StdHeader, Records);
+  for (auto &Name : EntrypointNameList) {
+// Filter out functions not exported by this header.
+if (G.FunctionSpecMap.find(Name) == G.FunctionSpecMap.end())
+  continue;
+
+llvm::Record *FunctionSpec = G.FunctionSpecMap[Name];
+llvm::Record *RetValSpec = FunctionSpec->getValueAsDef("Return");
+llvm::Record *ReturnType = RetValSpec->getValueAsDef("ReturnType");
+
+OS << G.getTypeAsString(ReturnType) << " " << Name << "(";
+
+auto ArgsList = FunctionSpec->getValueAsListOfDefs("Args");
+for (size_t i = 0; i < ArgsList.size(); ++i) {
+  llvm::Record *ArgType = ArgsList[i]->getValueAsDef("ArgType");
+  OS << G.getTypeAsString(ArgType);
+  if (i < ArgsList.size() - 1)
+OS << ", ";
+}
+
+OS << ") __LIBC_ATTRS;\n\n";
+  }
+
+  // Make another pass over entrypoints to emit object declarations.
+  for (const auto &Name : EntrypointNameList) {
+if (G.ObjectSpecMap.find(Name) == G.ObjectSpecMap.end())
+  continue;
+llvm::Record *ObjectSpec = G.ObjectSpecMap[Name];
+auto Type = ObjectSpec->getValueAsString("Type");
+OS << "extern " << Type << " " << Name << " __LIBC_ATTRS;\n";
+  }
+
+  // Emit a final newline if we emitted any object declarations.
+  if (llvm::any_of(EntrypointNameList, [&](const std::string &Name) {
+

[PATCH] D154378: [LinkerWrapper] Set the GPU LTO job to be freestanding

2023-07-03 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: JonChesterfield, jdoerfert, yaxunl, tianshilei1992.
Herald added a subscriber: inglorion.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

The LTO config allows us to set whether or not the build is
freestanding. This pretty much prevents emission of library calls and
should cause them to be treated like normal functions. This is in
relation to D154364 .


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D154378

Files:
  clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp


Index: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
===
--- clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -570,6 +570,10 @@
   Conf.CGFileType =
   (Triple.isNVPTX() || SaveTemps) ? CGFT_AssemblyFile : CGFT_ObjectFile;
 
+  // We consider the GPU to be a freestanding target so we shouldn't emit any
+  // builtin library calls.
+  Conf.Freestanding = true;
+
   // TODO: Handle remark files
   Conf.HasWholeProgramVisibility = Args.hasArg(OPT_whole_program);
 


Index: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
===
--- clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -570,6 +570,10 @@
   Conf.CGFileType =
   (Triple.isNVPTX() || SaveTemps) ? CGFT_AssemblyFile : CGFT_ObjectFile;
 
+  // We consider the GPU to be a freestanding target so we shouldn't emit any
+  // builtin library calls.
+  Conf.Freestanding = true;
+
   // TODO: Handle remark files
   Conf.HasWholeProgramVisibility = Args.hasArg(OPT_whole_program);
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-06-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:47
 
-  // Attempt to load the HSA runtime.
-  if (llvm::Error Err = loadHSA()) {
-logAllUnhandledErrors(std::move(Err), llvm::errs());
-return 1;
-  }
-
-  hsa_status_t Status = hsa_init();
-  if (Status != HSA_STATUS_SUCCESS) {
-return 1;
-  }
-
-  std::vector GPUs;
-  Status = hsa_iterate_agents(iterateAgentsCallback, &GPUs);
-  if (Status != HSA_STATUS_SUCCESS) {
-return 1;
-  }
-
-  for (const auto &GPU : GPUs)
-printf("%s\n", GPU.c_str());
-
-  if (GPUs.size() < 1)
-return 1;
-
-  hsa_shut_down();
-  return 0;
+#ifdef _WIN32
+  return printGPUsByHIP();

jhuber6 wrote:
> arsenm wrote:
> > yaxunl wrote:
> > > jhuber6 wrote:
> > > > yaxunl wrote:
> > > > > jhuber6 wrote:
> > > > > > Doesn't LLVM know if it's being built for Windows? Maybe we should 
> > > > > > key off of that instead and then conditionally `add_sources` for a 
> > > > > > single function that satisfies the same "print all the 
> > > > > > architectures" thing.
> > > > > When this code is compiled on Windows, the compiler predefines 
> > > > > `_WIN32`, so it should work.
> > > > > 
> > > > > I tried to tweak cmake files of amdgpu-arch to selectively add source 
> > > > > files for Windows and non-windows but it did not work. If you have a 
> > > > > file in that directory that is not included in any target, cmake will 
> > > > > report an error. Seems there is a mechanism in CMake files for clang 
> > > > > tools not allowing any 'dangling' source files.
> > > > The proper way to do that is to add it to a new subdirectory and 
> > > > conditionally do `add_subdirectory`. Something like
> > > > ```
> > > > HSA/GetAMDGPUArch.cpp
> > > > HIP/GetAMDGPUArch.cpp
> > > > ```
> > > > It's not a big deal, but I just feel like including unused symbols in 
> > > > the binary on Linux isn't ideal. Up to you if you want to put in the 
> > > > effort.
> > > The HIP version actually works on both Linux and Windows. I am not sure 
> > > whether one day we want to use it on Linux too since it supports target 
> > > ID features.
> > > 
> > > Also, I kind of think it is overkill to have separate directories for 
> > > Windows and Linux for this simple program.
> > Why can't you get the target id features through the HSA path? I think 
> > there's value in going through the lowest level component to get the 
> > information
> This should be what we do in the OpenMP runtime, should be able to add that 
> feature.
> ```
>   uint32_t name_len;  
>   
>   
>   err = hsa_isa_get_info_alt(isa, HSA_ISA_INFO_NAME_LENGTH, &name_len);   
>   
>
>   if (err != HSA_STATUS_SUCCESS) {
>   
>  
> DP("Error getting ISA info length\n");
>   
>  
> return err;   
>   
>  
>   }   
>   
>  
>   
>   
>  
>   char TargetID[name_len];
>   
>  
>   err = hsa_isa_get_info_alt(isa, HSA_ISA_INFO_NAME, TargetID);   
>   
>  
>   if (err != HSA_STATUS_SUCCESS) {
>   
>  
> DP("Error getting ISA info name\n");  
> 

[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-06-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:47
 
-  // Attempt to load the HSA runtime.
-  if (llvm::Error Err = loadHSA()) {
-logAllUnhandledErrors(std::move(Err), llvm::errs());
-return 1;
-  }
-
-  hsa_status_t Status = hsa_init();
-  if (Status != HSA_STATUS_SUCCESS) {
-return 1;
-  }
-
-  std::vector GPUs;
-  Status = hsa_iterate_agents(iterateAgentsCallback, &GPUs);
-  if (Status != HSA_STATUS_SUCCESS) {
-return 1;
-  }
-
-  for (const auto &GPU : GPUs)
-printf("%s\n", GPU.c_str());
-
-  if (GPUs.size() < 1)
-return 1;
-
-  hsa_shut_down();
-  return 0;
+#ifdef _WIN32
+  return printGPUsByHIP();

arsenm wrote:
> yaxunl wrote:
> > jhuber6 wrote:
> > > yaxunl wrote:
> > > > jhuber6 wrote:
> > > > > Doesn't LLVM know if it's being built for Windows? Maybe we should 
> > > > > key off of that instead and then conditionally `add_sources` for a 
> > > > > single function that satisfies the same "print all the architectures" 
> > > > > thing.
> > > > When this code is compiled on Windows, the compiler predefines 
> > > > `_WIN32`, so it should work.
> > > > 
> > > > I tried to tweak cmake files of amdgpu-arch to selectively add source 
> > > > files for Windows and non-windows but it did not work. If you have a 
> > > > file in that directory that is not included in any target, cmake will 
> > > > report an error. Seems there is a mechanism in CMake files for clang 
> > > > tools not allowing any 'dangling' source files.
> > > The proper way to do that is to add it to a new subdirectory and 
> > > conditionally do `add_subdirectory`. Something like
> > > ```
> > > HSA/GetAMDGPUArch.cpp
> > > HIP/GetAMDGPUArch.cpp
> > > ```
> > > It's not a big deal, but I just feel like including unused symbols in the 
> > > binary on Linux isn't ideal. Up to you if you want to put in the effort.
> > The HIP version actually works on both Linux and Windows. I am not sure 
> > whether one day we want to use it on Linux too since it supports target ID 
> > features.
> > 
> > Also, I kind of think it is overkill to have separate directories for 
> > Windows and Linux for this simple program.
> Why can't you get the target id features through the HSA path? I think 
> there's value in going through the lowest level component to get the 
> information
This should be what we do in the OpenMP runtime, should be able to add that 
feature.
```
  uint32_t name_len;

  
  err = hsa_isa_get_info_alt(isa, HSA_ISA_INFO_NAME_LENGTH, &name_len); 

   
  if (err != HSA_STATUS_SUCCESS) {  

 
DP("Error getting ISA info length\n");  

 
return err; 

 
  } 

 


 
  char TargetID[name_len];  

 
  err = hsa_isa_get_info_alt(isa, HSA_ISA_INFO_NAME, TargetID); 

 
  if (err != HSA_STATUS_SUCCESS) {  

 
DP("Error getting ISA info name\n");

 
return err;  

[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-06-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Also w.r.t. target-id, I'm wondering what a good solution would be. Right now 
the main usage of `amdgpu-arch` is both to detect the `-mcpu / -march` in CMake 
and to fill in the architecture via `--offload-arch=native` or 
`-fopenmp-target=amdgcn-amd-amdhsa`. We may want to make a flag to specify if 
we want to include `target-id` information in the reported architectures.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-06-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:47
 
-  // Attempt to load the HSA runtime.
-  if (llvm::Error Err = loadHSA()) {
-logAllUnhandledErrors(std::move(Err), llvm::errs());
-return 1;
-  }
-
-  hsa_status_t Status = hsa_init();
-  if (Status != HSA_STATUS_SUCCESS) {
-return 1;
-  }
-
-  std::vector GPUs;
-  Status = hsa_iterate_agents(iterateAgentsCallback, &GPUs);
-  if (Status != HSA_STATUS_SUCCESS) {
-return 1;
-  }
-
-  for (const auto &GPU : GPUs)
-printf("%s\n", GPU.c_str());
-
-  if (GPUs.size() < 1)
-return 1;
-
-  hsa_shut_down();
-  return 0;
+#ifdef _WIN32
+  return printGPUsByHIP();

yaxunl wrote:
> jhuber6 wrote:
> > Doesn't LLVM know if it's being built for Windows? Maybe we should key off 
> > of that instead and then conditionally `add_sources` for a single function 
> > that satisfies the same "print all the architectures" thing.
> When this code is compiled on Windows, the compiler predefines `_WIN32`, so 
> it should work.
> 
> I tried to tweak cmake files of amdgpu-arch to selectively add source files 
> for Windows and non-windows but it did not work. If you have a file in that 
> directory that is not included in any target, cmake will report an error. 
> Seems there is a mechanism in CMake files for clang tools not allowing any 
> 'dangling' source files.
The proper way to do that is to add it to a new subdirectory and conditionally 
do `add_subdirectory`. Something like
```
HSA/GetAMDGPUArch.cpp
HIP/GetAMDGPUArch.cpp
```
It's not a big deal, but I just feel like including unused symbols in the 
binary on Linux isn't ideal. Up to you if you want to put in the effort.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153725: [clang] Make amdgpu-arch tool work on Windows

2023-06-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D153725#4463589 , @arsenm wrote:

> Unrelated but can we get this to start reporting xnack and ecc?

A lot of CMake relies on this just being an ordered list of architectures, so 
we'd probably need to make that an opt-in thing.




Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:47
 
-  // Attempt to load the HSA runtime.
-  if (llvm::Error Err = loadHSA()) {
-logAllUnhandledErrors(std::move(Err), llvm::errs());
-return 1;
-  }
-
-  hsa_status_t Status = hsa_init();
-  if (Status != HSA_STATUS_SUCCESS) {
-return 1;
-  }
-
-  std::vector GPUs;
-  Status = hsa_iterate_agents(iterateAgentsCallback, &GPUs);
-  if (Status != HSA_STATUS_SUCCESS) {
-return 1;
-  }
-
-  for (const auto &GPU : GPUs)
-printf("%s\n", GPU.c_str());
-
-  if (GPUs.size() < 1)
-return 1;
-
-  hsa_shut_down();
-  return 0;
+#ifdef _WIN32
+  return printGPUsByHIP();

Doesn't LLVM know if it's being built for Windows? Maybe we should key off of 
that instead and then conditionally `add_sources` for a single function that 
satisfies the same "print all the architectures" thing.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153725/new/

https://reviews.llvm.org/D153725

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D154145: [HIP] Fix -mllvm option for device lld linker

2023-06-29 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added inline comments.



Comment at: clang/lib/Driver/ToolChains/HIPAMD.cpp:164
+StringRef ArgVal = Arg->getValue(1);
+if (ArgVal.startswith("-mllvm=")) {
+  ArgVal = ArgVal.substr(strlen("-mllvm="));

arsenm wrote:
> StringRef Prefix("-mllvm=") and then use the length instead of strlen
You could probably also do `ArgVal.split("-mllvm=").second` and push back first 
or second depending on whether or not `second` is empty.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154145/new/

https://reviews.llvm.org/D154145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   10   >