[clang] 679158e - Make hip math headers easier to use from C

2020-07-24 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2020-07-24T20:50:46+01:00
New Revision: 679158e662aa247282b8eea4c2d60b33204171fb

URL: 
https://github.com/llvm/llvm-project/commit/679158e662aa247282b8eea4c2d60b33204171fb
DIFF: 
https://github.com/llvm/llvm-project/commit/679158e662aa247282b8eea4c2d60b33204171fb.diff

LOG: Make hip math headers easier to use from C

Summary:
Make hip math headers easier to use from C

Motivation is a step towards using the hip math headers to implement math.h
for openmp, which needs to work with C as well as C++. NFC for C++ code.

Reviewers: yaxunl, jdoerfert

Reviewed By: yaxunl

Subscribers: sstefan1, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D84476

Added: 


Modified: 
clang/lib/Headers/__clang_hip_libdevice_declares.h
clang/lib/Headers/__clang_hip_math.h

Removed: 




diff  --git a/clang/lib/Headers/__clang_hip_libdevice_declares.h 
b/clang/lib/Headers/__clang_hip_libdevice_declares.h
index 711040443440..2cf9cc7f1eb6 100644
--- a/clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ b/clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -10,7 +10,9 @@
 #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__
 #define __CLANG_HIP_LIBDEVICE_DECLARES_H__
 
+#ifdef __cplusplus
 extern "C" {
+#endif
 
 // BEGIN FLOAT
 __device__ __attribute__((const)) float __ocml_acos_f32(float);
@@ -316,7 +318,7 @@ __device__ __attribute__((pure)) __2f16 
__ocml_log2_2f16(__2f16);
 __device__ inline __2f16
 __llvm_amdgcn_rcp_2f16(__2f16 __x) // Not currently exposed by ROCDL.
 {
-  return __2f16{__llvm_amdgcn_rcp_f16(__x.x), __llvm_amdgcn_rcp_f16(__x.y)};
+  return (__2f16){__llvm_amdgcn_rcp_f16(__x.x), __llvm_amdgcn_rcp_f16(__x.y)};
 }
 __device__ __attribute__((const)) __2f16 __ocml_rint_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_rsqrt_2f16(__2f16);
@@ -325,6 +327,8 @@ __device__ __attribute__((const)) __2f16 
__ocml_sqrt_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_trunc_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_pown_2f16(__2f16, __2i16);
 
+#ifdef __cplusplus
 } // extern "C"
+#endif
 
 #endif // __CLANG_HIP_LIBDEVICE_DECLARES_H__

diff  --git a/clang/lib/Headers/__clang_hip_math.h 
b/clang/lib/Headers/__clang_hip_math.h
index 47d3c1717559..f9ca9bf606fb 100644
--- a/clang/lib/Headers/__clang_hip_math.h
+++ b/clang/lib/Headers/__clang_hip_math.h
@@ -95,8 +95,10 @@ inline uint64_t __make_mantissa(const char *__tagp) {
 }
 
 // BEGIN FLOAT
+#ifdef __cplusplus
 __DEVICE__
 inline float abs(float __x) { return __ocml_fabs_f32(__x); }
+#endif
 __DEVICE__
 inline float acosf(float __x) { return __ocml_acos_f32(__x); }
 __DEVICE__
@@ -251,7 +253,7 @@ inline float nanf(const char *__tagp) {
   uint32_t sign : 1;
 } bits;
 
-static_assert(sizeof(float) == sizeof(ieee_float), "");
+static_assert(sizeof(float) == sizeof(struct ieee_float), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -553,8 +555,10 @@ inline float __tanf(float __x) { return 
__ocml_tan_f32(__x); }
 // END FLOAT
 
 // BEGIN DOUBLE
+#ifdef __cplusplus
 __DEVICE__
 inline double abs(double __x) { return __ocml_fabs_f64(__x); }
+#endif
 __DEVICE__
 inline double acos(double __x) { return __ocml_acos_f64(__x); }
 __DEVICE__
@@ -712,7 +716,7 @@ inline double nan(const char *__tagp) {
   uint32_t exponent : 11;
   uint32_t sign : 1;
 } bits;
-static_assert(sizeof(double) == sizeof(ieee_double), "");
+static_assert(sizeof(double) == sizeof(struct ieee_double), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -1178,6 +1182,7 @@ __host__ inline static int max(int __arg1, int __arg2) {
   return std::max(__arg1, __arg2);
 }
 
+#ifdef __cplusplus
 __DEVICE__
 inline float pow(float __base, int __iexp) { return powif(__base, __iexp); }
 
@@ -1188,6 +1193,7 @@ __DEVICE__
 inline _Float16 pow(_Float16 __base, int __iexp) {
   return __ocml_pown_f16(__base, __iexp);
 }
+#endif
 
 #pragma pop_macro("__DEF_FUN1")
 #pragma pop_macro("__DEF_FUN2")



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-14 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield edited 
https://github.com/llvm/llvm-project/pull/72280
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-14 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

Looks solid to me. The patch to clang is long but straightforward and the tests 
look reassuringly exhaustive. Probably good that you ignored my name suggestion 
of integers 0 through N.

This patch is partly motivated by us wanting device scope atomics in libc. It 
removes one of the remaining stumbling blocks for people who like freestanding 
C++ as a GPU programming language.

Hopefully the clang people consider the extension acceptable.

https://github.com/llvm/llvm-project/pull/72280
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-12-07 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield approved this pull request.

This is functionally correct and useful as is - if gcc decide to do something 
divergent we can change it later, it's basically an internal interface anyway. 
Let's have it now and change the names if we come up with better ideas later.

https://github.com/llvm/llvm-project/pull/72280
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Offloading][NFC] Refactor handling of offloading entries (PR #72544)

2023-11-17 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield approved this pull request.

Test change is suspect for a patch claiming NFC but it looks like the change is 
harmless. Thanks for separating refactor from functional change

https://github.com/llvm/llvm-project/pull/72544
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [lld] [mlir] [clang] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #73000)

2023-11-21 Thread Jon Chesterfield via cfe-commits


@@ -75,8 +75,8 @@ bb.2:
   store volatile i32 0, ptr addrspace(1) undef
   ret void
 }
-; DEFAULTSIZE: .amdhsa_private_segment_fixed_size 4112
-; DEFAULTSIZE: ; ScratchSize: 4112
+; DEFAULTSIZE: .amdhsa_private_segment_fixed_size 16

JonChesterfield wrote:

This seems a bit suspect. It used to be about 4k and is now 16. Are we out by a 
factor of 1024 somewhere?

https://github.com/llvm/llvm-project/pull/73000
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [lld] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #73000)

2023-11-21 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

This is a wild amount of code churn from a trivial change. 10k lines of almost 
all noise. Means the chances of us noticing breakage in a code review tool is 
pretty low.

How about as a first patch we pass `-code-object=v4` or whatever syntax to 
essentially all the tests, then rebase this, so that we can get something 
approximating "this is the functional change, with the codegen change visible 
in these tests"?

In general it seems likely that a lot of tests are checking things they don't 
actually care about, probably because they're frequently generated by the 
python thing. Maybe some of the noise can be removed by tweaking the test 
generator script to emit checks that are insensitive to ABI version?

https://github.com/llvm/llvm-project/pull/73000
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-12-01 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

The capability is more important than the naming. `__llvm_atomic_scoped_load` 
would be fine, with string literals or enum or macro controlling the scope. I 
also don't mind if it's a scoped argument or if we end up with 
`__llvm_atomic_seqcst_device_load`, embedding all of it in the symbol name. 
Clang can't currently instantiate IR atomics with scopes and it would be useful 
to do so.

If GCC picks a different set of names - maybe they go with defines for scope 
and we go with strings, and the names differ - we get to pick between renaming 
ours, adding aliases, ignoring the divergence.

https://github.com/llvm/llvm-project/pull/72280
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 3e649f8 - [openmp][nfc] Simplify macros guarding math complex headers

2021-07-18 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-07-18T23:30:35+01:00
New Revision: 3e649f8ef1875f943537b5fcecdb132c9442cb7d

URL: 
https://github.com/llvm/llvm-project/commit/3e649f8ef1875f943537b5fcecdb132c9442cb7d
DIFF: 
https://github.com/llvm/llvm-project/commit/3e649f8ef1875f943537b5fcecdb132c9442cb7d.diff

LOG: [openmp][nfc] Simplify macros guarding math complex headers

The `__CUDA__` macro is already defined for openmp/nvptx and is not used by
`__clang_cuda_complex_builtins.h`, so dropping that macro slightly simplifies
nvptx and avoids defining it on amdgcn (where it is likely to be harmful).

Also dropped a cplusplus test from a C++ header as compilation will have
failed on cmath earlier if it was included from C.

Reviewed By: jdoerfert, fodinabor

Differential Revision: https://reviews.llvm.org/D105221

Added: 


Modified: 
clang/lib/Headers/openmp_wrappers/complex
clang/lib/Headers/openmp_wrappers/complex.h

Removed: 




diff  --git a/clang/lib/Headers/openmp_wrappers/complex 
b/clang/lib/Headers/openmp_wrappers/complex
index 142e526b81b35..dfd6193c97cbd 100644
--- a/clang/lib/Headers/openmp_wrappers/complex
+++ b/clang/lib/Headers/openmp_wrappers/complex
@@ -17,7 +17,6 @@
 // We require std::math functions in the complex builtins below.
 #include 
 
-#define __CUDA__
 #define __OPENMP_NVPTX__
 #include <__clang_cuda_complex_builtins.h>
 #undef __OPENMP_NVPTX__
@@ -26,9 +25,6 @@
 // Grab the host header too.
 #include_next 
 
-
-#ifdef __cplusplus
-
 // If we are compiling against libc++, the macro _LIBCPP_STD_VER should be set
 // after including  above. Since the complex header we use is a
 // simplified version of the libc++, we don't need it in this case. If we
@@ -48,5 +44,3 @@
 #pragma omp end declare variant
 
 #endif
-
-#endif

diff  --git a/clang/lib/Headers/openmp_wrappers/complex.h 
b/clang/lib/Headers/openmp_wrappers/complex.h
index 00d278548f826..15dc415b8126d 100644
--- a/clang/lib/Headers/openmp_wrappers/complex.h
+++ b/clang/lib/Headers/openmp_wrappers/complex.h
@@ -17,7 +17,6 @@
 // We require math functions in the complex builtins below.
 #include 
 
-#define __CUDA__
 #define __OPENMP_NVPTX__
 #include <__clang_cuda_complex_builtins.h>
 #undef __OPENMP_NVPTX__



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 968899a - [OpenMP][AMDGCN] Initial math headers support

2021-07-21 Thread Jon Chesterfield via cfe-commits

Author: Pushpinder Singh
Date: 2021-07-21T16:15:39+01:00
New Revision: 968899ad9cf17579f9867dafb35c4d97bad0863f

URL: 
https://github.com/llvm/llvm-project/commit/968899ad9cf17579f9867dafb35c4d97bad0863f
DIFF: 
https://github.com/llvm/llvm-project/commit/968899ad9cf17579f9867dafb35c4d97bad0863f.diff

LOG: [OpenMP][AMDGCN] Initial math headers support

With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.

Reviewed By: JonChesterfield, jdoerfert, scchan

Differential Revision: https://reviews.llvm.org/D104904

Added: 
clang/test/Headers/Inputs/include/algorithm
clang/test/Headers/Inputs/include/utility
clang/test/Headers/amdgcn_openmp_device_math.c

Modified: 
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Headers/__clang_hip_cmath.h
clang/lib/Headers/__clang_hip_math.h
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
clang/lib/Headers/openmp_wrappers/cmath
clang/lib/Headers/openmp_wrappers/math.h
clang/test/Headers/Inputs/include/cstdlib
clang/test/Headers/openmp_device_math_isnan.cpp

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 837ead86d6202..cf8e209ba1af1 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -1255,7 +1255,8 @@ void Clang::AddPreprocessingOptions(Compilation &C, const 
JobAction &JA,
   // If we are offloading to a target via OpenMP we need to include the
   // openmp_wrappers folder which contains alternative system headers.
   if (JA.isDeviceOffloading(Action::OFK_OpenMP) &&
-  getToolChain().getTriple().isNVPTX()){
+  (getToolChain().getTriple().isNVPTX() ||
+   getToolChain().getTriple().isAMDGCN())) {
 if (!Args.hasArg(options::OPT_nobuiltininc)) {
   // Add openmp_wrappers/* to our system include path.  This lets us wrap
   // standard library headers.

diff  --git a/clang/lib/Headers/__clang_hip_cmath.h 
b/clang/lib/Headers/__clang_hip_cmath.h
index 7342705434e6b..6f7cbde38dd20 100644
--- a/clang/lib/Headers/__clang_hip_cmath.h
+++ b/clang/lib/Headers/__clang_hip_cmath.h
@@ -10,7 +10,7 @@
 #ifndef __CLANG_HIP_CMATH_H__
 #define __CLANG_HIP_CMATH_H__
 
-#if !defined(__HIP__)
+#if !defined(__HIP__) && !defined(__OPENMP_AMDGCN__)
 #error "This file is for HIP and OpenMP AMDGCN device compilation only."
 #endif
 
@@ -25,31 +25,38 @@
 #endif // !defined(__HIPCC_RTC__)
 
 #pragma push_macro("__DEVICE__")
+#pragma push_macro("__CONSTEXPR__")
+#ifdef __OPENMP_AMDGCN__
+#define __DEVICE__ static __attribute__((always_inline, nothrow))
+#define __CONSTEXPR__ constexpr
+#else
 #define __DEVICE__ static __device__ inline __attribute__((always_inline))
+#define __CONSTEXPR__
+#endif // __OPENMP_AMDGCN__
 
 // Start with functions that cannot be defined by DEF macros below.
 #if defined(__cplusplus)
-__DEVICE__ double abs(double __x) { return ::fabs(__x); }
-__DEVICE__ float abs(float __x) { return ::fabsf(__x); }
-__DEVICE__ long long abs(long long __n) { return ::llabs(__n); }
-__DEVICE__ long abs(long __n) { return ::labs(__n); }
-__DEVICE__ float fma(float __x, float __y, float __z) {
+__DEVICE__ __CONSTEXPR__ double abs(double __x) { return ::fabs(__x); }
+__DEVICE__ __CONSTEXPR__ float abs(float __x) { return ::fabsf(__x); }
+__DEVICE__ __CONSTEXPR__ long long abs(long long __n) { return ::llabs(__n); }
+__DEVICE__ __CONSTEXPR__ long abs(long __n) { return ::labs(__n); }
+__DEVICE__ __CONSTEXPR__ float fma(float __x, float __y, float __z) {
   return ::fmaf(__x, __y, __z);
 }
 #if !defined(__HIPCC_RTC__)
 // The value returned by fpclassify is platform dependent, therefore it is not
 // supported by hipRTC.
-__DEVICE__ int fpclassify(float __x) {
+__DEVICE__ __CONSTEXPR__ int fpclassify(float __x) {
   return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
   FP_ZERO, __x);
 }
-__DEVICE__ int fpclassify(double __x) {
+__DEVICE__ __CONSTEXPR__ int fpclassify(double __x) {
   return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
   FP_ZERO, __x);
 }
 #endif // !defined(__HIPCC_RTC__)
 
-__DEVICE__ float frexp(float __arg, int *__exp) {
+__DEVICE__ __CONSTEXPR__ float frexp(float __arg, int *__exp) {
   return ::frexpf(__arg, __exp);
 }
 
@@ -71,90 +78,97 @@ __DEVICE__ float frexp(float __arg, int *__exp) {
 //of the variants inside the inner region and avoid the clash.
 #pragma omp begin declare variant match(implementation = {vendor(llvm)})
 
-__DEVICE__ int isinf(float __x) { return ::__isinff(__x); }
-__DEVICE__ int isinf(double __x) { return ::__isinf(__x); }
-__DEVICE__ int isfinite(float __x) { return ::__finitef(__x); }
-__DEVICE__ int isfinite(double __x) { return ::__finite(__x); }
-__DEVICE__ int is

[clang] d71062f - Revert "[OpenMP][AMDGCN] Initial math headers support"

2021-07-21 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-07-21T17:35:40+01:00
New Revision: d71062fbdab26fcc1c7e25ccdae410e1c61ed7f9

URL: 
https://github.com/llvm/llvm-project/commit/d71062fbdab26fcc1c7e25ccdae410e1c61ed7f9
DIFF: 
https://github.com/llvm/llvm-project/commit/d71062fbdab26fcc1c7e25ccdae410e1c61ed7f9.diff

LOG: Revert "[OpenMP][AMDGCN] Initial math headers support"

This reverts commit 968899ad9cf17579f9867dafb35c4d97bad0863f.

Added: 


Modified: 
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Headers/__clang_hip_cmath.h
clang/lib/Headers/__clang_hip_math.h
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
clang/lib/Headers/openmp_wrappers/cmath
clang/lib/Headers/openmp_wrappers/math.h
clang/test/Headers/Inputs/include/cstdlib
clang/test/Headers/openmp_device_math_isnan.cpp

Removed: 
clang/test/Headers/Inputs/include/algorithm
clang/test/Headers/Inputs/include/utility
clang/test/Headers/amdgcn_openmp_device_math.c



diff  --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index cf8e209ba1af1..837ead86d6202 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -1255,8 +1255,7 @@ void Clang::AddPreprocessingOptions(Compilation &C, const 
JobAction &JA,
   // If we are offloading to a target via OpenMP we need to include the
   // openmp_wrappers folder which contains alternative system headers.
   if (JA.isDeviceOffloading(Action::OFK_OpenMP) &&
-  (getToolChain().getTriple().isNVPTX() ||
-   getToolChain().getTriple().isAMDGCN())) {
+  getToolChain().getTriple().isNVPTX()){
 if (!Args.hasArg(options::OPT_nobuiltininc)) {
   // Add openmp_wrappers/* to our system include path.  This lets us wrap
   // standard library headers.

diff  --git a/clang/lib/Headers/__clang_hip_cmath.h 
b/clang/lib/Headers/__clang_hip_cmath.h
index 6f7cbde38dd20..7342705434e6b 100644
--- a/clang/lib/Headers/__clang_hip_cmath.h
+++ b/clang/lib/Headers/__clang_hip_cmath.h
@@ -10,7 +10,7 @@
 #ifndef __CLANG_HIP_CMATH_H__
 #define __CLANG_HIP_CMATH_H__
 
-#if !defined(__HIP__) && !defined(__OPENMP_AMDGCN__)
+#if !defined(__HIP__)
 #error "This file is for HIP and OpenMP AMDGCN device compilation only."
 #endif
 
@@ -25,38 +25,31 @@
 #endif // !defined(__HIPCC_RTC__)
 
 #pragma push_macro("__DEVICE__")
-#pragma push_macro("__CONSTEXPR__")
-#ifdef __OPENMP_AMDGCN__
-#define __DEVICE__ static __attribute__((always_inline, nothrow))
-#define __CONSTEXPR__ constexpr
-#else
 #define __DEVICE__ static __device__ inline __attribute__((always_inline))
-#define __CONSTEXPR__
-#endif // __OPENMP_AMDGCN__
 
 // Start with functions that cannot be defined by DEF macros below.
 #if defined(__cplusplus)
-__DEVICE__ __CONSTEXPR__ double abs(double __x) { return ::fabs(__x); }
-__DEVICE__ __CONSTEXPR__ float abs(float __x) { return ::fabsf(__x); }
-__DEVICE__ __CONSTEXPR__ long long abs(long long __n) { return ::llabs(__n); }
-__DEVICE__ __CONSTEXPR__ long abs(long __n) { return ::labs(__n); }
-__DEVICE__ __CONSTEXPR__ float fma(float __x, float __y, float __z) {
+__DEVICE__ double abs(double __x) { return ::fabs(__x); }
+__DEVICE__ float abs(float __x) { return ::fabsf(__x); }
+__DEVICE__ long long abs(long long __n) { return ::llabs(__n); }
+__DEVICE__ long abs(long __n) { return ::labs(__n); }
+__DEVICE__ float fma(float __x, float __y, float __z) {
   return ::fmaf(__x, __y, __z);
 }
 #if !defined(__HIPCC_RTC__)
 // The value returned by fpclassify is platform dependent, therefore it is not
 // supported by hipRTC.
-__DEVICE__ __CONSTEXPR__ int fpclassify(float __x) {
+__DEVICE__ int fpclassify(float __x) {
   return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
   FP_ZERO, __x);
 }
-__DEVICE__ __CONSTEXPR__ int fpclassify(double __x) {
+__DEVICE__ int fpclassify(double __x) {
   return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
   FP_ZERO, __x);
 }
 #endif // !defined(__HIPCC_RTC__)
 
-__DEVICE__ __CONSTEXPR__ float frexp(float __arg, int *__exp) {
+__DEVICE__ float frexp(float __arg, int *__exp) {
   return ::frexpf(__arg, __exp);
 }
 
@@ -78,97 +71,90 @@ __DEVICE__ __CONSTEXPR__ float frexp(float __arg, int 
*__exp) {
 //of the variants inside the inner region and avoid the clash.
 #pragma omp begin declare variant match(implementation = {vendor(llvm)})
 
-__DEVICE__ __CONSTEXPR__ int isinf(float __x) { return ::__isinff(__x); }
-__DEVICE__ __CONSTEXPR__ int isinf(double __x) { return ::__isinf(__x); }
-__DEVICE__ __CONSTEXPR__ int isfinite(float __x) { return ::__finitef(__x); }
-__DEVICE__ __CONSTEXPR__ int isfinite(double __x) { return ::__finite(__x); }
-__DEVICE__ __CONSTEXPR__ int isnan(float __x) { return ::__isnanf(__x); }
-__DEVICE__ __CONSTEXPR__ int isnan(double __x) { return ::__isn

[clang] 83c431f - [amdgpu] Add amdgpu_kernel calling conv attribute to clang

2022-05-20 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2022-05-20T08:50:37+01:00
New Revision: 83c431fb9e72abbd2eddf26388245eb4963370e2

URL: 
https://github.com/llvm/llvm-project/commit/83c431fb9e72abbd2eddf26388245eb4963370e2
DIFF: 
https://github.com/llvm/llvm-project/commit/83c431fb9e72abbd2eddf26388245eb4963370e2.diff

LOG: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

Allows emitting define amdgpu_kernel void @func() IR from C or C++.

This replaces the current workflow which is to write a stub in opencl that
calls an external C function implemented in C++ combined through llvm-link.

Calling the resulting function still requires a manual implementation of the
ABI from the host side. The primary application is for more rapid debugging
of the amdgpu backend by permuting a C or C++ test file instead of manually
updating an IR file.

Implementation closely follows D54425. Non-amd reviewers from there.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D125970

Added: 
clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp

Modified: 
clang/include/clang/Basic/Attr.td
clang/include/clang/Basic/Specifiers.h
clang/lib/AST/ItaniumMangle.cpp
clang/lib/AST/Type.cpp
clang/lib/AST/TypePrinter.cpp
clang/lib/Basic/Targets/AMDGPU.h
clang/lib/CodeGen/CGCall.cpp
clang/lib/CodeGen/CGDebugInfo.cpp
clang/lib/Sema/SemaDeclAttr.cpp
clang/lib/Sema/SemaType.cpp
clang/test/Sema/callingconv.c
clang/tools/libclang/CXType.cpp

Removed: 




diff  --git a/clang/include/clang/Basic/Attr.td 
b/clang/include/clang/Basic/Attr.td
index 0c95adfa237d7..fed29b03a8b14 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1857,6 +1857,11 @@ def AMDGPUNumVGPR : InheritableAttr {
   let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">;
 }
 
+def AMDGPUKernelCall : DeclOrTypeAttr {
+  let Spellings = [Clang<"amdgpu_kernel">];
+  let Documentation = [Undocumented];
+}
+
 def BPFPreserveAccessIndex : InheritableAttr,
  TargetSpecificAttr  {
   let Spellings = [Clang<"preserve_access_index">];

diff  --git a/clang/include/clang/Basic/Specifiers.h 
b/clang/include/clang/Basic/Specifiers.h
index 7a727e7088deb..7657ae36d21bb 100644
--- a/clang/include/clang/Basic/Specifiers.h
+++ b/clang/include/clang/Basic/Specifiers.h
@@ -281,6 +281,7 @@ namespace clang {
 CC_PreserveAll,  // __attribute__((preserve_all))
 CC_AArch64VectorCall, // __attribute__((aarch64_vector_pcs))
 CC_AArch64SVEPCS, // __attribute__((aarch64_sve_pcs))
+CC_AMDGPUKernelCall, // __attribute__((amdgpu_kernel))
   };
 
   /// Checks whether the given calling convention supports variadic

diff  --git a/clang/lib/AST/ItaniumMangle.cpp b/clang/lib/AST/ItaniumMangle.cpp
index 1be70487c1b4e..b380e02fc8f7d 100644
--- a/clang/lib/AST/ItaniumMangle.cpp
+++ b/clang/lib/AST/ItaniumMangle.cpp
@@ -3150,6 +3150,7 @@ StringRef 
CXXNameMangler::getCallingConvQualifierName(CallingConv CC) {
   case CC_AAPCS_VFP:
   case CC_AArch64VectorCall:
   case CC_AArch64SVEPCS:
+  case CC_AMDGPUKernelCall:
   case CC_IntelOclBicc:
   case CC_SpirFunction:
   case CC_OpenCLKernel:

diff  --git a/clang/lib/AST/Type.cpp b/clang/lib/AST/Type.cpp
index 200a129437ed5..ece4165c51f53 100644
--- a/clang/lib/AST/Type.cpp
+++ b/clang/lib/AST/Type.cpp
@@ -3186,6 +3186,7 @@ StringRef FunctionType::getNameForCallConv(CallingConv 
CC) {
   case CC_AAPCS_VFP: return "aapcs-vfp";
   case CC_AArch64VectorCall: return "aarch64_vector_pcs";
   case CC_AArch64SVEPCS: return "aarch64_sve_pcs";
+  case CC_AMDGPUKernelCall: return "amdgpu_kernel";
   case CC_IntelOclBicc: return "intel_ocl_bicc";
   case CC_SpirFunction: return "spir_function";
   case CC_OpenCLKernel: return "opencl_kernel";
@@ -3622,6 +3623,7 @@ bool AttributedType::isCallingConv() const {
   case attr::VectorCall:
   case attr::AArch64VectorPcs:
   case attr::AArch64SVEPcs:
+  case attr::AMDGPUKernelCall:
   case attr::Pascal:
   case attr::MSABI:
   case attr::SysVABI:

diff  --git a/clang/lib/AST/TypePrinter.cpp b/clang/lib/AST/TypePrinter.cpp
index 7bca45b5f5601..b5286de5b1cab 100644
--- a/clang/lib/AST/TypePrinter.cpp
+++ b/clang/lib/AST/TypePrinter.cpp
@@ -964,6 +964,9 @@ void TypePrinter::printFunctionAfter(const 
FunctionType::ExtInfo &Info,
 case CC_AArch64SVEPCS:
   OS << "__attribute__((aarch64_sve_pcs))";
   break;
+case CC_AMDGPUKernelCall:
+  OS << "__attribute__((amdgpu_kernel))";
+  break;
 case CC_IntelOclBicc:
   OS << " __attribute__((intel_ocl_bicc))";
   break;
@@ -1754,6 +1757,7 @@ void TypePrinter::printAttributedAfter(const 
AttributedType *T,
   }
   case attr::AArch64VectorPcs: OS << "aarch64_vector_pcs"; break;
   case attr::AArch64SVEPcs: OS << "aarch64_sve_pcs"; break;
+  case attr::AMDGPUKernelCall: OS << "amdgpu_kernel"; break;
   case attr::IntelOclBicc:

[libunwind] [libunwind] Compile the asm as well as the C++ source (PR #86351)

2024-03-22 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield created 
https://github.com/llvm/llvm-project/pull/86351

When a CMakeLists.txt is missing a 'project' statement you get the default 
supported languages of C and CXX. 
https://cmake.org/cmake/help/latest/command/project.html. The help says ASM 
should be listed last.

CMake doesn't raise an error about the .S files it has been told about when 
project is missing. It silently ignores them. In this case, the symptom is an 
undefined symbol *jumpto in the library.

Working theory for why this isn't more obviously broken everywhere is the 
'runtimes' CMakeLists.txt does contain a 'project' statement which lists ASM 
and/or by default linking shared libraries with undefined symbols succeeds.

The string immediately after project appears to be arbitrary, chosen 'Unwind' 
to match the capitalization of 'Runtimes'. 

For completeness, this also removes the following warning when building 
libunwind by itself:

>CMake Warning (dev) in CMakeLists.txt:
>  No project() command is present.  The top-level CMakeLists.txt file must
>  contain a literal, direct call to the project() command.  Add a line of
>  code such as
>
>project(ProjectName)
>
>  near the top of the file, but after cmake_minimum_required().
>
>  CMake is pretending there is a "project(Project)" command on the first
>  line.
> This warning is for project developers.  Use -Wno-dev to suppress it.

This gives no hint that the consequence of ignoring this warning is cmake will 
ignore your assembly.

>From 3cacdbc0585d00c8820dc7baa0cba378beeff339 Mon Sep 17 00:00:00 2001
From: Jon Chesterfield 
Date: Fri, 22 Mar 2024 22:02:06 +
Subject: [PATCH] [libunwind] Compile the asm as well as the C source

---
 libunwind/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libunwind/CMakeLists.txt b/libunwind/CMakeLists.txt
index 806d5a783ec39c..01d3b72b73e842 100644
--- a/libunwind/CMakeLists.txt
+++ b/libunwind/CMakeLists.txt
@@ -3,6 +3,7 @@
 
#===
 
 cmake_minimum_required(VERSION 3.20.0)
+project(Unwind LANGUAGES C CXX ASM)
 
 set(LLVM_COMMON_CMAKE_UTILS "${CMAKE_CURRENT_SOURCE_DIR}/../cmake")
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[libunwind] [libunwind] Compile the asm as well as the C++ source (PR #86351)

2024-03-22 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

I'm sorry to hear that. I've only used the ENABLE_RUNTIMES in the context of 
compiling clang first, and then compiling the libraries under runtime with that 
clang. The recursive invocation drops (most) arguments passed to cmake which 
has been obstructive in the past.

With standalone build (presumably ENABLE_PROJECTS) removed, how does one build 
the libraries using an existing compiler?

https://github.com/llvm/llvm-project/pull/86351
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield edited 
https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,716 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface() {}
+
+public:
+  virtual ~Interface() {}
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst *va_list, Value * /*buffer*/) {

JonChesterfield wrote:

Expressing the class invariants through the vtable seems harder than it should 
be. This function is only called if valistOnStack returns true, changing the 
base to a builtin_unreachable.

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,716 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface() {}
+
+public:
+  virtual ~Interface() {}
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst *va_list, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+assert(va_list == nullptr);
+  }
+
+  virtual uint32_t minimum_slot_align() = 0;
+  virtual uint32_t maximum_slot_align() = 0;
+
+  // Could make these virtual, fair chance that's free since all
+  // classes choose not to override them at present
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,716 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface() {}
+
+public:
+  virtual ~Interface() {}
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst *va_list, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+assert(va_list == nullptr);
+  }
+
+  virtual uint32_t minimum_slot_align() = 0;
+  virtual uint32_t maximum_slot_align() = 0;
+
+  // Could make these virtual, fair chance that's free since all
+  // classes choose not to override them at present
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,716 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface() {}
+
+public:
+  virtual ~Interface() {}
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst *va_list, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+assert(va_list == nullptr);
+  }
+
+  virtual uint32_t minimum_slot_align() = 0;
+  virtual uint32_t maximum_slot_align() = 0;
+
+  // Could make these virtual, fair chance that's free since all
+  // classes choose not to override them at present
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield edited 
https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();

JonChesterfield wrote:

I've gone with more comments, but maybe to make this clear enough I need to 
separate the field alignment from whether va_list is a void* or something that 
needs more 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);

JonChesterfield wrot

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,117 @@
+// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -Wno-varargs -O1 
-disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt 
-passes="expand-variadics,default" -S | FileCheck %s 
--check-prefixes=CHECK,X86Linux

JonChesterfield wrote:

Nope. There's something weird/broken with opt here. Failure reproduces with 
other passes. Issue https://github.com/llvm/llvm-project/issues/81128

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

Patch run through `clang-tidy --checks=readability-identifier-naming` with the 
config file in tree and recommendations applied. Some of the choices seem poor 
but it's presumably acceptable.

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,589 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: -p --function-signature
+; RUN: opt -S --passes=expand-variadics < %s | FileCheck %s
+target datalayout = 
"e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; The types show the call frames
+; CHECK: %single_i32.vararg = type <{ i32 }>
+; CHECK: %single_double.vararg = type <{ double }>
+; CHECK: %single_v4f32.vararg = type <{ <4 x float> }>
+; CHECK: %single_v8f32.vararg = type <{ <8 x float> }>
+; CHECK: %single_v16f32.vararg = type <{ <16 x float> }>
+; CHECK: %single_v32f32.vararg = type <{ <32 x float> }>
+; CHECK: %i32_double.vararg = type <{ i32, [4 x i8], double }>
+; CHECK: %double_i32.vararg = type <{ double, i32 }>
+; CHECK: %i32_v4f32.vararg = type <{ i32, [12 x i8], <4 x float> }>
+; CHECK: %v4f32_i32.vararg = type <{ <4 x float>, i32 }>
+; CHECK: %i32_v8f32.vararg = type <{ i32, [28 x i8], <8 x float> }>
+; CHECK: %v8f32_i32.vararg = type <{ <8 x float>, i32 }>
+; CHECK: %i32_v16f32.vararg = type <{ i32, [60 x i8], <16 x float> }>
+; CHECK: %v16f32_i32.vararg = type <{ <16 x float>, i32 }>
+; CHECK: %i32_v32f32.vararg = type <{ i32, [124 x i8], <32 x float> }>
+; CHECK: %v32f32_i32.vararg = type <{ <32 x float>, i32 }>
+
+%struct.__va_list_tag = type { i32, i32, ptr, ptr }
+%struct.libcS = type { i8, i16, i32, i64, float, double }
+
+define dso_local void @codegen_for_copy(ptr noundef %x) local_unnamed_addr #0 {
+; CHECK-LABEL: define {{[^@]+}}@codegen_for_copy(ptr noundef %x) 
local_unnamed_addr #0 {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:%cp = alloca [1 x %struct.__va_list_tag], align 16
+; CHECK-NEXT:call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %cp) #6
+; CHECK-NEXT:call void @llvm.va_copy(ptr nonnull %cp, ptr %x)
+; CHECK-NEXT:call void @wrapped(ptr noundef nonnull %cp) #7
+; CHECK-NEXT:call void @llvm.va_end(ptr %cp)
+; CHECK-NEXT:call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %cp) #6
+; CHECK-NEXT:ret void
+;
+entry:
+  %cp = alloca [1 x %struct.__va_list_tag], align 16
+  call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %cp) #5
+  call void @llvm.va_copy(ptr nonnull %cp, ptr %x)
+  call void @wrapped(ptr noundef nonnull %cp) #6
+  call void @llvm.va_end(ptr %cp)
+  call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %cp) #5
+  ret void
+}
+
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #1
+
+declare void @llvm.va_copy(ptr, ptr) #2
+
+declare void @wrapped(ptr noundef) local_unnamed_addr #3
+
+declare void @llvm.va_end(ptr) #2
+
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #1
+
+define dso_local void @vararg(...) local_unnamed_addr #0 {
+; CHECK-LABEL: define {{[^@]+}}@vararg(...) local_unnamed_addr #0 {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:%va = alloca [1 x %struct.__va_list_tag], align 16
+; CHECK-NEXT:call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %va) #6
+; CHECK-NEXT:call void @llvm.va_start(ptr nonnull %va)
+; CHECK-NEXT:call void @wrapped(ptr noundef nonnull %va) #7
+; CHECK-NEXT:call void @llvm.va_end(ptr %va)
+; CHECK-NEXT:call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %va) #6
+; CHECK-NEXT:ret void
+;
+entry:
+  %va = alloca [1 x %struct.__va_list_tag], align 16
+  call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %va) #5
+  call void @llvm.va_start(ptr nonnull %va)
+  call void @wrapped(ptr noundef nonnull %va) #6
+  call void @llvm.va_end(ptr %va)
+  call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %va) #5
+  ret void
+}
+
+declare void @llvm.va_start(ptr) #2
+
+define dso_local void @single_i32(i32 noundef %x) local_unnamed_addr #0 {
+; CHECK-LABEL: define {{[^@]+}}@single_i32(i32 noundef %x) local_unnamed_addr 
#0 {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:%vararg_buffer = alloca %single_i32.vararg, align 8
+; CHECK-NEXT:%0 = getelementptr inbounds %single_i32.vararg, ptr 
%vararg_buffer, i32 0, i32 0
+; CHECK-NEXT:store i32 %x, ptr %0, align 4
+; CHECK-NEXT:%va_list = alloca [1 x { i32, i32, ptr, ptr }], align 8
+; CHECK-NEXT:%gp_offset = getelementptr inbounds [1 x { i32, i32, ptr, ptr 
}], ptr %va_list, i64 0, i32 0, i32 0
+; CHECK-NEXT:store i32 48, ptr %gp_offset, align 4
+; CHECK-NEXT:%fp_offset = getelementptr inbounds [1 x { i32, i32, ptr, ptr 
}], ptr %va_list, i64 0, i32 0, i32 1
+; CHECK-NEXT:store i32 176, ptr %fp_offset, align 4
+; CHECK-NEXT:%overfow_arg_area = getelementptr inbounds [1 x { i32, i32, 
ptr, ptr }], ptr %va_list, i64 0, i32 0, i32 2
+; CHECK-NEXT:store ptr %vararg_buffer, ptr %overfow_arg_area, align 8
+; CHECK-NEXT:%reg_save_area = getelementptr inbounds [1 x { i32, i32, ptr, 
ptr }], ptr %va_list, i64 0, i32 0, i32 3
+; CHECK-NEXT:store ptr null, ptr %reg_save_area, align 8
+; CHECK-NEXT:call void @wrapped(ptr %va_list) #8
+; CHECK-NEXT:ret void
+;
+entry:
+  tail cal

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,589 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: -p --function-signature
+; RUN: opt -S --passes=expand-variadics < %s | FileCheck %s
+target datalayout = 
"e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; The types show the call frames
+; CHECK: %single_i32.vararg = type <{ i32 }>
+; CHECK: %single_double.vararg = type <{ double }>
+; CHECK: %single_v4f32.vararg = type <{ <4 x float> }>
+; CHECK: %single_v8f32.vararg = type <{ <8 x float> }>
+; CHECK: %single_v16f32.vararg = type <{ <16 x float> }>
+; CHECK: %single_v32f32.vararg = type <{ <32 x float> }>
+; CHECK: %i32_double.vararg = type <{ i32, [4 x i8], double }>
+; CHECK: %double_i32.vararg = type <{ double, i32 }>
+; CHECK: %i32_v4f32.vararg = type <{ i32, [12 x i8], <4 x float> }>
+; CHECK: %v4f32_i32.vararg = type <{ <4 x float>, i32 }>
+; CHECK: %i32_v8f32.vararg = type <{ i32, [28 x i8], <8 x float> }>
+; CHECK: %v8f32_i32.vararg = type <{ <8 x float>, i32 }>
+; CHECK: %i32_v16f32.vararg = type <{ i32, [60 x i8], <16 x float> }>
+; CHECK: %v16f32_i32.vararg = type <{ <16 x float>, i32 }>
+; CHECK: %i32_v32f32.vararg = type <{ i32, [124 x i8], <32 x float> }>
+; CHECK: %v32f32_i32.vararg = type <{ <32 x float>, i32 }>
+
+%struct.__va_list_tag = type { i32, i32, ptr, ptr }
+%struct.libcS = type { i8, i16, i32, i64, float, double }
+
+define dso_local void @codegen_for_copy(ptr noundef %x) local_unnamed_addr #0 {
+; CHECK-LABEL: define {{[^@]+}}@codegen_for_copy(ptr noundef %x) 
local_unnamed_addr #0 {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:%cp = alloca [1 x %struct.__va_list_tag], align 16
+; CHECK-NEXT:call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %cp) #6
+; CHECK-NEXT:call void @llvm.va_copy(ptr nonnull %cp, ptr %x)
+; CHECK-NEXT:call void @wrapped(ptr noundef nonnull %cp) #7
+; CHECK-NEXT:call void @llvm.va_end(ptr %cp)
+; CHECK-NEXT:call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %cp) #6
+; CHECK-NEXT:ret void
+;
+entry:
+  %cp = alloca [1 x %struct.__va_list_tag], align 16
+  call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %cp) #5
+  call void @llvm.va_copy(ptr nonnull %cp, ptr %x)
+  call void @wrapped(ptr noundef nonnull %cp) #6
+  call void @llvm.va_end(ptr %cp)
+  call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %cp) #5
+  ret void
+}
+
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #1
+
+declare void @llvm.va_copy(ptr, ptr) #2
+
+declare void @wrapped(ptr noundef) local_unnamed_addr #3
+
+declare void @llvm.va_end(ptr) #2
+
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #1
+
+define dso_local void @vararg(...) local_unnamed_addr #0 {
+; CHECK-LABEL: define {{[^@]+}}@vararg(...) local_unnamed_addr #0 {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:%va = alloca [1 x %struct.__va_list_tag], align 16
+; CHECK-NEXT:call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %va) #6
+; CHECK-NEXT:call void @llvm.va_start(ptr nonnull %va)
+; CHECK-NEXT:call void @wrapped(ptr noundef nonnull %va) #7
+; CHECK-NEXT:call void @llvm.va_end(ptr %va)
+; CHECK-NEXT:call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %va) #6
+; CHECK-NEXT:ret void
+;
+entry:
+  %va = alloca [1 x %struct.__va_list_tag], align 16
+  call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %va) #5
+  call void @llvm.va_start(ptr nonnull %va)
+  call void @wrapped(ptr noundef nonnull %va) #6
+  call void @llvm.va_end(ptr %va)
+  call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %va) #5
+  ret void
+}
+
+declare void @llvm.va_start(ptr) #2
+
+define dso_local void @single_i32(i32 noundef %x) local_unnamed_addr #0 {
+; CHECK-LABEL: define {{[^@]+}}@single_i32(i32 noundef %x) local_unnamed_addr 
#0 {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:%vararg_buffer = alloca %single_i32.vararg, align 8
+; CHECK-NEXT:%0 = getelementptr inbounds %single_i32.vararg, ptr 
%vararg_buffer, i32 0, i32 0
+; CHECK-NEXT:store i32 %x, ptr %0, align 4
+; CHECK-NEXT:%va_list = alloca [1 x { i32, i32, ptr, ptr }], align 8
+; CHECK-NEXT:%gp_offset = getelementptr inbounds [1 x { i32, i32, ptr, ptr 
}], ptr %va_list, i64 0, i32 0, i32 0
+; CHECK-NEXT:store i32 48, ptr %gp_offset, align 4
+; CHECK-NEXT:%fp_offset = getelementptr inbounds [1 x { i32, i32, ptr, ptr 
}], ptr %va_list, i64 0, i32 0, i32 1
+; CHECK-NEXT:store i32 176, ptr %fp_offset, align 4
+; CHECK-NEXT:%overfow_arg_area = getelementptr inbounds [1 x { i32, i32, 
ptr, ptr }], ptr %va_list, i64 0, i32 0, i32 2
+; CHECK-NEXT:store ptr %vararg_buffer, ptr %overfow_arg_area, align 8
+; CHECK-NEXT:%reg_save_area = getelementptr inbounds [1 x { i32, i32, ptr, 
ptr }], ptr %va_list, i64 0, i32 0, i32 3
+; CHECK-NEXT:store ptr null, ptr %reg_save_area, align 8
+; CHECK-NEXT:call void @wrapped(ptr %va_list) #8
+; CHECK-NEXT:ret void
+;
+entry:
+  tail cal

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);

JonChesterfield wrot

[clang] [llvm] [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (PR #81331)

2024-02-12 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

New intrinsic sounds right - a constant frequency counter is a different thing 
to a variable frequency counter.

"Steady" implies unchanging, so I'd agree with `readfixedfreqtimer` or similar.

We can't have a ratio between the two counters since one changes frequency and 
one doesn't.

Does x64 have something that maps usefully onto a fixed frequency counter 
intrinsic?

https://github.com/llvm/llvm-project/pull/81331
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #79660)

2024-01-29 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

The "generic IR" thing is more emergent behaviour than a documented / 
intentional design.

This patch is fine - we shouldn't set macros to nonsense values - but if this 
is a step towards building libc like the rocm-device-libs there may be push 
back on that one.

https://github.com/llvm/llvm-project/pull/79660
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #79660)

2024-01-29 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield approved this pull request.


https://github.com/llvm/llvm-project/pull/79660
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP][AMDGPU] Do not include 'ockl' implementations in OpenMP (PR #70462)

2023-10-27 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield approved this pull request.


https://github.com/llvm/llvm-project/pull/70462
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] bcaa806 - [Clang] Fix BZ47169, loader_uninitialized on incomplete types

2020-08-19 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2020-08-19T18:11:50+01:00
New Revision: bcaa806a4747595116b538e8b75b12966e6607f6

URL: 
https://github.com/llvm/llvm-project/commit/bcaa806a4747595116b538e8b75b12966e6607f6
DIFF: 
https://github.com/llvm/llvm-project/commit/bcaa806a4747595116b538e8b75b12966e6607f6.diff

LOG: [Clang] Fix BZ47169, loader_uninitialized on incomplete types

[Clang] Fix BZ47169, loader_uninitialized on incomplete types

Reported by @erichkeane. Fix proposed by @erichkeane works, tests included.
Bug introduced in D74361. Crash was on querying a CXXRecordDecl for
hasTrivialDefaultConstructor on an incomplete type. Fixed by calling
RequireCompleteType in the right place.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D85990

Added: 


Modified: 
clang/lib/Sema/SemaDecl.cpp
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
clang/test/Sema/attr-loader-uninitialized.c
clang/test/Sema/attr-loader-uninitialized.cpp

Removed: 




diff  --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index ab1496337210..566a2f9da681 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -12476,6 +12476,17 @@ void Sema::ActOnUninitializedDecl(Decl *RealDecl) {
 }
 
 if (!Var->isInvalidDecl() && RealDecl->hasAttr()) 
{
+  if (Var->getStorageClass() == SC_Extern) {
+Diag(Var->getLocation(), diag::err_loader_uninitialized_extern_decl)
+<< Var;
+Var->setInvalidDecl();
+return;
+  }
+  if (RequireCompleteType(Var->getLocation(), Var->getType(),
+  diag::err_typecheck_decl_incomplete_type)) {
+Var->setInvalidDecl();
+return;
+  }
   if (CXXRecordDecl *RD = Var->getType()->getAsCXXRecordDecl()) {
 if (!RD->hasTrivialDefaultConstructor()) {
   Diag(Var->getLocation(), 
diag::err_loader_uninitialized_trivial_ctor);
@@ -12483,12 +12494,6 @@ void Sema::ActOnUninitializedDecl(Decl *RealDecl) {
   return;
 }
   }
-  if (Var->getStorageClass() == SC_Extern) {
-Diag(Var->getLocation(), diag::err_loader_uninitialized_extern_decl)
-<< Var;
-Var->setInvalidDecl();
-return;
-  }
 }
 
 VarDecl::DefinitionKind DefKind = Var->isThisDeclarationADefinition();

diff  --git a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp 
b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
index e82ae47e9f16..6501a25bf5bc 100644
--- a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
+++ b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
@@ -28,3 +28,15 @@ double arr[32] __attribute__((loader_uninitialized));
 // Defining as arr2[] [[clang..]] raises the error: attribute cannot be 
applied to types
 // CHECK: @arr2 = global [4 x double] undef
 double arr2 [[clang::loader_uninitialized]] [4];
+
+template struct templ{T * t;};
+
+// CHECK: @templ_int = global %struct.templ undef, align 8
+templ templ_int [[clang::loader_uninitialized]];
+
+// CHECK: @templ_trivial = global %struct.templ.0 undef, align 8
+templ templ_trivial [[clang::loader_uninitialized]];
+
+// CHECK: @templ_incomplete = global %struct.templ.1 undef, align 8
+struct incomplete;
+templ templ_incomplete [[clang::loader_uninitialized]];

diff  --git a/clang/test/Sema/attr-loader-uninitialized.c 
b/clang/test/Sema/attr-loader-uninitialized.c
index f2e78d981580..a1edd858e27f 100644
--- a/clang/test/Sema/attr-loader-uninitialized.c
+++ b/clang/test/Sema/attr-loader-uninitialized.c
@@ -10,6 +10,10 @@ const int can_still_be_const 
__attribute__((loader_uninitialized));
 extern int external_rejected __attribute__((loader_uninitialized));
 // expected-error@-1 {{variable 'external_rejected' cannot be declared both 
'extern' and with the 'loader_uninitialized' attribute}}
 
+struct S;
+extern struct S incomplete_external_rejected 
__attribute__((loader_uninitialized));
+// expected-error@-1 {{variable 'incomplete_external_rejected' cannot be 
declared both 'extern' and with the 'loader_uninitialized' attribute}}
+
 int noargs __attribute__((loader_uninitialized(0)));
 // expected-error@-1 {{'loader_uninitialized' attribute takes no arguments}}
 
@@ -35,3 +39,8 @@ __private_extern__ int initialized_private_extern_rejected 
__attribute__((loader
 
 extern __attribute__((visibility("hidden"))) int extern_hidden 
__attribute__((loader_uninitialized));
 // expected-error@-1 {{variable 'extern_hidden' cannot be declared both 
'extern' and with the 'loader_uninitialized' attribute}}
+
+struct Incomplete;
+struct Incomplete incomplete __attribute__((loader_uninitialized));
+// expected-error@-1 {{variable has incomplete type 'struct Incomplete'}}
+// expected-note@-3 {{forward declaration of 'struct Incomplete'}}

diff  --git a/clang/test/Sema/attr-loader-uninitialized.cpp 
b/clang/test/Sema/attr-loader-uninitialized.cpp
index 3a330b3d5965..5

[clang] 4b2e7d0 - [amdgpu] Default to code object v3

2020-12-14 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2020-12-15T01:11:09Z
New Revision: 4b2e7d0215021d0d1df1a6319884b21d33936265

URL: 
https://github.com/llvm/llvm-project/commit/4b2e7d0215021d0d1df1a6319884b21d33936265
DIFF: 
https://github.com/llvm/llvm-project/commit/4b2e7d0215021d0d1df1a6319884b21d33936265.diff

LOG: [amdgpu] Default to code object v3

[amdgpu] Default to code object v3
v4 is not yet readily available, and doesn't appear
to be implemented in the back end

Reviewed By: t-tye

Differential Revision: https://reviews.llvm.org/D93258

Added: 


Modified: 
clang/include/clang/Driver/Options.td
clang/lib/Driver/ToolChains/CommonArgs.cpp
llvm/docs/AMDGPUUsage.rst

Removed: 




diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 67d41c3711f5..87c786065fa9 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2811,7 +2811,7 @@ def mexec_model_EQ : Joined<["-"], "mexec-model=">, 
Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 3. (AMDGPU only)">,
   MetaVarName<"">, Values<"2,3,4">;
 
 def mcode_object_v3_legacy : Flag<["-"], "mcode-object-v3">, Group,

diff  --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 72bedc16846d..04d0e0771f70 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1549,7 +1549,7 @@ unsigned tools::getOrCheckAMDGPUCodeObjectVersion(
 const Driver &D, const llvm::opt::ArgList &Args, bool Diagnose) {
   const unsigned MinCodeObjVer = 2;
   const unsigned MaxCodeObjVer = 4;
-  unsigned CodeObjVer = 4;
+  unsigned CodeObjVer = 3;
 
   // Emit warnings for legacy options even if they are overridden.
   if (Diagnose) {

diff  --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index e5d081a37500..95fb164310cc 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -911,12 +911,12 @@ The AMDGPU backend uses the following ELF header:
 
   * ``ELFABIVERSION_AMDGPU_HSA_V3`` is used to specify the version of AMD HSA
 runtime ABI for code object V3. Specify using the Clang option
-``-mcode-object-version=3``.
+``-mcode-object-version=3``. This is the default code object
+version if not specified.
 
   * ``ELFABIVERSION_AMDGPU_HSA_V4`` is used to specify the version of AMD HSA
 runtime ABI for code object V4. Specify using the Clang option
-``-mcode-object-version=4``. This is the default code object
-version if not specified.
+``-mcode-object-version=4``.
 
   * ``ELFABIVERSION_AMDGPU_PAL`` is used to specify the version of AMD PAL
 runtime ABI.
@@ -2871,10 +2871,6 @@ non-AMD key names should be prefixed by "*vendor-name*.".
 Code Object V3 Metadata
 +++
 
-.. warning::
-  Code object V3 is not the default code object version emitted by this version
-  of LLVM.
-
 Code object V3 to V4 metadata is specified by the ``NT_AMDGPU_METADATA`` note
 record (see :ref:`amdgpu-note-records-v3-v4`).
 
@@ -3279,6 +3275,10 @@ same *vendor-name*.
 Code Object V4 Metadata
 +++
 
+.. warning::
+  Code object V4 is not the default code object version emitted by this version
+  of LLVM.
+
 Code object V4 metadata is the same as
 :ref:`amdgpu-amdhsa-code-object-metadata-v3` with the changes and additions
 defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v3`.



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] c0619d3 - [NFC] Use regex for code object version in hip tests

2020-12-16 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2020-12-16T17:00:19Z
New Revision: c0619d3b21cd420b9faf15f14db0816787c44ded

URL: 
https://github.com/llvm/llvm-project/commit/c0619d3b21cd420b9faf15f14db0816787c44ded
DIFF: 
https://github.com/llvm/llvm-project/commit/c0619d3b21cd420b9faf15f14db0816787c44ded.diff

LOG: [NFC] Use regex for code object version in hip tests

[NFC] Use regex for code object version in hip tests

Extracted from D93258. Makes tests robust to changes in default
code object version.

Reviewed By: t-tye

Differential Revision: https://reviews.llvm.org/D93398

Added: 


Modified: 
clang/test/Driver/hip-autolink.hip
clang/test/Driver/hip-code-object-version.hip
clang/test/Driver/hip-device-compile.hip
clang/test/Driver/hip-host-cpu-features.hip
clang/test/Driver/hip-rdc-device-only.hip
clang/test/Driver/hip-target-id.hip
clang/test/Driver/hip-toolchain-mllvm.hip
clang/test/Driver/hip-toolchain-no-rdc.hip
clang/test/Driver/hip-toolchain-opt.hip
clang/test/Driver/hip-toolchain-rdc-separate.hip
clang/test/Driver/hip-toolchain-rdc-static-lib.hip
clang/test/Driver/hip-toolchain-rdc.hip

Removed: 




diff  --git a/clang/test/Driver/hip-autolink.hip 
b/clang/test/Driver/hip-autolink.hip
index 073c6c4d244a6..5f9311d7ba734 100644
--- a/clang/test/Driver/hip-autolink.hip
+++ b/clang/test/Driver/hip-autolink.hip
@@ -7,7 +7,7 @@
 // RUN: %clang --target=i386-pc-windows-msvc --cuda-gpu-arch=gfx906 -nogpulib \
 // RUN:   --cuda-host-only %s -### 2>&1 | FileCheck --check-prefix=HOST %s
 
-// DEV: "-cc1" "-mllvm" "--amdhsa-code-object-version=4" "-triple" 
"amdgcn-amd-amdhsa"
+// DEV: "-cc1" "-mllvm" "--amdhsa-code-object-version={{[0-9]+}}" "-triple" 
"amdgcn-amd-amdhsa"
 // DEV-SAME: "-fno-autolink"
 
 // HOST: "-cc1" "-triple" "i386-pc-windows-msvc{{.*}}"

diff  --git a/clang/test/Driver/hip-code-object-version.hip 
b/clang/test/Driver/hip-code-object-version.hip
index 26ad6f8710cc2..51d9004b0cbf5 100644
--- a/clang/test/Driver/hip-code-object-version.hip
+++ b/clang/test/Driver/hip-code-object-version.hip
@@ -44,12 +44,17 @@
 // RUN:   --offload-arch=gfx906 -nogpulib \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=V4 %s
 
+// V4: "-mllvm" "--amdhsa-code-object-version=4"
+// V4: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"
+
+// Check bundle ID for code object version default
+
 // RUN: %clang -### -target x86_64-linux-gnu \
 // RUN:   --offload-arch=gfx906 -nogpulib \
-// RUN:   %s 2>&1 | FileCheck -check-prefix=V4 %s
+// RUN:   %s 2>&1 | FileCheck -check-prefix=VD %s
 
-// V4: "-mllvm" "--amdhsa-code-object-version=4"
-// V4: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"
+// VD: "-mllvm" "--amdhsa-code-object-version=4"
+// VD: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"
 
 // Check invalid code object version option.
 

diff  --git a/clang/test/Driver/hip-device-compile.hip 
b/clang/test/Driver/hip-device-compile.hip
index 5fbcbc97bd805..c460ff7e8c67d 100644
--- a/clang/test/Driver/hip-device-compile.hip
+++ b/clang/test/Driver/hip-device-compile.hip
@@ -26,7 +26,7 @@
 // RUN:   %S/Inputs/hip_multiple_inputs/a.cu \
 // RUN: 2>&1 | FileCheck -check-prefixes=CHECK,ASM %s
 
-// CHECK: {{".*clang.*"}} "-cc1" "-mllvm" "--amdhsa-code-object-version=4" 
"-triple" "amdgcn-amd-amdhsa"
+// CHECK: {{".*clang.*"}} "-cc1" "-mllvm" 
"--amdhsa-code-object-version={{[0-9]+}}" "-triple" "amdgcn-amd-amdhsa"
 // CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
 // BC-SAME: "-emit-llvm-bc"
 // LL-SAME: "-emit-llvm"

diff  --git a/clang/test/Driver/hip-host-cpu-features.hip 
b/clang/test/Driver/hip-host-cpu-features.hip
index 235f0f1f22c24..8addfb11dc0b6 100644
--- a/clang/test/Driver/hip-host-cpu-features.hip
+++ b/clang/test/Driver/hip-host-cpu-features.hip
@@ -6,14 +6,14 @@
 // RUN: %clang -### -c -target x86_64-linux-gnu -msse3 --cuda-gpu-arch=gfx803 
-nogpulib %s 2>&1 | FileCheck %s -check-prefix=HOSTSSE3
 // RUN: %clang -### -c -target x86_64-linux-gnu --gpu-use-aux-triple-only 
-march=znver2 --cuda-gpu-arch=gfx803 -nogpulib %s 2>&1 | FileCheck %s 
-check-prefix=NOHOSTCPU
 
-// HOSTCPU: "-cc1" "-mllvm" "--amdhsa-code-object-version=4" "-triple" 
"amdgcn-amd-amdhsa"
+// HOSTCPU: "-cc1" "-mllvm" "--amdhsa-code-object-version={{[0-9]+}}" 
"-triple" "amdgcn-amd-amdhsa"
 // HOSTCPU-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
 // HOSTCPU-SAME: "-aux-target-cpu" "znver2"
 
-// HOSTSSE3: "-cc1" "-mllvm" "--amdhsa-code-object-version=4" "-triple" 
"amdgcn-amd-amdhsa"
+// HOSTSSE3: "-cc1" "-mllvm" "--amdhsa-code-object-version={{[0-9]+}}" 
"-triple" "amdgcn-amd-amdhsa"
 // HOSTSSE3-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
 // HOSTSSE3-SAME: "-aux-target-feature" "+sse3"
 
-// NOHOSTCPU: "-cc1" "-mllvm" "--amdhsa-code-object-version=4" "-triple" 
"amdgcn-amd-amdhsa"
+// NOHOSTCPU: "-cc1" "-mllvm" "--amdhsa-code-object-versio

[clang] daf39e3 - [amdgpu] Default to code object v3

2020-12-17 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2020-12-17T16:09:33Z
New Revision: daf39e3f2dba18bd39cd89a1c91bae126a31d4fe

URL: 
https://github.com/llvm/llvm-project/commit/daf39e3f2dba18bd39cd89a1c91bae126a31d4fe
DIFF: 
https://github.com/llvm/llvm-project/commit/daf39e3f2dba18bd39cd89a1c91bae126a31d4fe.diff

LOG: [amdgpu] Default to code object v3

[amdgpu] Default to code object v3
v4 is not yet readily available, and doesn't appear
to be implemented in the back end

Reviewed By: t-tye, yaxunl

Differential Revision: https://reviews.llvm.org/D93258

Added: 


Modified: 
clang/include/clang/Driver/Options.td
clang/lib/Driver/ToolChains/CommonArgs.cpp
clang/test/Driver/hip-code-object-version.hip
llvm/docs/AMDGPUUsage.rst

Removed: 




diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index f384e0d993c2..07f15add28ec 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2909,7 +2909,7 @@ def mexec_model_EQ : Joined<["-"], "mexec-model=">, 
Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 3. (AMDGPU only)">,
   MetaVarName<"">, Values<"2,3,4">;
 
 def mcode_object_v3_legacy : Flag<["-"], "mcode-object-v3">, Group,

diff  --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 72bedc16846d..04d0e0771f70 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1549,7 +1549,7 @@ unsigned tools::getOrCheckAMDGPUCodeObjectVersion(
 const Driver &D, const llvm::opt::ArgList &Args, bool Diagnose) {
   const unsigned MinCodeObjVer = 2;
   const unsigned MaxCodeObjVer = 4;
-  unsigned CodeObjVer = 4;
+  unsigned CodeObjVer = 3;
 
   // Emit warnings for legacy options even if they are overridden.
   if (Diagnose) {

diff  --git a/clang/test/Driver/hip-code-object-version.hip 
b/clang/test/Driver/hip-code-object-version.hip
index 51d9004b0cbf..6e4e96688593 100644
--- a/clang/test/Driver/hip-code-object-version.hip
+++ b/clang/test/Driver/hip-code-object-version.hip
@@ -53,7 +53,7 @@
 // RUN:   --offload-arch=gfx906 -nogpulib \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=VD %s
 
-// VD: "-mllvm" "--amdhsa-code-object-version=4"
+// VD: "-mllvm" "--amdhsa-code-object-version=3"
 // VD: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"
 
 // Check invalid code object version option.

diff  --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 6d3fa7021a7a..c8dda47352ab 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -911,12 +911,12 @@ The AMDGPU backend uses the following ELF header:
 
   * ``ELFABIVERSION_AMDGPU_HSA_V3`` is used to specify the version of AMD HSA
 runtime ABI for code object V3. Specify using the Clang option
-``-mcode-object-version=3``.
+``-mcode-object-version=3``. This is the default code object
+version if not specified.
 
   * ``ELFABIVERSION_AMDGPU_HSA_V4`` is used to specify the version of AMD HSA
 runtime ABI for code object V4. Specify using the Clang option
-``-mcode-object-version=4``. This is the default code object
-version if not specified.
+``-mcode-object-version=4``.
 
   * ``ELFABIVERSION_AMDGPU_PAL`` is used to specify the version of AMD PAL
 runtime ABI.
@@ -2871,10 +2871,6 @@ non-AMD key names should be prefixed by "*vendor-name*.".
 Code Object V3 Metadata
 +++
 
-.. warning::
-  Code object V3 is not the default code object version emitted by this version
-  of LLVM.
-
 Code object V3 to V4 metadata is specified by the ``NT_AMDGPU_METADATA`` note
 record (see :ref:`amdgpu-note-records-v3-v4`).
 
@@ -3279,6 +3275,10 @@ same *vendor-name*.
 Code Object V4 Metadata
 +++
 
+.. warning::
+  Code object V4 is not the default code object version emitted by this version
+  of LLVM.
+
 Code object V4 metadata is the same as
 :ref:`amdgpu-amdhsa-code-object-metadata-v3` with the changes and additions
 defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v3`.



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 5d02ca4 - [libomptarget][nvptx] Undef, weak shared variables

2020-10-28 Thread Jon Chesterfield via cfe-commits

Author: JonChesterfield
Date: 2020-10-28T14:25:36Z
New Revision: 5d02ca49a294848b533adf7dc1d1275d125ef587

URL: 
https://github.com/llvm/llvm-project/commit/5d02ca49a294848b533adf7dc1d1275d125ef587
DIFF: 
https://github.com/llvm/llvm-project/commit/5d02ca49a294848b533adf7dc1d1275d125ef587.diff

LOG: [libomptarget][nvptx] Undef, weak shared variables

[libomptarget][nvptx] Undef, weak shared variables

Shared variables on nvptx, and LDS on amdgcn, are uninitialized at
the start of kernel execution. Therefore create the variables with
undef instead of zeros, motivated in part by the amdgcn back end
rejecting LDS+initializer.

Common is zero initialized, which seems incompatible with shared. Thus
change them to weak, following the direction of
https://reviews.llvm.org/rG7b3eabdcd215

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90248

Added: 


Modified: 
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
clang/test/OpenMP/nvptx_data_sharing.cpp
clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
clang/test/OpenMP/nvptx_parallel_codegen.cpp
clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp

clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
clang/test/OpenMP/nvptx_teams_codegen.cpp
clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index bcabc5398127..08903a1444c2 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -1102,7 +1102,7 @@ void CGOpenMPRuntimeGPU::emitNonSPMDKernel(const 
OMPExecutableDirective &D,
 KernelStaticGlobalized = new llvm::GlobalVariable(
 CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,
 llvm::GlobalValue::InternalLinkage,
-llvm::ConstantPointerNull::get(CGM.VoidPtrTy),
+llvm::UndefValue::get(CGM.VoidPtrTy),
 "_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
 llvm::GlobalValue::NotThreadLocal,
 CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
@@ -1234,7 +1234,7 @@ void CGOpenMPRuntimeGPU::emitSPMDKernel(const 
OMPExecutableDirective &D,
 KernelStaticGlobalized = new llvm::GlobalVariable(
 CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,
 llvm::GlobalValue::InternalLinkage,
-llvm::ConstantPointerNull::get(CGM.VoidPtrTy),
+llvm::UndefValue::get(CGM.VoidPtrTy),
 "_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
 llvm::GlobalValue::NotThreadLocal,
 CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
@@ -2855,8 +2855,8 @@ static llvm::Value 
*emitInterWarpCopyFunction(CodeGenModule &CGM,
 auto *Ty = llvm::ArrayType::get(CGM.Int32Ty, WarpSize);
 unsigned SharedAddressSpace = C.getTargetAddressSpace(LangAS::cuda_shared);
 TransferMedium = new llvm::GlobalVariable(
-M, Ty, /*isConstant=*/false, llvm::GlobalVariable::CommonLinkage,
-llvm::Constant::getNullValue(Ty), TransferMediumName,
+M, Ty, /*isConstant=*/false, llvm::GlobalVariable::WeakAnyLinkage,
+llvm::UndefValue::get(Ty), TransferMediumName,
 /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
 SharedAddressSpace);
 CGM.addCompilerUsedGlobal(TransferMedium);
@@ -4791,8 +4791,8 @@ void CGOpenMPRuntimeGPU::clear() {
   llvm::Type *LLVMStaticTy = CGM.getTypes().ConvertTypeForMem(StaticTy);
   auto *GV = new llvm::GlobalVariable(
   CGM.getModule(), LLVMStaticTy,
-  /*isConstant=*/false, llvm::GlobalValue::CommonLinkage,
-  llvm::Constant::getNullValue(LLVMStaticTy),
+  /*isConstant=*/false, llvm::GlobalValue::WeakAnyLinkage,
+  llvm::UndefValue::get(LLVMStaticTy),
   "_openmp_shared_static_glob_rd_$_", /*InsertBefore=*/nullptr,
   llvm::GlobalValue::NotThreadLocal,
   C.getTargetAddressSpace(LangAS::cuda_shared));

diff  --git a/clang/test/OpenMP/nvptx_data_sharing.cpp 
b/clang/test/OpenMP/nvptx_data_sharing.cpp
index 1372246c7fc8..b6117d738d2b 100644
--- a/clang/test/OpenMP/nvptx_data_sharing.cpp
+++ b/clang/test/OpenMP/nvptx_data_sharing.cpp
@@ -28,8 +28,8 @@ void test_ds(){
   }
 }
 // SEQ: [[MEM_TY:%.+]] = type { [128 x i8] }
-// SEQ-DAG: [[SHARED_GLOBAL_RD:@.+]] = common addrspace(3) global [[MEM_TY]] 
zeroinitializer
-// SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* null
+// SEQ-DAG: [[SHARED_GLOBAL_RD:@.+]] = weak addrspace(3) global [[MEM_TY]] 
undef
+// SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* undef
 // SEQ-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i64 8
 // SEQ-DAG: [[KERNE

[clang] dee7704 - [AMDGPU] Add __builtin_amdgcn_grid_size

2020-10-29 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2020-10-29T16:25:13Z
New Revision: dee7704829bd421ad3cce4b2132d28f4459b7319

URL: 
https://github.com/llvm/llvm-project/commit/dee7704829bd421ad3cce4b2132d28f4459b7319
DIFF: 
https://github.com/llvm/llvm-project/commit/dee7704829bd421ad3cce4b2132d28f4459b7319.diff

LOG: [AMDGPU] Add __builtin_amdgcn_grid_size

[AMDGPU] Add __builtin_amdgcn_grid_size

Similar to D76772, loads the data from the dispatch pointer. Marked invariant.

Patch also updates the openmp devicertl to use this builtin.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D90251

Added: 


Modified: 
clang/include/clang/Basic/BuiltinsAMDGPU.def
clang/lib/CodeGen/CGBuiltin.cpp
clang/test/CodeGenOpenCL/builtins-amdgcn.cl
openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip

Removed: 




diff  --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 042a86368559..f5901e6f8f3b 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -37,6 +37,10 @@ BUILTIN(__builtin_amdgcn_workgroup_size_x, "Us", "nc")
 BUILTIN(__builtin_amdgcn_workgroup_size_y, "Us", "nc")
 BUILTIN(__builtin_amdgcn_workgroup_size_z, "Us", "nc")
 
+BUILTIN(__builtin_amdgcn_grid_size_x, "Ui", "nc")
+BUILTIN(__builtin_amdgcn_grid_size_y, "Ui", "nc")
+BUILTIN(__builtin_amdgcn_grid_size_z, "Ui", "nc")
+
 BUILTIN(__builtin_amdgcn_mbcnt_hi, "UiUiUi", "nc")
 BUILTIN(__builtin_amdgcn_mbcnt_lo, "UiUiUi", "nc")
 

diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 6f7505b7b5c2..f933113fa883 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -14750,6 +14750,22 @@ Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, 
unsigned Index) {
   llvm::MDNode::get(CGF.getLLVMContext(), None));
   return LD;
 }
+
+// \p Index is 0, 1, and 2 for x, y, and z dimension, respectively.
+Value *EmitAMDGPUGridSize(CodeGenFunction &CGF, unsigned Index) {
+  const unsigned XOffset = 12;
+  auto *DP = EmitAMDGPUDispatchPtr(CGF);
+  // Indexing the HSA kernel_dispatch_packet struct.
+  auto *Offset = llvm::ConstantInt::get(CGF.Int32Ty, XOffset + Index * 4);
+  auto *GEP = CGF.Builder.CreateGEP(DP, Offset);
+  auto *DstTy =
+  CGF.Int32Ty->getPointerTo(GEP->getType()->getPointerAddressSpace());
+  auto *Cast = CGF.Builder.CreateBitCast(GEP, DstTy);
+  auto *LD = CGF.Builder.CreateLoad(Address(Cast, CharUnits::fromQuantity(4)));
+  LD->setMetadata(llvm::LLVMContext::MD_invariant_load,
+  llvm::MDNode::get(CGF.getLLVMContext(), None));
+  return LD;
+}
 } // namespace
 
 // For processing memory ordering and memory scope arguments of various
@@ -15010,6 +15026,14 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_workgroup_size_z:
 return EmitAMDGPUWorkGroupSize(*this, 2);
 
+  // amdgcn grid size
+  case AMDGPU::BI__builtin_amdgcn_grid_size_x:
+return EmitAMDGPUGridSize(*this, 0);
+  case AMDGPU::BI__builtin_amdgcn_grid_size_y:
+return EmitAMDGPUGridSize(*this, 1);
+  case AMDGPU::BI__builtin_amdgcn_grid_size_z:
+return EmitAMDGPUGridSize(*this, 2);
+
   // r600 intrinsics
   case AMDGPU::BI__builtin_r600_recipsqrt_ieee:
   case AMDGPU::BI__builtin_r600_recipsqrt_ieeef:

diff  --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
index 56c83df6b6b4..20edaf2aae3f 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -559,6 +559,24 @@ void test_get_workgroup_size(int d, global int *out)
}
 }
 
+// CHECK-LABEL: @test_get_grid_size(
+// CHECK: call align 4 dereferenceable(64) i8 addrspace(4)* 
@llvm.amdgcn.dispatch.ptr()
+// CHECK: getelementptr i8, i8 addrspace(4)* %{{.*}}, i64 12
+// CHECK: load i32, i32 addrspace(4)* %{{.*}}, align 4, !invariant.load
+// CHECK: getelementptr i8, i8 addrspace(4)* %{{.*}}, i64 16
+// CHECK: load i32, i32 addrspace(4)* %{{.*}}, align 4, !invariant.load
+// CHECK: getelementptr i8, i8 addrspace(4)* %{{.*}}, i64 20
+// CHECK: load i32, i32 addrspace(4)* %{{.*}}, align 4, !invariant.load
+void test_get_grid_size(int d, global int *out)
+{
+   switch (d) {
+   case 0: *out = __builtin_amdgcn_grid_size_x(); break;
+   case 1: *out = __builtin_amdgcn_grid_size_y(); break;
+   case 2: *out = __builtin_amdgcn_grid_size_z(); break;
+   default: *out = 0;
+   }
+}
+
 // CHECK-LABEL: @test_fmed3_f32
 // CHECK: call float @llvm.amdgcn.fmed3.f32(
 void test_fmed3_f32(global float* out, float a, float b, float c)

diff  --git a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip 
b/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
index 8c53d99b9fb6..9fbdc67b56ab 100644
--- a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.

[clang] 5dfdc18 - [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn.

2021-06-23 Thread Jon Chesterfield via cfe-commits

Author: Ethan Stewart
Date: 2021-06-23T15:26:09+01:00
New Revision: 5dfdc1812d9b9c043204d39318f6446424d8f2d7

URL: 
https://github.com/llvm/llvm-project/commit/5dfdc1812d9b9c043204d39318f6446424d8f2d7
DIFF: 
https://github.com/llvm/llvm-project/commit/5dfdc1812d9b9c043204d39318f6446424d8f2d7.diff

LOG: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn.

This fixes issues with various return types(bool/int) and was already
in place for nvptx headers, adjusted to work for amdgcn. This does
not affect hip as the change is guarded with OPENMP_AMDGCN.
Similar to D85879.

Reviewed By: jdoerfert, JonChesterfield, yaxunl

Differential Revision: https://reviews.llvm.org/D104677

Added: 


Modified: 
clang/lib/Headers/__clang_hip_cmath.h
clang/test/Headers/hip-header.hip
clang/test/Headers/openmp_device_math_isnan.cpp

Removed: 




diff  --git a/clang/lib/Headers/__clang_hip_cmath.h 
b/clang/lib/Headers/__clang_hip_cmath.h
index b5d7c16ac5e41..7342705434e6b 100644
--- a/clang/lib/Headers/__clang_hip_cmath.h
+++ b/clang/lib/Headers/__clang_hip_cmath.h
@@ -52,8 +52,46 @@ __DEVICE__ int fpclassify(double __x) {
 __DEVICE__ float frexp(float __arg, int *__exp) {
   return ::frexpf(__arg, __exp);
 }
+
+#if defined(__OPENMP_AMDGCN__)
+// For OpenMP we work around some old system headers that have non-conforming
+// `isinf(float)` and `isnan(float)` implementations that return an `int`. We 
do
+// this by providing two versions of these functions, 
diff ering only in the
+// return type. To avoid conflicting definitions we disable implicit base
+// function generation. That means we will end up with two specializations, one
+// per type, but only one has a base function defined by the system header.
+#pragma omp begin declare variant match(   
\
+implementation = {extension(disable_implicit_base)})
+
+// FIXME: We lack an extension to customize the mangling of the variants, e.g.,
+//add a suffix. This means we would clash with the names of the 
variants
+//(note that we do not create implicit base functions here). To avoid
+//this clash we add a new trait to some of them that is always true
+//(this is LLVM after all ;)). It will only influence the mangled name
+//of the variants inside the inner region and avoid the clash.
+#pragma omp begin declare variant match(implementation = {vendor(llvm)})
+
+__DEVICE__ int isinf(float __x) { return ::__isinff(__x); }
+__DEVICE__ int isinf(double __x) { return ::__isinf(__x); }
+__DEVICE__ int isfinite(float __x) { return ::__finitef(__x); }
+__DEVICE__ int isfinite(double __x) { return ::__finite(__x); }
+__DEVICE__ int isnan(float __x) { return ::__isnanf(__x); }
+__DEVICE__ int isnan(double __x) { return ::__isnan(__x); }
+
+#pragma omp end declare variant
+#endif // defined(__OPENMP_AMDGCN__)
+
+__DEVICE__ bool isinf(float __x) { return ::__isinff(__x); }
+__DEVICE__ bool isinf(double __x) { return ::__isinf(__x); }
 __DEVICE__ bool isfinite(float __x) { return ::__finitef(__x); }
 __DEVICE__ bool isfinite(double __x) { return ::__finite(__x); }
+__DEVICE__ bool isnan(float __x) { return ::__isnanf(__x); }
+__DEVICE__ bool isnan(double __x) { return ::__isnan(__x); }
+
+#if defined(__OPENMP_AMDGCN__)
+#pragma omp end declare variant
+#endif // defined(__OPENMP_AMDGCN__)
+
 __DEVICE__ bool isgreater(float __x, float __y) {
   return __builtin_isgreater(__x, __y);
 }
@@ -66,8 +104,6 @@ __DEVICE__ bool isgreaterequal(float __x, float __y) {
 __DEVICE__ bool isgreaterequal(double __x, double __y) {
   return __builtin_isgreaterequal(__x, __y);
 }
-__DEVICE__ bool isinf(float __x) { return ::__isinff(__x); }
-__DEVICE__ bool isinf(double __x) { return ::__isinf(__x); }
 __DEVICE__ bool isless(float __x, float __y) {
   return __builtin_isless(__x, __y);
 }
@@ -86,8 +122,6 @@ __DEVICE__ bool islessgreater(float __x, float __y) {
 __DEVICE__ bool islessgreater(double __x, double __y) {
   return __builtin_islessgreater(__x, __y);
 }
-__DEVICE__ bool isnan(float __x) { return ::__isnanf(__x); }
-__DEVICE__ bool isnan(double __x) { return ::__isnan(__x); }
 __DEVICE__ bool isnormal(float __x) { return __builtin_isnormal(__x); }
 __DEVICE__ bool isnormal(double __x) { return __builtin_isnormal(__x); }
 __DEVICE__ bool isunordered(float __x, float __y) {

diff  --git a/clang/test/Headers/hip-header.hip 
b/clang/test/Headers/hip-header.hip
index 5bba1de2ce6c4..0e95d58d55700 100644
--- a/clang/test/Headers/hip-header.hip
+++ b/clang/test/Headers/hip-header.hip
@@ -8,6 +8,20 @@
 // RUN: %clang_cc1 -include __clang_hip_runtime_wrapper.h \
 // RUN:   -internal-isystem %S/../../lib/Headers/cuda_wrappers \
 // RUN:   -internal-isystem %S/Inputs/include \
+// RUN:   -include cmath \
+// RUN:   -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown \
+// RUN:   -target-cpu gfx906 -emit-llvm %s -f

[clang] 7f97dda - Revert "[OpenMP][AMDGCN] Initial math headers support"

2021-07-30 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-07-30T22:07:00+01:00
New Revision: 7f97ddaf8aa0062393e866b63e68c9f74da375fb

URL: 
https://github.com/llvm/llvm-project/commit/7f97ddaf8aa0062393e866b63e68c9f74da375fb
DIFF: 
https://github.com/llvm/llvm-project/commit/7f97ddaf8aa0062393e866b63e68c9f74da375fb.diff

LOG: Revert "[OpenMP][AMDGCN] Initial math headers support"

Broke nvptx compilation on files including 

This reverts commit 12da97ea10a941f0123340831300d09a2121e173.

Added: 


Modified: 
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Headers/__clang_hip_cmath.h
clang/lib/Headers/__clang_hip_math.h
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
clang/lib/Headers/openmp_wrappers/cmath
clang/lib/Headers/openmp_wrappers/math.h
clang/test/Headers/Inputs/include/cstdlib
clang/test/Headers/openmp_device_math_isnan.cpp

Removed: 
clang/test/Headers/Inputs/include/algorithm
clang/test/Headers/Inputs/include/utility
clang/test/Headers/amdgcn_openmp_device_math.c



diff  --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 278ae118563d6..e13302528cbd1 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -1256,8 +1256,7 @@ void Clang::AddPreprocessingOptions(Compilation &C, const 
JobAction &JA,
   // If we are offloading to a target via OpenMP we need to include the
   // openmp_wrappers folder which contains alternative system headers.
   if (JA.isDeviceOffloading(Action::OFK_OpenMP) &&
-  (getToolChain().getTriple().isNVPTX() ||
-   getToolChain().getTriple().isAMDGCN())) {
+  getToolChain().getTriple().isNVPTX()){
 if (!Args.hasArg(options::OPT_nobuiltininc)) {
   // Add openmp_wrappers/* to our system include path.  This lets us wrap
   // standard library headers.

diff  --git a/clang/lib/Headers/__clang_hip_cmath.h 
b/clang/lib/Headers/__clang_hip_cmath.h
index d488db0a94d9d..7342705434e6b 100644
--- a/clang/lib/Headers/__clang_hip_cmath.h
+++ b/clang/lib/Headers/__clang_hip_cmath.h
@@ -10,7 +10,7 @@
 #ifndef __CLANG_HIP_CMATH_H__
 #define __CLANG_HIP_CMATH_H__
 
-#if !defined(__HIP__) && !defined(__OPENMP_AMDGCN__)
+#if !defined(__HIP__)
 #error "This file is for HIP and OpenMP AMDGCN device compilation only."
 #endif
 
@@ -25,43 +25,31 @@
 #endif // !defined(__HIPCC_RTC__)
 
 #pragma push_macro("__DEVICE__")
-#pragma push_macro("__CONSTEXPR__")
-#ifdef __OPENMP_AMDGCN__
-#define __DEVICE__ static __attribute__((always_inline, nothrow))
-#define __CONSTEXPR__ constexpr
-#else
 #define __DEVICE__ static __device__ inline __attribute__((always_inline))
-#define __CONSTEXPR__
-#endif // __OPENMP_AMDGCN__
 
 // Start with functions that cannot be defined by DEF macros below.
 #if defined(__cplusplus)
-#if defined __OPENMP_AMDGCN__
-__DEVICE__ __CONSTEXPR__ float fabs(float __x) { return ::fabsf(__x); }
-__DEVICE__ __CONSTEXPR__ float sin(float __x) { return ::sinf(__x); }
-__DEVICE__ __CONSTEXPR__ float cos(float __x) { return ::cosf(__x); }
-#endif
-__DEVICE__ __CONSTEXPR__ double abs(double __x) { return ::fabs(__x); }
-__DEVICE__ __CONSTEXPR__ float abs(float __x) { return ::fabsf(__x); }
-__DEVICE__ __CONSTEXPR__ long long abs(long long __n) { return ::llabs(__n); }
-__DEVICE__ __CONSTEXPR__ long abs(long __n) { return ::labs(__n); }
-__DEVICE__ __CONSTEXPR__ float fma(float __x, float __y, float __z) {
+__DEVICE__ double abs(double __x) { return ::fabs(__x); }
+__DEVICE__ float abs(float __x) { return ::fabsf(__x); }
+__DEVICE__ long long abs(long long __n) { return ::llabs(__n); }
+__DEVICE__ long abs(long __n) { return ::labs(__n); }
+__DEVICE__ float fma(float __x, float __y, float __z) {
   return ::fmaf(__x, __y, __z);
 }
 #if !defined(__HIPCC_RTC__)
 // The value returned by fpclassify is platform dependent, therefore it is not
 // supported by hipRTC.
-__DEVICE__ __CONSTEXPR__ int fpclassify(float __x) {
+__DEVICE__ int fpclassify(float __x) {
   return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
   FP_ZERO, __x);
 }
-__DEVICE__ __CONSTEXPR__ int fpclassify(double __x) {
+__DEVICE__ int fpclassify(double __x) {
   return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
   FP_ZERO, __x);
 }
 #endif // !defined(__HIPCC_RTC__)
 
-__DEVICE__ __CONSTEXPR__ float frexp(float __arg, int *__exp) {
+__DEVICE__ float frexp(float __arg, int *__exp) {
   return ::frexpf(__arg, __exp);
 }
 
@@ -83,101 +71,93 @@ __DEVICE__ __CONSTEXPR__ float frexp(float __arg, int 
*__exp) {
 //of the variants inside the inner region and avoid the clash.
 #pragma omp begin declare variant match(implementation = {vendor(llvm)})
 
-__DEVICE__ __CONSTEXPR__ int isinf(float __x) { return ::__isinff(__x); }
-__DEVICE__ __CONSTEXPR__ int isinf(double __x) { return ::__isinf(__x);

[clang] 509854b - [clang] Replace asm with __asm__ in cuda header

2021-08-05 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-08-05T18:46:57+01:00
New Revision: 509854b69cea0c9261ac21ceb22012a53e7a800b

URL: 
https://github.com/llvm/llvm-project/commit/509854b69cea0c9261ac21ceb22012a53e7a800b
DIFF: 
https://github.com/llvm/llvm-project/commit/509854b69cea0c9261ac21ceb22012a53e7a800b.diff

LOG: [clang] Replace asm with __asm__ in cuda header

Asm is a gnu extension for C, so at present -fopenmp -std=c99
and similar fail to compile on nvptx, bug 51344

Changing to `__asm__` or `__asm` works for openmp, all three appear to work
for cuda. Suggesting `__asm__` here as `__asm` is used by MSVC with different
syntax, so this should make for better error diagnostics if the header is
passed to a compiler other than clang.

Reviewed By: tra, emankov

Differential Revision: https://reviews.llvm.org/D107492

Added: 


Modified: 
clang/lib/Headers/__clang_cuda_device_functions.h

Removed: 




diff  --git a/clang/lib/Headers/__clang_cuda_device_functions.h 
b/clang/lib/Headers/__clang_cuda_device_functions.h
index f801e5426aa43..cc4e1a4dd96ad 100644
--- a/clang/lib/Headers/__clang_cuda_device_functions.h
+++ b/clang/lib/Headers/__clang_cuda_device_functions.h
@@ -34,10 +34,12 @@ __DEVICE__ unsigned long long __brevll(unsigned long long 
__a) {
   return __nv_brevll(__a);
 }
 #if defined(__cplusplus)
-__DEVICE__ void __brkpt() { asm volatile("brkpt;"); }
+__DEVICE__ void __brkpt() { __asm__ __volatile__("brkpt;"); }
 __DEVICE__ void __brkpt(int __a) { __brkpt(); }
 #else
-__DEVICE__ void __attribute__((overloadable)) __brkpt(void) { asm 
volatile("brkpt;"); }
+__DEVICE__ void __attribute__((overloadable)) __brkpt(void) {
+  __asm__ __volatile__("brkpt;");
+}
 __DEVICE__ void __attribute__((overloadable)) __brkpt(int __a) { __brkpt(); }
 #endif
 __DEVICE__ unsigned int __byte_perm(unsigned int __a, unsigned int __b,
@@ -507,7 +509,7 @@ __DEVICE__ float __powf(float __a, float __b) {
 }
 
 // Parameter must have a known integer value.
-#define __prof_trigger(__a) asm __volatile__("pmevent \t%0;" ::"i"(__a))
+#define __prof_trigger(__a) __asm__ __volatile__("pmevent \t%0;" ::"i"(__a))
 __DEVICE__ int __rhadd(int __a, int __b) { return __nv_rhadd(__a, __b); }
 __DEVICE__ unsigned int __sad(int __a, int __b, unsigned int __c) {
   return __nv_sad(__a, __b, __c);
@@ -526,7 +528,7 @@ __DEVICE__ float __tanf(float __a) { return 
__nv_fast_tanf(__a); }
 __DEVICE__ void __threadfence(void) { __nvvm_membar_gl(); }
 __DEVICE__ void __threadfence_block(void) { __nvvm_membar_cta(); };
 __DEVICE__ void __threadfence_system(void) { __nvvm_membar_sys(); };
-__DEVICE__ void __trap(void) { asm volatile("trap;"); }
+__DEVICE__ void __trap(void) { __asm__ __volatile__("trap;"); }
 __DEVICE__ unsigned int __uAtomicAdd(unsigned int *__p, unsigned int __v) {
   return __nvvm_atom_add_gen_i((int *)__p, __v);
 }
@@ -1051,122 +1053,136 @@ __DEVICE__ unsigned int __bool2mask(unsigned int __a, 
int shift) {
 }
 __DEVICE__ unsigned int __vabs2(unsigned int __a) {
   unsigned int r;
-  asm("vabs
diff 2.s32.s32.s32 %0,%1,%2,%3;"
-  : "=r"(r)
-  : "r"(__a), "r"(0), "r"(0));
+  __asm__("vabs
diff 2.s32.s32.s32 %0,%1,%2,%3;"
+  : "=r"(r)
+  : "r"(__a), "r"(0), "r"(0));
   return r;
 }
 __DEVICE__ unsigned int __vabs4(unsigned int __a) {
   unsigned int r;
-  asm("vabs
diff 4.s32.s32.s32 %0,%1,%2,%3;"
-  : "=r"(r)
-  : "r"(__a), "r"(0), "r"(0));
+  __asm__("vabs
diff 4.s32.s32.s32 %0,%1,%2,%3;"
+  : "=r"(r)
+  : "r"(__a), "r"(0), "r"(0));
   return r;
 }
 __DEVICE__ unsigned int __vabs
diff s2(unsigned int __a, unsigned int __b) {
   unsigned int r;
-  asm("vabs
diff 2.s32.s32.s32 %0,%1,%2,%3;"
-  : "=r"(r)
-  : "r"(__a), "r"(__b), "r"(0));
+  __asm__("vabs
diff 2.s32.s32.s32 %0,%1,%2,%3;"
+  : "=r"(r)
+  : "r"(__a), "r"(__b), "r"(0));
   return r;
 }
 
 __DEVICE__ unsigned int __vabs
diff s4(unsigned int __a, unsigned int __b) {
   unsigned int r;
-  asm("vabs
diff 4.s32.s32.s32 %0,%1,%2,%3;"
-  : "=r"(r)
-  : "r"(__a), "r"(__b), "r"(0));
+  __asm__("vabs
diff 4.s32.s32.s32 %0,%1,%2,%3;"
+  : "=r"(r)
+  : "r"(__a), "r"(__b), "r"(0));
   return r;
 }
 __DEVICE__ unsigned int __vabs
diff u2(unsigned int __a, unsigned int __b) {
   unsigned int r;
-  asm("vabs
diff 2.u32.u32.u32 %0,%1,%2,%3;"
-  : "=r"(r)
-  : "r"(__a), "r"(__b), "r"(0));
+  __asm__("vabs
diff 2.u32.u32.u32 %0,%1,%2,%3;"
+  : "=r"(r)
+  : "r"(__a), "r"(__b), "r"(0));
   return r;
 }
 __DEVICE__ unsigned int __vabs
diff u4(unsigned int __a, unsigned int __b) {
   unsigned int r;
-  asm("vabs
diff 4.u32.u32.u32 %0,%1,%2,%3;"
-  : "=r"(r)
-  : "r"(__a), "r"(__b), "r"(0));
+  __asm__("vabs
diff 4.u32.u32.u32 %0,%1,%2,%3;"
+  : "=r"(r)
+  : "r"(__a), "r"(__b), "r"(0));
   return r;
 }
 __DEVICE__ unsigned int __vabsss2(unsigned int __a) {
   unsigned i

[clang] b611354 - [openmp] Annotate tmp variables with omp_thread_mem_alloc

2021-08-12 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-08-12T17:30:22+01:00
New Revision: b6113548c9217fb8a6d0e9ac5bef5584c1aa614d

URL: 
https://github.com/llvm/llvm-project/commit/b6113548c9217fb8a6d0e9ac5bef5584c1aa614d
DIFF: 
https://github.com/llvm/llvm-project/commit/b6113548c9217fb8a6d0e9ac5bef5584c1aa614d.diff

LOG: [openmp] Annotate tmp variables with omp_thread_mem_alloc

Fixes miscompile of calls into ocml. Bug 51445.

The stack variable `double __tmp` is moved to dynamically allocated shared
memory by CGOpenMPRuntimeGPU. This is usually fine, but when the variable
is passed to a function that is explicitly annotated address_space(5) then
allocating the variable off-stack leads to a miscompile in the back end,
which cannot decide to move the variable back to the stack from shared.

This could be fixed by removing the AS(5) annotation from the math library
or by explicitly marking the variables as thread_mem_alloc. The cast to
AS(5) is still a no-op once IR is reached.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D107971

Added: 


Modified: 
clang/lib/Headers/__clang_hip_math.h

Removed: 




diff  --git a/clang/lib/Headers/__clang_hip_math.h 
b/clang/lib/Headers/__clang_hip_math.h
index 9effaa18d3e8..ef7e087b832c 100644
--- a/clang/lib/Headers/__clang_hip_math.h
+++ b/clang/lib/Headers/__clang_hip_math.h
@@ -19,6 +19,9 @@
 #endif
 #include 
 #include 
+#ifdef __OPENMP_AMDGCN__
+#include 
+#endif
 #endif // !defined(__HIPCC_RTC__)
 
 #pragma push_macro("__DEVICE__")
@@ -258,6 +261,9 @@ float fmodf(float __x, float __y) { return 
__ocml_fmod_f32(__x, __y); }
 __DEVICE__
 float frexpf(float __x, int *__nptr) {
   int __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   float __r =
   __ocml_frexp_f32(__x, (__attribute__((address_space(5))) int *)&__tmp);
   *__nptr = __tmp;
@@ -343,6 +349,9 @@ long int lroundf(float __x) { return __ocml_round_f32(__x); 
}
 __DEVICE__
 float modff(float __x, float *__iptr) {
   float __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   float __r =
   __ocml_modf_f32(__x, (__attribute__((address_space(5))) float *)&__tmp);
   *__iptr = __tmp;
@@ -423,6 +432,9 @@ float remainderf(float __x, float __y) {
 __DEVICE__
 float remquof(float __x, float __y, int *__quo) {
   int __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   float __r = __ocml_remquo_f32(
   __x, __y, (__attribute__((address_space(5))) int *)&__tmp);
   *__quo = __tmp;
@@ -479,6 +491,9 @@ __RETURN_TYPE __signbitf(float __x) { return 
__ocml_signbit_f32(__x); }
 __DEVICE__
 void sincosf(float __x, float *__sinptr, float *__cosptr) {
   float __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   *__sinptr =
   __ocml_sincos_f32(__x, (__attribute__((address_space(5))) float 
*)&__tmp);
   *__cosptr = __tmp;
@@ -487,6 +502,9 @@ void sincosf(float __x, float *__sinptr, float *__cosptr) {
 __DEVICE__
 void sincospif(float __x, float *__sinptr, float *__cosptr) {
   float __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   *__sinptr = __ocml_sincospi_f32(
   __x, (__attribute__((address_space(5))) float *)&__tmp);
   *__cosptr = __tmp;
@@ -799,6 +817,9 @@ double fmod(double __x, double __y) { return 
__ocml_fmod_f64(__x, __y); }
 __DEVICE__
 double frexp(double __x, int *__nptr) {
   int __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   double __r =
   __ocml_frexp_f64(__x, (__attribute__((address_space(5))) int *)&__tmp);
   *__nptr = __tmp;
@@ -883,6 +904,9 @@ long int lround(double __x) { return __ocml_round_f64(__x); 
}
 __DEVICE__
 double modf(double __x, double *__iptr) {
   double __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   double __r =
   __ocml_modf_f64(__x, (__attribute__((address_space(5))) double *)&__tmp);
   *__iptr = __tmp;
@@ -971,6 +995,9 @@ double remainder(double __x, double __y) {
 __DEVICE__
 double remquo(double __x, double __y, int *__quo) {
   int __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   double __r = __ocml_remquo_f64(
   __x, __y, (__attribute__((address_space(5))) int *)&__tmp);
   *__quo = __tmp;
@@ -1029,6 +1056,9 @@ double sin(double __x) { return __ocml_sin_f64(__x); }
 __DEVICE__
 void sincos(double __x, double *__sinptr, double *__cosptr) {
   double __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   *__sinptr = __ocml_sincos_f64(
   __x, (__attribute__((address_space(5))) double *)&__tmp);
   *__cosptr = __tmp;
@@ -1037,6 +1067,9 @@ void sincos(double __x,

[clang] 6a8e512 - Revert "[openmp] Annotate tmp variables with omp_thread_mem_alloc"

2021-08-12 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-08-12T17:44:36+01:00
New Revision: 6a8e5120abacdfe0f05c9670782e59e2b729a318

URL: 
https://github.com/llvm/llvm-project/commit/6a8e5120abacdfe0f05c9670782e59e2b729a318
DIFF: 
https://github.com/llvm/llvm-project/commit/6a8e5120abacdfe0f05c9670782e59e2b729a318.diff

LOG: Revert "[openmp] Annotate tmp variables with omp_thread_mem_alloc"

This reverts commit b6113548c9217fb8a6d0e9ac5bef5584c1aa614d.

Added: 


Modified: 
clang/lib/Headers/__clang_hip_math.h

Removed: 




diff  --git a/clang/lib/Headers/__clang_hip_math.h 
b/clang/lib/Headers/__clang_hip_math.h
index ef7e087b832c..9effaa18d3e8 100644
--- a/clang/lib/Headers/__clang_hip_math.h
+++ b/clang/lib/Headers/__clang_hip_math.h
@@ -19,9 +19,6 @@
 #endif
 #include 
 #include 
-#ifdef __OPENMP_AMDGCN__
-#include 
-#endif
 #endif // !defined(__HIPCC_RTC__)
 
 #pragma push_macro("__DEVICE__")
@@ -261,9 +258,6 @@ float fmodf(float __x, float __y) { return 
__ocml_fmod_f32(__x, __y); }
 __DEVICE__
 float frexpf(float __x, int *__nptr) {
   int __tmp;
-#ifdef __OPENMP_AMDGCN__
-#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
-#endif
   float __r =
   __ocml_frexp_f32(__x, (__attribute__((address_space(5))) int *)&__tmp);
   *__nptr = __tmp;
@@ -349,9 +343,6 @@ long int lroundf(float __x) { return __ocml_round_f32(__x); 
}
 __DEVICE__
 float modff(float __x, float *__iptr) {
   float __tmp;
-#ifdef __OPENMP_AMDGCN__
-#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
-#endif
   float __r =
   __ocml_modf_f32(__x, (__attribute__((address_space(5))) float *)&__tmp);
   *__iptr = __tmp;
@@ -432,9 +423,6 @@ float remainderf(float __x, float __y) {
 __DEVICE__
 float remquof(float __x, float __y, int *__quo) {
   int __tmp;
-#ifdef __OPENMP_AMDGCN__
-#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
-#endif
   float __r = __ocml_remquo_f32(
   __x, __y, (__attribute__((address_space(5))) int *)&__tmp);
   *__quo = __tmp;
@@ -491,9 +479,6 @@ __RETURN_TYPE __signbitf(float __x) { return 
__ocml_signbit_f32(__x); }
 __DEVICE__
 void sincosf(float __x, float *__sinptr, float *__cosptr) {
   float __tmp;
-#ifdef __OPENMP_AMDGCN__
-#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
-#endif
   *__sinptr =
   __ocml_sincos_f32(__x, (__attribute__((address_space(5))) float 
*)&__tmp);
   *__cosptr = __tmp;
@@ -502,9 +487,6 @@ void sincosf(float __x, float *__sinptr, float *__cosptr) {
 __DEVICE__
 void sincospif(float __x, float *__sinptr, float *__cosptr) {
   float __tmp;
-#ifdef __OPENMP_AMDGCN__
-#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
-#endif
   *__sinptr = __ocml_sincospi_f32(
   __x, (__attribute__((address_space(5))) float *)&__tmp);
   *__cosptr = __tmp;
@@ -817,9 +799,6 @@ double fmod(double __x, double __y) { return 
__ocml_fmod_f64(__x, __y); }
 __DEVICE__
 double frexp(double __x, int *__nptr) {
   int __tmp;
-#ifdef __OPENMP_AMDGCN__
-#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
-#endif
   double __r =
   __ocml_frexp_f64(__x, (__attribute__((address_space(5))) int *)&__tmp);
   *__nptr = __tmp;
@@ -904,9 +883,6 @@ long int lround(double __x) { return __ocml_round_f64(__x); 
}
 __DEVICE__
 double modf(double __x, double *__iptr) {
   double __tmp;
-#ifdef __OPENMP_AMDGCN__
-#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
-#endif
   double __r =
   __ocml_modf_f64(__x, (__attribute__((address_space(5))) double *)&__tmp);
   *__iptr = __tmp;
@@ -995,9 +971,6 @@ double remainder(double __x, double __y) {
 __DEVICE__
 double remquo(double __x, double __y, int *__quo) {
   int __tmp;
-#ifdef __OPENMP_AMDGCN__
-#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
-#endif
   double __r = __ocml_remquo_f64(
   __x, __y, (__attribute__((address_space(5))) int *)&__tmp);
   *__quo = __tmp;
@@ -1056,9 +1029,6 @@ double sin(double __x) { return __ocml_sin_f64(__x); }
 __DEVICE__
 void sincos(double __x, double *__sinptr, double *__cosptr) {
   double __tmp;
-#ifdef __OPENMP_AMDGCN__
-#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
-#endif
   *__sinptr = __ocml_sincos_f64(
   __x, (__attribute__((address_space(5))) double *)&__tmp);
   *__cosptr = __tmp;
@@ -1067,9 +1037,6 @@ void sincos(double __x, double *__sinptr, double 
*__cosptr) {
 __DEVICE__
 void sincospi(double __x, double *__sinptr, double *__cosptr) {
   double __tmp;
-#ifdef __OPENMP_AMDGCN__
-#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
-#endif
   *__sinptr = __ocml_sincospi_f64(
   __x, (__attribute__((address_space(5))) double *)&__tmp);
   *__cosptr = __tmp;



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 21d91a8 - [libomptarget][devicertl] Replace lanemask with uint64 at interface

2021-08-18 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-08-18T20:47:33+01:00
New Revision: 21d91a8ef319eec9c2c272e19beee726429524aa

URL: 
https://github.com/llvm/llvm-project/commit/21d91a8ef319eec9c2c272e19beee726429524aa
DIFF: 
https://github.com/llvm/llvm-project/commit/21d91a8ef319eec9c2c272e19beee726429524aa.diff

LOG: [libomptarget][devicertl] Replace lanemask with uint64 at interface

Use uint64_t for lanemask on all GPU architectures at the interface
with clang. Updates tests. The deviceRTL is always linked as IR so the zext
and trunc introduced for wave32 architectures will fold after inlining.

Simplification partly motivated by amdgpu gfx10 which will be wave32 and
is awkward to express in the current arch-dependant typedef interface.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108317

Added: 


Modified: 
clang/test/OpenMP/nvptx_parallel_codegen.cpp
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
llvm/test/Transforms/OpenMP/add_attributes.ll
openmp/libomptarget/DeviceRTL/include/Interface.h
openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
openmp/libomptarget/deviceRTLs/common/src/sync.cu
openmp/libomptarget/deviceRTLs/interface.h

Removed: 




diff  --git a/clang/test/OpenMP/nvptx_parallel_codegen.cpp 
b/clang/test/OpenMP/nvptx_parallel_codegen.cpp
index 7cb86b80e158f..712c5a41c573d 100644
--- a/clang/test/OpenMP/nvptx_parallel_codegen.cpp
+++ b/clang/test/OpenMP/nvptx_parallel_codegen.cpp
@@ -485,7 +485,7 @@ int bar(int n){
 // CHECK3-NEXT:store i32* [[DOTBOUND_TID_]], i32** [[DOTBOUND_TID__ADDR]], 
align 4
 // CHECK3-NEXT:store i32* [[A]], i32** [[A_ADDR]], align 4
 // CHECK3-NEXT:[[TMP0:%.*]] = load i32*, i32** [[A_ADDR]], align 4
-// CHECK3-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_warp_active_thread_mask()
+// CHECK3-NEXT:[[TMP1:%.*]] = call i64 @__kmpc_warp_active_thread_mask()
 // CHECK3-NEXT:[[NVPTX_TID:%.*]] = call i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
 // CHECK3-NEXT:[[NVPTX_NUM_THREADS:%.*]] = call i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
 // CHECK3-NEXT:store i32 0, i32* [[CRITICAL_COUNTER]], align 4
@@ -508,7 +508,7 @@ int bar(int n){
 // CHECK3-NEXT:call void @__kmpc_end_critical(%struct.ident_t* @[[GLOB1]], 
i32 [[TMP7]], [8 x i32]* @"_gomp_critical_user_$var")
 // CHECK3-NEXT:br label [[OMP_CRITICAL_SYNC]]
 // CHECK3:   omp.critical.sync:
-// CHECK3-NEXT:call void @__kmpc_syncwarp(i32 [[TMP1]])
+// CHECK3-NEXT:call void @__kmpc_syncwarp(i64 [[TMP1]])
 // CHECK3-NEXT:[[TMP9:%.*]] = add nsw i32 [[TMP4]], 1
 // CHECK3-NEXT:store i32 [[TMP9]], i32* [[CRITICAL_COUNTER]], align 4
 // CHECK3-NEXT:br label [[OMP_CRITICAL_LOOP]]
@@ -938,7 +938,7 @@ int bar(int n){
 // CHECK4-NEXT:store i32* [[DOTBOUND_TID_]], i32** [[DOTBOUND_TID__ADDR]], 
align 4
 // CHECK4-NEXT:store i32* [[A]], i32** [[A_ADDR]], align 4
 // CHECK4-NEXT:[[TMP0:%.*]] = load i32*, i32** [[A_ADDR]], align 4
-// CHECK4-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_warp_active_thread_mask()
+// CHECK4-NEXT:[[TMP1:%.*]] = call i64 @__kmpc_warp_active_thread_mask()
 // CHECK4-NEXT:[[NVPTX_TID:%.*]] = call i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
 // CHECK4-NEXT:[[NVPTX_NUM_THREADS:%.*]] = call i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
 // CHECK4-NEXT:store i32 0, i32* [[CRITICAL_COUNTER]], align 4
@@ -961,7 +961,7 @@ int bar(int n){
 // CHECK4-NEXT:call void @__kmpc_end_critical(%struct.ident_t* @[[GLOB1]], 
i32 [[TMP7]], [8 x i32]* @"_gomp_critical_user_$var")
 // CHECK4-NEXT:br label [[OMP_CRITICAL_SYNC]]
 // CHECK4:   omp.critical.sync:
-// CHECK4-NEXT:call void @__kmpc_syncwarp(i32 [[TMP1]])
+// CHECK4-NEXT:call void @__kmpc_syncwarp(i64 [[TMP1]])
 // CHECK4-NEXT:[[TMP9:%.*]] = add nsw i32 [[TMP4]], 1
 // CHECK4-NEXT:store i32 [[TMP9]], i32* [[CRITICAL_COUNTER]], align 4
 // CHECK4-NEXT:br label [[OMP_CRITICAL_LOOP]]
@@ -1391,7 +1391,7 @@ int bar(int n){
 // CHECK5-NEXT:store i32* [[DOTBOUND_TID_]], i32** [[DOTBOUND_TID__ADDR]], 
align 4
 // CHECK5-NEXT:store i32* [[A]], i32** [[A_ADDR]], align 4
 // CHECK5-NEXT:[[TMP0:%.*]] = load i32*, i32** [[A_ADDR]], align 4
-// CHECK5-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_warp_active_thread_mask()
+// CHECK5-NEXT:[[TMP1:%.*]] = call i64 @__kmpc_warp_active_thread_mask()
 // CHECK5-NEXT:[[NVPTX_TID:%.*]] = call i32 
@llvm.nvvm.read.ptx.sreg.tid.x()
 // CHECK5-NEXT:[[NVPTX_NUM_THREADS:%.*]] = call i32 
@llvm.nvvm.read.ptx.sreg.ntid.x()
 // CHECK5-NEXT:store i32 0, i32* [[CRITICAL_COUNTER]], align 4
@@ -1414,7 +1414,7 @@ int bar(int n){
 // CHECK5-NEXT:call void @__kmpc_end_critical(%struct.ident_t* @[[GLOB1]], 
i32 [[TMP7]], [8 x i32]* @"_gomp_critical_user_$var")
 // CHECK5-NEXT:br label [[OMP_CRITICAL_SYNC]]
 // 

[clang] dbd7bad - [openmp] Annotate tmp variables with omp_thread_mem_alloc

2021-08-18 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-08-19T02:22:11+01:00
New Revision: dbd7bad9ad9bc32538e324417c23387bf4ac7747

URL: 
https://github.com/llvm/llvm-project/commit/dbd7bad9ad9bc32538e324417c23387bf4ac7747
DIFF: 
https://github.com/llvm/llvm-project/commit/dbd7bad9ad9bc32538e324417c23387bf4ac7747.diff

LOG: [openmp] Annotate tmp variables with omp_thread_mem_alloc

Fixes miscompile of calls into ocml. Bug 51445.

The stack variable `double __tmp` is moved to dynamically allocated shared
memory by CGOpenMPRuntimeGPU. This is usually fine, but when the variable
is passed to a function that is explicitly annotated address_space(5) then
allocating the variable off-stack leads to a miscompile in the back end,
which cannot decide to move the variable back to the stack from shared.

This could be fixed by removing the AS(5) annotation from the math library
or by explicitly marking the variables as thread_mem_alloc. The cast to
AS(5) is still a no-op once IR is reached.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D107971

Added: 
clang/test/Headers/Inputs/include/omp.h

Modified: 
clang/lib/Headers/__clang_hip_math.h

Removed: 




diff  --git a/clang/lib/Headers/__clang_hip_math.h 
b/clang/lib/Headers/__clang_hip_math.h
index 9effaa18d3e8c..ef7e087b832ca 100644
--- a/clang/lib/Headers/__clang_hip_math.h
+++ b/clang/lib/Headers/__clang_hip_math.h
@@ -19,6 +19,9 @@
 #endif
 #include 
 #include 
+#ifdef __OPENMP_AMDGCN__
+#include 
+#endif
 #endif // !defined(__HIPCC_RTC__)
 
 #pragma push_macro("__DEVICE__")
@@ -258,6 +261,9 @@ float fmodf(float __x, float __y) { return 
__ocml_fmod_f32(__x, __y); }
 __DEVICE__
 float frexpf(float __x, int *__nptr) {
   int __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   float __r =
   __ocml_frexp_f32(__x, (__attribute__((address_space(5))) int *)&__tmp);
   *__nptr = __tmp;
@@ -343,6 +349,9 @@ long int lroundf(float __x) { return __ocml_round_f32(__x); 
}
 __DEVICE__
 float modff(float __x, float *__iptr) {
   float __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   float __r =
   __ocml_modf_f32(__x, (__attribute__((address_space(5))) float *)&__tmp);
   *__iptr = __tmp;
@@ -423,6 +432,9 @@ float remainderf(float __x, float __y) {
 __DEVICE__
 float remquof(float __x, float __y, int *__quo) {
   int __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   float __r = __ocml_remquo_f32(
   __x, __y, (__attribute__((address_space(5))) int *)&__tmp);
   *__quo = __tmp;
@@ -479,6 +491,9 @@ __RETURN_TYPE __signbitf(float __x) { return 
__ocml_signbit_f32(__x); }
 __DEVICE__
 void sincosf(float __x, float *__sinptr, float *__cosptr) {
   float __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   *__sinptr =
   __ocml_sincos_f32(__x, (__attribute__((address_space(5))) float 
*)&__tmp);
   *__cosptr = __tmp;
@@ -487,6 +502,9 @@ void sincosf(float __x, float *__sinptr, float *__cosptr) {
 __DEVICE__
 void sincospif(float __x, float *__sinptr, float *__cosptr) {
   float __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   *__sinptr = __ocml_sincospi_f32(
   __x, (__attribute__((address_space(5))) float *)&__tmp);
   *__cosptr = __tmp;
@@ -799,6 +817,9 @@ double fmod(double __x, double __y) { return 
__ocml_fmod_f64(__x, __y); }
 __DEVICE__
 double frexp(double __x, int *__nptr) {
   int __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   double __r =
   __ocml_frexp_f64(__x, (__attribute__((address_space(5))) int *)&__tmp);
   *__nptr = __tmp;
@@ -883,6 +904,9 @@ long int lround(double __x) { return __ocml_round_f64(__x); 
}
 __DEVICE__
 double modf(double __x, double *__iptr) {
   double __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   double __r =
   __ocml_modf_f64(__x, (__attribute__((address_space(5))) double *)&__tmp);
   *__iptr = __tmp;
@@ -971,6 +995,9 @@ double remainder(double __x, double __y) {
 __DEVICE__
 double remquo(double __x, double __y, int *__quo) {
   int __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   double __r = __ocml_remquo_f64(
   __x, __y, (__attribute__((address_space(5))) int *)&__tmp);
   *__quo = __tmp;
@@ -1029,6 +1056,9 @@ double sin(double __x) { return __ocml_sin_f64(__x); }
 __DEVICE__
 void sincos(double __x, double *__sinptr, double *__cosptr) {
   double __tmp;
+#ifdef __OPENMP_AMDGCN__
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc)
+#endif
   *__sinptr = __ocml_sincos_f64(
   __x, (__attribute__((address_space(5))) double *)&__tmp);
   *__cosptr = __tmp;
@@ -

[clang] 77579b9 - [openmp][nfc] Replace OMPGridValues array with struct

2021-08-19 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-08-19T13:25:42+01:00
New Revision: 77579b99e9ce1638ca696fa7c3872ae8668d997d

URL: 
https://github.com/llvm/llvm-project/commit/77579b99e9ce1638ca696fa7c3872ae8668d997d
DIFF: 
https://github.com/llvm/llvm-project/commit/77579b99e9ce1638ca696fa7c3872ae8668d997d.diff

LOG: [openmp][nfc] Replace OMPGridValues array with struct

[nfc] Replaces enum indices into an array with a struct. Named the
fields to match the enum, leaves memory layout and initialization unchanged.

Motivation is to later safely remove dead fields and replace redundant ones
with (compile time) computation. It should also be possible to factor some
common fields into a base and introduce a gfx10 amdgpu instance with less
duplication than the arrays of integers require.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108339

Added: 


Modified: 
clang/include/clang/Basic/TargetInfo.h
clang/lib/Basic/Targets/AMDGPU.cpp
clang/lib/Basic/Targets/NVPTX.cpp
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp

Removed: 




diff  --git a/clang/include/clang/Basic/TargetInfo.h 
b/clang/include/clang/Basic/TargetInfo.h
index 21289b0dfd04..ab855948b447 100644
--- a/clang/include/clang/Basic/TargetInfo.h
+++ b/clang/include/clang/Basic/TargetInfo.h
@@ -210,8 +210,8 @@ class TargetInfo : public virtual TransferrableTargetInfo,
   unsigned char RegParmMax, SSERegParmMax;
   TargetCXXABI TheCXXABI;
   const LangASMap *AddrSpaceMap;
-  const unsigned *GridValues =
-  nullptr; // Array of target-specific GPU grid values that must be
+  const llvm::omp::GV *GridValues =
+  nullptr; // target-specific GPU grid values that must be
// consistent between host RTL (plugin), device RTL, and clang.
 
   mutable StringRef PlatformName;
@@ -1410,10 +1410,10 @@ class TargetInfo : public virtual 
TransferrableTargetInfo,
 return LangAS::Default;
   }
 
-  /// Return a target-specific GPU grid value based on the GVIDX enum \p gv
-  unsigned getGridValue(llvm::omp::GVIDX gv) const {
+  /// Return a target-specific GPU grid values
+  const llvm::omp::GV &getGridValue() const {
 assert(GridValues != nullptr && "GridValues not initialized");
-return GridValues[gv];
+return *GridValues;
   }
 
   /// Retrieve the name of the platform as it is used in the

diff  --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index fac786dbcf9e..cebb19e7ccab 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -335,7 +335,7 @@ AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple 
&Triple,
   llvm::AMDGPU::getArchAttrR600(GPUKind)) {
   resetDataLayout(isAMDGCN(getTriple()) ? DataLayoutStringAMDGCN
 : DataLayoutStringR600);
-  GridValues = llvm::omp::AMDGPUGpuGridValues;
+  GridValues = &llvm::omp::AMDGPUGridValues;
 
   setAddressSpaceMap(Triple.getOS() == llvm::Triple::Mesa3D ||
  !isAMDGCN(Triple));

diff  --git a/clang/lib/Basic/Targets/NVPTX.cpp 
b/clang/lib/Basic/Targets/NVPTX.cpp
index 56f8a179db3c..d1a34e4a81c5 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -65,7 +65,7 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
   TLSSupported = false;
   VLASupported = false;
   AddrSpaceMap = &NVPTXAddrSpaceMap;
-  GridValues = llvm::omp::NVPTXGpuGridValues;
+  GridValues = &llvm::omp::NVPTXGridValues;
   UseAddrSpaceMapMangling = true;
 
   // Define available target features

diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
index 33d4ab838af1..cac5faaa8d0f 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
@@ -20,6 +20,7 @@
 #include "clang/AST/StmtVisitor.h"
 #include "clang/Basic/Cuda.h"
 #include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/Frontend/OpenMP/OMPGridValues.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
 
 using namespace clang;
@@ -35,7 +36,7 @@ CGOpenMPRuntimeAMDGCN::CGOpenMPRuntimeAMDGCN(CodeGenModule 
&CGM)
 llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUWarpSize(CodeGenFunction &CGF) {
   CGBuilderTy &Bld = CGF.Builder;
   // return constant compile-time target-specific warp size
-  unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
+  unsigned WarpSize = CGF.getTarget().getGridValue().GV_Warp_Size;
   return Bld.getInt32(WarpSize);
 }
 

diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 63fecedc6fb7..b13d55994ef6 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp

[clang] 33427fd - [libomptarget] Build DeviceRTL for amdgpu

2021-10-27 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-10-28T00:41:45+01:00
New Revision: 33427fdb7b52b79ce5e25b7e14e0f1a44d876bd2

URL: 
https://github.com/llvm/llvm-project/commit/33427fdb7b52b79ce5e25b7e14e0f1a44d876bd2
DIFF: 
https://github.com/llvm/llvm-project/commit/33427fdb7b52b79ce5e25b7e14e0f1a44d876bd2.diff

LOG: [libomptarget] Build DeviceRTL for amdgpu

Passes same tests as the current deviceRTL. Includes cmake change from D111987.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227

Added: 


Modified: 
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
openmp/libomptarget/DeviceRTL/CMakeLists.txt
openmp/libomptarget/DeviceRTL/src/Configuration.cpp
openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
openmp/libomptarget/test/mapping/data_member_ref.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp
openmp/libomptarget/test/mapping/delete_inf_refcount.c
openmp/libomptarget/test/mapping/lambda_by_value.cpp
openmp/libomptarget/test/mapping/ompx_hold/struct.c
openmp/libomptarget/test/mapping/ptr_and_obj_motion.c
openmp/libomptarget/test/mapping/reduction_implicit_map.cpp
openmp/libomptarget/test/offloading/bug49021.cpp
openmp/libomptarget/test/offloading/bug49334.cpp
openmp/libomptarget/test/offloading/bug50022.cpp
openmp/libomptarget/test/offloading/global_constructor.cpp
openmp/libomptarget/test/offloading/host_as_target.c
openmp/libomptarget/test/unified_shared_memory/api.c
openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c
openmp/libomptarget/test/unified_shared_memory/close_modifier.c
openmp/libomptarget/test/unified_shared_memory/shared_update.c

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp 
b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index 5400e2617729..b138000f8cf2 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -252,7 +252,7 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions(
   std::string BitcodeSuffix;
   if (DriverArgs.hasFlag(options::OPT_fopenmp_target_new_runtime,
  options::OPT_fno_openmp_target_new_runtime, false))
-BitcodeSuffix = "new-amdgcn-" + GPUArch;
+BitcodeSuffix = "new-amdgpu-" + GPUArch;
   else
 BitcodeSuffix = "amdgcn-" + GPUArch;
 

diff  --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt 
b/openmp/libomptarget/DeviceRTL/CMakeLists.txt
index a4f9862fb09b..419c64d38116 100644
--- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt
+++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt
@@ -226,6 +226,5 @@ foreach(sm ${nvptx_sm_list})
 endforeach()
 
 foreach(mcpu ${amdgpu_mcpus})
-  # require D112227 or similar to enable the compilation for amdgpu
-  # compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa 
-D__AMDGCN__ -fvisibility=default -nogpulib)
+  compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa 
-D__AMDGCN__ -fvisibility=default -nogpulib)
 endforeach()

diff  --git a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp 
b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp
index 2b6f20fb1732..f7c61dc013cf 100644
--- a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp
@@ -20,9 +20,9 @@ using namespace _OMP;
 
 #pragma omp declare target
 
-extern uint32_t __omp_rtl_debug_kind;
+extern uint32_t __omp_rtl_debug_kind; // defined by CGOpenMPRuntimeGPU
 
-// TOOD: We want to change the name as soon as the old runtime is gone.
+// TODO: We want to change the name as soon as the old runtime is gone.
 DeviceEnvironmentTy CONSTANT(omptarget_device_environment)
 __attribute__((used));
 

diff  --git a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp 
b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
index c77e766ae6ca..33e2194b25f3 100644
--- a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
@@ -68,8 +68,23 @@ uint64_t atomicAdd(uint64_t *Address, uint64_t Val, int 
Ordering) {
 ///{
 #pragma omp begin declare variant match(device = {arch(amdgcn)})
 
-uint32_t atomicInc(uint32_t *Address, uint32_t Val, int Ordering) {
-  return __builtin_amdgcn_atomic_inc32(Address, Val, Ordering, "");
+uint32_t atomicInc(uint32_t *A, uint32_t V, int Ordering) {
+  // builtin_amdgcn_atomic_inc32 should expand to this switch when
+  // passed a runtime value, but does not do so yet. Workaround here.
+  switch (Ordering) {
+  default:
+__builtin_unreachable();
+  case __ATOMIC_RELAXED:
+return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_RELAXED, "");
+  case __ATOMIC_ACQUIRE:
+return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_ACQUIRE, "");
+  case __AT

[clang] 6c7b203 - Revert "[libomptarget] Build DeviceRTL for amdgpu"

2021-10-27 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-10-28T01:01:53+01:00
New Revision: 6c7b203d1d7000269215ab5b3d329ab03dc85e42

URL: 
https://github.com/llvm/llvm-project/commit/6c7b203d1d7000269215ab5b3d329ab03dc85e42
DIFF: 
https://github.com/llvm/llvm-project/commit/6c7b203d1d7000269215ab5b3d329ab03dc85e42.diff

LOG: Revert "[libomptarget] Build DeviceRTL for amdgpu"
 - more tests failing on CI than failed locally when writing this patch

This reverts commit 33427fdb7b52b79ce5e25b7e14e0f1a44d876bd2.

Added: 


Modified: 
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
openmp/libomptarget/DeviceRTL/CMakeLists.txt
openmp/libomptarget/DeviceRTL/src/Configuration.cpp
openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
openmp/libomptarget/test/mapping/data_member_ref.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp
openmp/libomptarget/test/mapping/delete_inf_refcount.c
openmp/libomptarget/test/mapping/lambda_by_value.cpp
openmp/libomptarget/test/mapping/ompx_hold/struct.c
openmp/libomptarget/test/mapping/ptr_and_obj_motion.c
openmp/libomptarget/test/mapping/reduction_implicit_map.cpp
openmp/libomptarget/test/offloading/bug49021.cpp
openmp/libomptarget/test/offloading/bug49334.cpp
openmp/libomptarget/test/offloading/bug50022.cpp
openmp/libomptarget/test/offloading/global_constructor.cpp
openmp/libomptarget/test/offloading/host_as_target.c
openmp/libomptarget/test/unified_shared_memory/api.c
openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c
openmp/libomptarget/test/unified_shared_memory/close_modifier.c
openmp/libomptarget/test/unified_shared_memory/shared_update.c

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp 
b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index b138000f8cf2..5400e2617729 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -252,7 +252,7 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions(
   std::string BitcodeSuffix;
   if (DriverArgs.hasFlag(options::OPT_fopenmp_target_new_runtime,
  options::OPT_fno_openmp_target_new_runtime, false))
-BitcodeSuffix = "new-amdgpu-" + GPUArch;
+BitcodeSuffix = "new-amdgcn-" + GPUArch;
   else
 BitcodeSuffix = "amdgcn-" + GPUArch;
 

diff  --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt 
b/openmp/libomptarget/DeviceRTL/CMakeLists.txt
index 419c64d38116..a4f9862fb09b 100644
--- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt
+++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt
@@ -226,5 +226,6 @@ foreach(sm ${nvptx_sm_list})
 endforeach()
 
 foreach(mcpu ${amdgpu_mcpus})
-  compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa 
-D__AMDGCN__ -fvisibility=default -nogpulib)
+  # require D112227 or similar to enable the compilation for amdgpu
+  # compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa 
-D__AMDGCN__ -fvisibility=default -nogpulib)
 endforeach()

diff  --git a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp 
b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp
index f7c61dc013cf..2b6f20fb1732 100644
--- a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp
@@ -20,9 +20,9 @@ using namespace _OMP;
 
 #pragma omp declare target
 
-extern uint32_t __omp_rtl_debug_kind; // defined by CGOpenMPRuntimeGPU
+extern uint32_t __omp_rtl_debug_kind;
 
-// TODO: We want to change the name as soon as the old runtime is gone.
+// TOOD: We want to change the name as soon as the old runtime is gone.
 DeviceEnvironmentTy CONSTANT(omptarget_device_environment)
 __attribute__((used));
 

diff  --git a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp 
b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
index 931dffcaa131..46e7701a4872 100644
--- a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
@@ -68,23 +68,8 @@ uint64_t atomicAdd(uint64_t *Address, uint64_t Val, int 
Ordering) {
 ///{
 #pragma omp begin declare variant match(device = {arch(amdgcn)})
 
-uint32_t atomicInc(uint32_t *A, uint32_t V, int Ordering) {
-  // builtin_amdgcn_atomic_inc32 should expand to this switch when
-  // passed a runtime value, but does not do so yet. Workaround here.
-  switch (Ordering) {
-  default:
-__builtin_unreachable();
-  case __ATOMIC_RELAXED:
-return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_RELAXED, "");
-  case __ATOMIC_ACQUIRE:
-return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_ACQUIRE, "");
-  case __ATOMIC_RELEASE:
-return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_RELEASE, "");
-  case __ATOMIC_ACQ_REL:
-return __builtin_amdgcn_atomic_inc32(A, 

[clang] 4d50803 - [libomptarget] Build DeviceRTL for amdgpu

2021-10-28 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-10-28T12:34:01+01:00
New Revision: 4d50803ce49ce6b57c4865361c9ba0ad7063b7be

URL: 
https://github.com/llvm/llvm-project/commit/4d50803ce49ce6b57c4865361c9ba0ad7063b7be
DIFF: 
https://github.com/llvm/llvm-project/commit/4d50803ce49ce6b57c4865361c9ba0ad7063b7be.diff

LOG: [libomptarget] Build DeviceRTL for amdgpu

Passes same tests as the current deviceRTL. Includes cmake change from D111987.
CI is showing a different set of pass/fails to local, committing this
without the tests enabled by default while debugging that difference.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227

Added: 


Modified: 
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
openmp/libomptarget/DeviceRTL/CMakeLists.txt
openmp/libomptarget/DeviceRTL/src/Configuration.cpp
openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
openmp/libomptarget/test/mapping/data_member_ref.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp
openmp/libomptarget/test/mapping/delete_inf_refcount.c
openmp/libomptarget/test/mapping/lambda_by_value.cpp
openmp/libomptarget/test/mapping/ompx_hold/struct.c
openmp/libomptarget/test/mapping/ptr_and_obj_motion.c
openmp/libomptarget/test/mapping/reduction_implicit_map.cpp
openmp/libomptarget/test/offloading/bug49021.cpp
openmp/libomptarget/test/offloading/bug49334.cpp
openmp/libomptarget/test/offloading/bug50022.cpp
openmp/libomptarget/test/offloading/global_constructor.cpp
openmp/libomptarget/test/offloading/host_as_target.c
openmp/libomptarget/test/unified_shared_memory/api.c
openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c
openmp/libomptarget/test/unified_shared_memory/close_modifier.c
openmp/libomptarget/test/unified_shared_memory/shared_update.c

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp 
b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index 5400e26177291..b138000f8cf29 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -252,7 +252,7 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions(
   std::string BitcodeSuffix;
   if (DriverArgs.hasFlag(options::OPT_fopenmp_target_new_runtime,
  options::OPT_fno_openmp_target_new_runtime, false))
-BitcodeSuffix = "new-amdgcn-" + GPUArch;
+BitcodeSuffix = "new-amdgpu-" + GPUArch;
   else
 BitcodeSuffix = "amdgcn-" + GPUArch;
 

diff  --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt 
b/openmp/libomptarget/DeviceRTL/CMakeLists.txt
index a4f9862fb09b3..419c64d381168 100644
--- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt
+++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt
@@ -226,6 +226,5 @@ foreach(sm ${nvptx_sm_list})
 endforeach()
 
 foreach(mcpu ${amdgpu_mcpus})
-  # require D112227 or similar to enable the compilation for amdgpu
-  # compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa 
-D__AMDGCN__ -fvisibility=default -nogpulib)
+  compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa 
-D__AMDGCN__ -fvisibility=default -nogpulib)
 endforeach()

diff  --git a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp 
b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp
index 2b6f20fb1732c..f7c61dc013cf1 100644
--- a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp
@@ -20,9 +20,9 @@ using namespace _OMP;
 
 #pragma omp declare target
 
-extern uint32_t __omp_rtl_debug_kind;
+extern uint32_t __omp_rtl_debug_kind; // defined by CGOpenMPRuntimeGPU
 
-// TOOD: We want to change the name as soon as the old runtime is gone.
+// TODO: We want to change the name as soon as the old runtime is gone.
 DeviceEnvironmentTy CONSTANT(omptarget_device_environment)
 __attribute__((used));
 

diff  --git a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp 
b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
index d09461a016200..931dffcaa131e 100644
--- a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
@@ -68,8 +68,23 @@ uint64_t atomicAdd(uint64_t *Address, uint64_t Val, int 
Ordering) {
 ///{
 #pragma omp begin declare variant match(device = {arch(amdgcn)})
 
-uint32_t atomicInc(uint32_t *Address, uint32_t Val, int Ordering) {
-  return __builtin_amdgcn_atomic_inc32(Address, Val, Ordering, "");
+uint32_t atomicInc(uint32_t *A, uint32_t V, int Ordering) {
+  // builtin_amdgcn_atomic_inc32 should expand to this switch when
+  // passed a runtime value, but does not do so yet. Workaround here.
+  switch (Ordering) {
+  default:
+__builtin_unreachable();
+  case __ATOMIC_RELAXED:
+return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_RELAXED, "");
+  case __ATOMI

[clang] 2c37ae6 - [nfc] Refactor CGGPUBuiltin to help review D112680

2021-11-08 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-11-08T15:00:08Z
New Revision: 2c37ae6d14cf263724720f56fc34b4579a6e5c1c

URL: 
https://github.com/llvm/llvm-project/commit/2c37ae6d14cf263724720f56fc34b4579a6e5c1c
DIFF: 
https://github.com/llvm/llvm-project/commit/2c37ae6d14cf263724720f56fc34b4579a6e5c1c.diff

LOG: [nfc] Refactor CGGPUBuiltin to help review D112680

Added: 


Modified: 
clang/lib/CodeGen/CGGPUBuiltin.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp 
b/clang/lib/CodeGen/CGGPUBuiltin.cpp
index afbebd070c05..43192c587e26 100644
--- a/clang/lib/CodeGen/CGGPUBuiltin.cpp
+++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp
@@ -66,39 +66,22 @@ static llvm::Function *GetVprintfDeclaration(llvm::Module 
&M) {
 //
 // Note that by the time this function runs, E's args have already undergone 
the
 // standard C vararg promotion (short -> int, float -> double, etc.).
-RValue
-CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E,
-   ReturnValueSlot ReturnValue) {
-  assert(getTarget().getTriple().isNVPTX());
-  assert(E->getBuiltinCallee() == Builtin::BIprintf);
-  assert(E->getNumArgs() >= 1); // printf always has at least one arg.
-
-  const llvm::DataLayout &DL = CGM.getDataLayout();
-  llvm::LLVMContext &Ctx = CGM.getLLVMContext();
-
-  CallArgList Args;
-  EmitCallArgs(Args,
-   E->getDirectCallee()->getType()->getAs(),
-   E->arguments(), E->getDirectCallee(),
-   /* ParamsToSkip = */ 0);
 
-  // We don't know how to emit non-scalar varargs.
-  if (llvm::any_of(llvm::drop_begin(Args), [&](const CallArg &A) {
-return !A.getRValue(*this).isScalar();
-  })) {
-CGM.ErrorUnsupported(E, "non-scalar arg to printf");
-return RValue::get(llvm::ConstantInt::get(IntTy, 0));
-  }
+namespace {
+llvm::Value *packArgsIntoNVPTXFormatBuffer(CodeGenFunction *CGF,
+   const CallArgList &Args) {
+  const llvm::DataLayout &DL = CGF->CGM.getDataLayout();
+  llvm::LLVMContext &Ctx = CGF->CGM.getLLVMContext();
+  CGBuilderTy &Builder = CGF->Builder;
 
   // Construct and fill the args buffer that we'll pass to vprintf.
-  llvm::Value *BufferPtr;
   if (Args.size() <= 1) {
 // If there are no args, pass a null pointer to vprintf.
-BufferPtr = llvm::ConstantPointerNull::get(llvm::Type::getInt8PtrTy(Ctx));
+return llvm::ConstantPointerNull::get(llvm::Type::getInt8PtrTy(Ctx));
   } else {
 llvm::SmallVector ArgTypes;
 for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I)
-  ArgTypes.push_back(Args[I].getRValue(*this).getScalarVal()->getType());
+  ArgTypes.push_back(Args[I].getRValue(*CGF).getScalarVal()->getType());
 
 // Using llvm::StructType is correct only because printf doesn't accept
 // aggregates.  If we had to handle aggregates here, we'd have to manually
@@ -106,15 +89,40 @@ CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const 
CallExpr *E,
 // that the alignment of the llvm type was the same as the alignment of the
 // clang type.
 llvm::Type *AllocaTy = llvm::StructType::create(ArgTypes, "printf_args");
-llvm::Value *Alloca = CreateTempAlloca(AllocaTy);
+llvm::Value *Alloca = CGF->CreateTempAlloca(AllocaTy);
 
 for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) {
   llvm::Value *P = Builder.CreateStructGEP(AllocaTy, Alloca, I - 1);
-  llvm::Value *Arg = Args[I].getRValue(*this).getScalarVal();
+  llvm::Value *Arg = Args[I].getRValue(*CGF).getScalarVal();
   Builder.CreateAlignedStore(Arg, P, DL.getPrefTypeAlign(Arg->getType()));
 }
-BufferPtr = Builder.CreatePointerCast(Alloca, 
llvm::Type::getInt8PtrTy(Ctx));
+return Builder.CreatePointerCast(Alloca, llvm::Type::getInt8PtrTy(Ctx));
   }
+}
+} // namespace
+
+RValue
+CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E,
+   ReturnValueSlot ReturnValue) {
+  assert(getTarget().getTriple().isNVPTX());
+  assert(E->getBuiltinCallee() == Builtin::BIprintf);
+  assert(E->getNumArgs() >= 1); // printf always has at least one arg.
+
+  CallArgList Args;
+  EmitCallArgs(Args,
+   E->getDirectCallee()->getType()->getAs(),
+   E->arguments(), E->getDirectCallee(),
+   /* ParamsToSkip = */ 0);
+
+  // We don't know how to emit non-scalar varargs.
+  if (llvm::any_of(llvm::drop_begin(Args), [&](const CallArg &A) {
+return !A.getRValue(*this).isScalar();
+  })) {
+CGM.ErrorUnsupported(E, "non-scalar arg to printf");
+return RValue::get(llvm::ConstantInt::get(IntTy, 0));
+  }
+
+  llvm::Value *BufferPtr = packArgsIntoNVPTXFormatBuffer(this, Args);
 
   // Invoke vprintf and return.
   llvm::Function* VprintfFunc = GetVprintfDeclaration(CGM.getModule());



___
cfe-commits mai

[clang] db81d8f - [OpenMP] Lower printf to __llvm_omp_vprintf

2021-11-08 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-11-08T18:38:00Z
New Revision: db81d8f6c4d6c4f8dfaa036d6959528c9f14e7d7

URL: 
https://github.com/llvm/llvm-project/commit/db81d8f6c4d6c4f8dfaa036d6959528c9f14e7d7
DIFF: 
https://github.com/llvm/llvm-project/commit/db81d8f6c4d6c4f8dfaa036d6959528c9f14e7d7.diff

LOG: [OpenMP] Lower printf to __llvm_omp_vprintf

Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

The exact set of changes to check-openmp probably needs revision before commit

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680

Added: 


Modified: 
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/CodeGen/CGGPUBuiltin.cpp
clang/lib/CodeGen/CodeGenFunction.h
openmp/libomptarget/DeviceRTL/include/Debug.h
openmp/libomptarget/DeviceRTL/src/Debug.cpp
openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
openmp/libomptarget/test/mapping/data_member_ref.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp
openmp/libomptarget/test/mapping/lambda_by_value.cpp
openmp/libomptarget/test/mapping/ompx_hold/struct.c
openmp/libomptarget/test/mapping/ptr_and_obj_motion.c
openmp/libomptarget/test/mapping/reduction_implicit_map.cpp
openmp/libomptarget/test/offloading/bug49021.cpp
openmp/libomptarget/test/offloading/bug50022.cpp
openmp/libomptarget/test/offloading/host_as_target.c
openmp/libomptarget/test/unified_shared_memory/api.c
openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c
openmp/libomptarget/test/unified_shared_memory/close_modifier.c
openmp/libomptarget/test/unified_shared_memory/shared_update.c

Removed: 




diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index fab21e5b588a5..18e429cf3efd2 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -5106,11 +5106,16 @@ RValue CodeGenFunction::EmitBuiltinExpr(const 
GlobalDecl GD, unsigned BuiltinID,
 return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy()));
   }
   case Builtin::BIprintf:
-if (getTarget().getTriple().isNVPTX())
-  return EmitNVPTXDevicePrintfCallExpr(E, ReturnValue);
-if (getTarget().getTriple().getArch() == Triple::amdgcn &&
-getLangOpts().HIP)
-  return EmitAMDGPUDevicePrintfCallExpr(E, ReturnValue);
+if (getTarget().getTriple().isNVPTX() ||
+getTarget().getTriple().isAMDGCN()) {
+  if (getLangOpts().OpenMPIsDevice)
+return EmitOpenMPDevicePrintfCallExpr(E);
+  if (getTarget().getTriple().isNVPTX())
+return EmitNVPTXDevicePrintfCallExpr(E);
+  if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP)
+return EmitAMDGPUDevicePrintfCallExpr(E);
+}
+
 break;
   case Builtin::BI__builtin_canonicalize:
   case Builtin::BI__builtin_canonicalizef:

diff  --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp 
b/clang/lib/CodeGen/CGGPUBuiltin.cpp
index 43192c587e262..fdd2fa18bb4a0 100644
--- a/clang/lib/CodeGen/CGGPUBuiltin.cpp
+++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp
@@ -21,13 +21,14 @@
 using namespace clang;
 using namespace CodeGen;
 
-static llvm::Function *GetVprintfDeclaration(llvm::Module &M) {
+namespace {
+llvm::Function *GetVprintfDeclaration(llvm::Module &M) {
   llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()),
 llvm::Type::getInt8PtrTy(M.getContext())};
   llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get(
   llvm::Type::getInt32Ty(M.getContext()), ArgTypes, false);
 
-  if (auto* F = M.getFunction("vprintf")) {
+  if (auto *F = M.getFunction("vprintf")) {
 // Our CUDA system header declares vprintf with the right signature, so
 // nobody else should have been able to declare vprintf with a bogus
 // signature.
@@ -41,6 +42,28 @@ static llvm::Function *GetVprintfDeclaration(llvm::Module 
&M) {
   VprintfFuncType, llvm::GlobalVariable::ExternalLinkage, "vprintf", &M);
 }
 
+llvm::Function *GetOpenMPVprintfDeclaration(CodeGenModule &CGM) {
+  const char *Name = "__llvm_omp_vprintf";
+  llvm::Module &M = CGM.getModule();
+  llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()),
+llvm::Type::getInt8PtrTy(M.getContext()),
+llvm::Type::getInt32Ty(M.getContext())};
+  llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get(
+  llvm::Type::getIn

[clang] 0fa45d6 - Revert "[OpenMP] Lower printf to __llvm_omp_vprintf"

2021-11-08 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-11-08T20:28:57Z
New Revision: 0fa45d6d8067d71a8dccac7d942c53b5fd80e499

URL: 
https://github.com/llvm/llvm-project/commit/0fa45d6d8067d71a8dccac7d942c53b5fd80e499
DIFF: 
https://github.com/llvm/llvm-project/commit/0fa45d6d8067d71a8dccac7d942c53b5fd80e499.diff

LOG: Revert "[OpenMP] Lower printf to __llvm_omp_vprintf"

This reverts commit db81d8f6c4d6c4f8dfaa036d6959528c9f14e7d7.

Added: 


Modified: 
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/CodeGen/CGGPUBuiltin.cpp
clang/lib/CodeGen/CodeGenFunction.h
openmp/libomptarget/DeviceRTL/include/Debug.h
openmp/libomptarget/DeviceRTL/src/Debug.cpp
openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
openmp/libomptarget/test/mapping/data_member_ref.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp
openmp/libomptarget/test/mapping/lambda_by_value.cpp
openmp/libomptarget/test/mapping/ompx_hold/struct.c
openmp/libomptarget/test/mapping/ptr_and_obj_motion.c
openmp/libomptarget/test/mapping/reduction_implicit_map.cpp
openmp/libomptarget/test/offloading/bug49021.cpp
openmp/libomptarget/test/offloading/bug50022.cpp
openmp/libomptarget/test/offloading/host_as_target.c
openmp/libomptarget/test/unified_shared_memory/api.c
openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c
openmp/libomptarget/test/unified_shared_memory/close_modifier.c
openmp/libomptarget/test/unified_shared_memory/shared_update.c

Removed: 




diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 18e429cf3efd2..fab21e5b588a5 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -5106,16 +5106,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const 
GlobalDecl GD, unsigned BuiltinID,
 return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy()));
   }
   case Builtin::BIprintf:
-if (getTarget().getTriple().isNVPTX() ||
-getTarget().getTriple().isAMDGCN()) {
-  if (getLangOpts().OpenMPIsDevice)
-return EmitOpenMPDevicePrintfCallExpr(E);
-  if (getTarget().getTriple().isNVPTX())
-return EmitNVPTXDevicePrintfCallExpr(E);
-  if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP)
-return EmitAMDGPUDevicePrintfCallExpr(E);
-}
-
+if (getTarget().getTriple().isNVPTX())
+  return EmitNVPTXDevicePrintfCallExpr(E, ReturnValue);
+if (getTarget().getTriple().getArch() == Triple::amdgcn &&
+getLangOpts().HIP)
+  return EmitAMDGPUDevicePrintfCallExpr(E, ReturnValue);
 break;
   case Builtin::BI__builtin_canonicalize:
   case Builtin::BI__builtin_canonicalizef:

diff  --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp 
b/clang/lib/CodeGen/CGGPUBuiltin.cpp
index fdd2fa18bb4a0..43192c587e262 100644
--- a/clang/lib/CodeGen/CGGPUBuiltin.cpp
+++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp
@@ -21,14 +21,13 @@
 using namespace clang;
 using namespace CodeGen;
 
-namespace {
-llvm::Function *GetVprintfDeclaration(llvm::Module &M) {
+static llvm::Function *GetVprintfDeclaration(llvm::Module &M) {
   llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()),
 llvm::Type::getInt8PtrTy(M.getContext())};
   llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get(
   llvm::Type::getInt32Ty(M.getContext()), ArgTypes, false);
 
-  if (auto *F = M.getFunction("vprintf")) {
+  if (auto* F = M.getFunction("vprintf")) {
 // Our CUDA system header declares vprintf with the right signature, so
 // nobody else should have been able to declare vprintf with a bogus
 // signature.
@@ -42,28 +41,6 @@ llvm::Function *GetVprintfDeclaration(llvm::Module &M) {
   VprintfFuncType, llvm::GlobalVariable::ExternalLinkage, "vprintf", &M);
 }
 
-llvm::Function *GetOpenMPVprintfDeclaration(CodeGenModule &CGM) {
-  const char *Name = "__llvm_omp_vprintf";
-  llvm::Module &M = CGM.getModule();
-  llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()),
-llvm::Type::getInt8PtrTy(M.getContext()),
-llvm::Type::getInt32Ty(M.getContext())};
-  llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get(
-  llvm::Type::getInt32Ty(M.getContext()), ArgTypes, false);
-
-  if (auto *F = M.getFunction(Name)) {
-if (F->getFunctionType() != VprintfFuncType) {
-  CGM.Error(SourceLocation(),
-"Invalid type declaration for __llvm_omp_vprintf");
-  return nullptr;
-}
-return F;
-  }
-
-  return llvm::Function::Create(
-  VprintfFuncType, llvm::GlobalVariable::ExternalLinkage, Name, &M);
-}
-
 // Transforms a call to printf into a call to the NVPTX vprintf syscall (which
 // isn't particularly

[clang] 27177b8 - [OpenMP] Lower printf to __llvm_omp_vprintf

2021-11-10 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-11-10T15:30:56Z
New Revision: 27177b82d4ca4451f288168fc1e06c0736afbdaf

URL: 
https://github.com/llvm/llvm-project/commit/27177b82d4ca4451f288168fc1e06c0736afbdaf
DIFF: 
https://github.com/llvm/llvm-project/commit/27177b82d4ca4451f288168fc1e06c0736afbdaf.diff

LOG: [OpenMP] Lower printf to __llvm_omp_vprintf

Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680

Added: 


Modified: 
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/CodeGen/CGGPUBuiltin.cpp
clang/lib/CodeGen/CodeGenFunction.h
clang/test/OpenMP/nvptx_target_printf_codegen.c
openmp/libomptarget/DeviceRTL/include/Debug.h
openmp/libomptarget/DeviceRTL/src/Debug.cpp
openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
openmp/libomptarget/test/mapping/data_member_ref.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp
openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp
openmp/libomptarget/test/mapping/lambda_by_value.cpp
openmp/libomptarget/test/mapping/ompx_hold/struct.c
openmp/libomptarget/test/mapping/ptr_and_obj_motion.c
openmp/libomptarget/test/mapping/reduction_implicit_map.cpp
openmp/libomptarget/test/offloading/bug49021.cpp
openmp/libomptarget/test/offloading/bug50022.cpp
openmp/libomptarget/test/offloading/host_as_target.c
openmp/libomptarget/test/unified_shared_memory/api.c
openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c
openmp/libomptarget/test/unified_shared_memory/close_modifier.c
openmp/libomptarget/test/unified_shared_memory/shared_update.c

Removed: 




diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index fab21e5b588a5..18e429cf3efd2 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -5106,11 +5106,16 @@ RValue CodeGenFunction::EmitBuiltinExpr(const 
GlobalDecl GD, unsigned BuiltinID,
 return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy()));
   }
   case Builtin::BIprintf:
-if (getTarget().getTriple().isNVPTX())
-  return EmitNVPTXDevicePrintfCallExpr(E, ReturnValue);
-if (getTarget().getTriple().getArch() == Triple::amdgcn &&
-getLangOpts().HIP)
-  return EmitAMDGPUDevicePrintfCallExpr(E, ReturnValue);
+if (getTarget().getTriple().isNVPTX() ||
+getTarget().getTriple().isAMDGCN()) {
+  if (getLangOpts().OpenMPIsDevice)
+return EmitOpenMPDevicePrintfCallExpr(E);
+  if (getTarget().getTriple().isNVPTX())
+return EmitNVPTXDevicePrintfCallExpr(E);
+  if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP)
+return EmitAMDGPUDevicePrintfCallExpr(E);
+}
+
 break;
   case Builtin::BI__builtin_canonicalize:
   case Builtin::BI__builtin_canonicalizef:

diff  --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp 
b/clang/lib/CodeGen/CGGPUBuiltin.cpp
index 43192c587e262..fdd2fa18bb4a0 100644
--- a/clang/lib/CodeGen/CGGPUBuiltin.cpp
+++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp
@@ -21,13 +21,14 @@
 using namespace clang;
 using namespace CodeGen;
 
-static llvm::Function *GetVprintfDeclaration(llvm::Module &M) {
+namespace {
+llvm::Function *GetVprintfDeclaration(llvm::Module &M) {
   llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()),
 llvm::Type::getInt8PtrTy(M.getContext())};
   llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get(
   llvm::Type::getInt32Ty(M.getContext()), ArgTypes, false);
 
-  if (auto* F = M.getFunction("vprintf")) {
+  if (auto *F = M.getFunction("vprintf")) {
 // Our CUDA system header declares vprintf with the right signature, so
 // nobody else should have been able to declare vprintf with a bogus
 // signature.
@@ -41,6 +42,28 @@ static llvm::Function *GetVprintfDeclaration(llvm::Module 
&M) {
   VprintfFuncType, llvm::GlobalVariable::ExternalLinkage, "vprintf", &M);
 }
 
+llvm::Function *GetOpenMPVprintfDeclaration(CodeGenModule &CGM) {
+  const char *Name = "__llvm_omp_vprintf";
+  llvm::Module &M = CGM.getModule();
+  llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()),
+llvm::Type::getInt8PtrTy(M.getContext()),
+llvm::Type::getInt32Ty(M.getContext())};
+  llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get(
+  llvm::Type::getInt32Ty(M.getContext()), ArgTy

[clang] 0e73832 - [openmp][amdgpu] Add comment warning that libm may be broken

2021-11-15 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2021-11-15T15:56:01Z
New Revision: 0e738323a9c445e31b4e1b1dcb2beb19d6f103ef

URL: 
https://github.com/llvm/llvm-project/commit/0e738323a9c445e31b4e1b1dcb2beb19d6f103ef
DIFF: 
https://github.com/llvm/llvm-project/commit/0e738323a9c445e31b4e1b1dcb2beb19d6f103ef.diff

LOG: [openmp][amdgpu] Add comment warning that libm may be broken

Using llvm-link to add rocm device-libs probably doesn't work

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112639

Added: 


Modified: 
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp 
b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index b138000f8cf2..863e2c597d53 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -106,6 +106,22 @@ const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(
 }
 
 if (HasLibm) {
+  // This is not certain to work. The device libs added here, and passed to
+  // llvm-link, are missing attributes that they expect to be inserted when
+  // passed to mlink-builtin-bitcode. The amdgpu backend does not generate
+  // conservatively correct code when attributes are missing, so this may
+  // be the root cause of miscompilations. Passing via 
mlink-builtin-bitcode
+  // ultimately hits CodeGenModule::addDefaultFunctionDefinitionAttributes
+  // on each function, see D28538 for context.
+  // Potential workarounds:
+  //  - unconditionally link all of the device libs to every translation
+  //unit in clang via mlink-builtin-bitcode
+  //  - build a libm bitcode file as part of the DeviceRTL and explictly
+  //mlink-builtin-bitcode the rocm device libs components at build time
+  //  - drop this llvm-link fork in favour or some calls into LLVM, chosen
+  //to do basically the same work as llvm-link but with that call first
+  //  - write an opt pass that sets that on every function it sees and pipe
+  //the device-libs bitcode through that on the way to this llvm-link
   SmallVector BCLibs =
   AMDGPUOpenMPTC.getCommonDeviceLibNames(Args, SubArchName.str());
   llvm::for_each(BCLibs, [&](StringRef BCFile) {



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] bfe4514 - [amdgpuarch] Delete stray hsa #include line

2023-01-25 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2023-01-25T21:40:35Z
New Revision: bfe4514add5b7ab7e1f06248983a7162d734cffb

URL: 
https://github.com/llvm/llvm-project/commit/bfe4514add5b7ab7e1f06248983a7162d734cffb
DIFF: 
https://github.com/llvm/llvm-project/commit/bfe4514add5b7ab7e1f06248983a7162d734cffb.diff

LOG: [amdgpuarch] Delete stray hsa #include line

Added: 


Modified: 
clang/tools/amdgpu-arch/AMDGPUArch.cpp

Removed: 




diff  --git a/clang/tools/amdgpu-arch/AMDGPUArch.cpp 
b/clang/tools/amdgpu-arch/AMDGPUArch.cpp
index 2fdd398c9c673..fbb084a2a1231 100644
--- a/clang/tools/amdgpu-arch/AMDGPUArch.cpp
+++ b/clang/tools/amdgpu-arch/AMDGPUArch.cpp
@@ -75,7 +75,6 @@ llvm::Error loadHSA() {
 #elif __has_include("hsa.h")
 #include "hsa.h"
 #endif
-#include "hsa/hsa.h"
 #endif
 
 llvm::Error loadHSA() { return llvm::Error::success(); }



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LinkerWrapper] Fix resolution of weak symbols during LTO (PR #68215)

2023-10-04 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield approved this pull request.

LinkerWrapper turning into a linker is kind of inevitable and not a very happy 
thing.

One option would be to lean on lld for amdgpu and split out the nvptx stuff in 
the hopes that we eventually have an alternative to nvlink, but it seems 
moderately unlikely to come to pass and driving both amdgpu and nvptx down the 
same code path has merits.

https://github.com/llvm/llvm-project/pull/68215
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Prevent AMDGPU from overriding visibility on DT_nohost variables (PR #68264)

2023-10-05 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield commented:

This stuff looks very cuda/opencl specific. It's definitely surprising for C++ 
code. Do we need it for openmp? If not it seems better to guard the hack with 
visibility behind if (hip)

https://github.com/llvm/llvm-project/pull/68264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Prevent AMDGPU from overriding visibility on DT_nohost variables (PR #68264)

2023-10-05 Thread Jon Chesterfield via cfe-commits


@@ -308,12 +308,13 @@ static bool requiresAMDGPUProtectedVisibility(const Decl 
*D,
   if (GV->getVisibility() != llvm::GlobalValue::HiddenVisibility)
 return false;
 
-  return D->hasAttr() ||
- (isa(D) && D->hasAttr()) ||
- (isa(D) &&
-  (D->hasAttr() || D->hasAttr() ||
-   cast(D)->getType()->isCUDADeviceBuiltinSurfaceType() ||
-   cast(D)->getType()->isCUDADeviceBuiltinTextureType()));
+  return !D->hasAttr() &&

JonChesterfield wrote:

is this a spurious whitespace change?

https://github.com/llvm/llvm-project/pull/68264
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] c45eaea - [Clang] Undef attribute for global variables

2020-03-17 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2020-03-17T21:22:23Z
New Revision: c45eaeabb77a926f4f1cf3c1e9311e9d66e0ee2a

URL: 
https://github.com/llvm/llvm-project/commit/c45eaeabb77a926f4f1cf3c1e9311e9d66e0ee2a
DIFF: 
https://github.com/llvm/llvm-project/commit/c45eaeabb77a926f4f1cf3c1e9311e9d66e0ee2a.diff

LOG: [Clang] Undef attribute for global variables

Summary:
[Clang] Attribute to allow defining undef global variables

Initializing global variables is very cheap on hosted implementations. The
C semantics of zero initializing globals work very well there. It is not
necessarily cheap on freestanding implementations. Where there is no loader
available, code must be emitted near the start point to write the appropriate
values into memory.

At present, external variables can be declared in C++ and definitions provided
in assembly (or IR) to achive this effect. This patch provides an attribute in
order to remove this reason for writing assembly for performance sensitive
freestanding implementations.

A close analogue in tree is LDS memory for amdgcn, where the kernel is
responsible for initializing the memory after it starts executing on the gpu.
Uninitalized variables in LDS are observably cheaper than zero initialized.

Patch is loosely based on the cuda __shared__ and opencl __local variable
implementation which also produces undef global variables.

Reviewers: kcc, rjmccall, rsmith, glider, vitalybuka, pcc, eugenis, 
vlad.tsyrklevich, jdoerfert, gregrodgers, jfb, aaron.ballman

Reviewed By: rjmccall, aaron.ballman

Subscribers: Anastasia, aaron.ballman, davidb, Quuxplusone, dexonsmith, 
cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D74361

Added: 
clang/test/CodeGen/attr-loader-uninitialized.c
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
clang/test/Sema/attr-loader-uninitialized.c
clang/test/Sema/attr-loader-uninitialized.cpp

Modified: 
clang/include/clang/Basic/Attr.td
clang/include/clang/Basic/AttrDocs.td
clang/include/clang/Basic/DiagnosticSemaKinds.td
clang/lib/AST/DeclBase.cpp
clang/lib/CodeGen/CGDecl.cpp
clang/lib/CodeGen/CodeGenModule.cpp
clang/lib/Sema/SemaDecl.cpp
clang/lib/Sema/SemaDeclAttr.cpp
clang/test/Misc/pragma-attribute-supported-attributes-list.test

Removed: 




diff  --git a/clang/include/clang/Basic/Attr.td 
b/clang/include/clang/Basic/Attr.td
index 624995a2d572..a0d521d17d0f 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -3313,6 +3313,12 @@ def Uninitialized : InheritableAttr {
   let Documentation = [UninitializedDocs];
 }
 
+def LoaderUninitialized : Attr {
+  let Spellings = [Clang<"loader_uninitialized">];
+  let Subjects = SubjectList<[GlobalVar]>;
+  let Documentation = [LoaderUninitializedDocs];
+}
+
 def ObjCExternallyRetained : InheritableAttr {
   let LangOpts = [ObjCAutoRefCount];
   let Spellings = [Clang<"objc_externally_retained">];

diff  --git a/clang/include/clang/Basic/AttrDocs.td 
b/clang/include/clang/Basic/AttrDocs.td
index aea574995c8e..60496694200e 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -4358,6 +4358,29 @@ it rather documents the programmer's intent.
   }];
 }
 
+def LoaderUninitializedDocs : Documentation {
+  let Category = DocCatVariable;
+  let Content = [{
+The ``loader_uninitialized`` attribute can be placed on global variables to
+indicate that the variable does not need to be zero initialized by the loader.
+On most targets, zero-initialization does not incur any additional cost.
+For example, most general purpose operating systems deliberately ensure
+that all memory is properly initialized in order to avoid leaking privileged
+information from the kernel or other programs. However, some targets
+do not make this guarantee, and on these targets, avoiding an unnecessary
+zero-initialization can have a significant impact on load times and/or code
+size.
+
+A declaration with this attribute is a non-tentative definition just as if it
+provided an initializer. Variables with this attribute are considered to be
+uninitialized in the same sense as a local variable, and the programs must
+write to them before reading from them. If the variable's type is a C++ class
+type with a non-trivial default constructor, or an array thereof, this 
attribute
+only suppresses the static zero-initialization of the variable, not the dynamic
+initialization provided by executing the default constructor.
+  }];
+}
+
 def CallbackDocs : Documentation {
   let Category = DocCatFunction;
   let Content = [{

diff  --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 7cb1eae9615b..f777e0ae4c81 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -5333,6 +5333,17 @@ def ext_aggregate_init_no

[clang] 1d19b15 - Fix arm build broken by D74361 by dropping align from filecheck pattern

2020-03-17 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2020-03-17T22:15:19Z
New Revision: 1d19b153955a87bd0f83c8a6a072d69239f76d63

URL: 
https://github.com/llvm/llvm-project/commit/1d19b153955a87bd0f83c8a6a072d69239f76d63
DIFF: 
https://github.com/llvm/llvm-project/commit/1d19b153955a87bd0f83c8a6a072d69239f76d63.diff

LOG: Fix arm build broken by D74361 by dropping align from filecheck pattern

Added: 


Modified: 
clang/test/CodeGen/attr-loader-uninitialized.c
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp

Removed: 




diff  --git a/clang/test/CodeGen/attr-loader-uninitialized.c 
b/clang/test/CodeGen/attr-loader-uninitialized.c
index c653d5ba3991..a7fa550fc26d 100644
--- a/clang/test/CodeGen/attr-loader-uninitialized.c
+++ b/clang/test/CodeGen/attr-loader-uninitialized.c
@@ -1,14 +1,14 @@
 // RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s
 
-// CHECK: @tentative_attr_first = global i32 undef, align 4
+// CHECK: @tentative_attr_first = global i32 undef
 int tentative_attr_first __attribute__((loader_uninitialized));
 int tentative_attr_first;
 
-// CHECK: @tentative_attr_second = global i32 undef, align 4
+// CHECK: @tentative_attr_second = global i32 undef
 int tentative_attr_second;
 int tentative_attr_second __attribute__((loader_uninitialized));
 
-// CHECK: @array = global [16 x float] undef, align 16
+// CHECK: @array = global [16 x float] undef
 float array[16] __attribute__((loader_uninitialized));
 
 typedef struct
@@ -17,8 +17,8 @@ typedef struct
   float y;
 } s;
 
-// CHECK: @i = global %struct.s undef, align 4
+// CHECK: @i = global %struct.s undef
 s i __attribute__((loader_uninitialized));
 
-// CHECK: @private_extern_ok = hidden global i32 undef, align 4
+// CHECK: @private_extern_ok = hidden global i32 undef
 __private_extern__ int private_extern_ok __attribute__((loader_uninitialized));

diff  --git a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp 
b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
index ec9d8a54db78..3b401dcf4094 100644
--- a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
+++ b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
@@ -21,9 +21,9 @@ class trivial
 // CHECK: @ut = global %class.trivial undef
 trivial ut [[clang::loader_uninitialized]];
 
-// CHECK: @arr = global [32 x double] undef, align 16
+// CHECK: @arr = global [32 x double] undef
 double arr[32] __attribute__((loader_uninitialized));
 
 // Defining as arr2[] [[clang..]] raises the error: attribute cannot be 
applied to types
-// CHECK: @arr2 = global [4 x double] undef, align 16
+// CHECK: @arr2 = global [4 x double] undef
 double arr2 [[clang::loader_uninitialized]] [4];



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] cc691f3 - Disable loader-uninitialized tests on Windows

2020-03-17 Thread Jon Chesterfield via cfe-commits

Author: Jon Chesterfield
Date: 2020-03-17T23:33:12Z
New Revision: cc691f3384c593849d3a5ab468d8e5ac6f707dab

URL: 
https://github.com/llvm/llvm-project/commit/cc691f3384c593849d3a5ab468d8e5ac6f707dab
DIFF: 
https://github.com/llvm/llvm-project/commit/cc691f3384c593849d3a5ab468d8e5ac6f707dab.diff

LOG: Disable loader-uninitialized tests on Windows

Added: 


Modified: 
clang/test/CodeGen/attr-loader-uninitialized.c
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp

Removed: 




diff  --git a/clang/test/CodeGen/attr-loader-uninitialized.c 
b/clang/test/CodeGen/attr-loader-uninitialized.c
index a7fa550fc26d..9ff0c23d77c3 100644
--- a/clang/test/CodeGen/attr-loader-uninitialized.c
+++ b/clang/test/CodeGen/attr-loader-uninitialized.c
@@ -1,3 +1,4 @@
+// UNSUPPORTED: system-windows
 // RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s
 
 // CHECK: @tentative_attr_first = global i32 undef

diff  --git a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp 
b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
index 3b401dcf4094..e82ae47e9f16 100644
--- a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
+++ b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
@@ -1,3 +1,4 @@
+// UNSUPPORTED: system-windows
 // RUN: %clang_cc1 -emit-llvm -o - %s | FileCheck %s
 
 // CHECK: @defn = global i32 undef



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] clang-format: SpaceBeforeParens (Always) with overloaded operators

2015-08-11 Thread Jon Chesterfield via cfe-commits
Hi,

I believe this is being sent to the correct list. Please let me know if
there is a better choice.

The clang-format option SpaceBeforeParens "Always" does not insert a space
before the opening parenthesis of an overloaded operator function. The
attached patch against trunk resolves this.

Kind regards,

Jon Chesterfield
SN Systems - Sony Computer Entertainment Group.
Index: lib/Format/TokenAnnotator.cpp
===
--- lib/Format/TokenAnnotator.cpp   (revision 244436)
+++ lib/Format/TokenAnnotator.cpp   (working copy)
@@ -1997,7 +1997,7 @@
   if (Right.isOneOf(TT_CtorInitializerColon, TT_ObjCBlockLParen))
 return true;
   if (Right.is(TT_OverloadedOperatorLParen))
-return false;
+return Style.SpaceBeforeParens == FormatStyle::SBPO_Always;
   if (Right.is(tok::colon)) {
 if (Line.First->isOneOf(tok::kw_case, tok::kw_default) ||
 !Right.getNextNonComment() || Right.getNextNonComment()->is(tok::semi))
Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp (revision 244436)
+++ unittests/Format/FormatTest.cpp (working copy)
@@ -8276,6 +8276,8 @@
   verifyFormat("static_assert(sizeof(char) == 1, \"Impossible!\");", NoSpace);
   verifyFormat("int f() throw(Deprecated);", NoSpace);
   verifyFormat("typedef void (*cb)(int);", NoSpace);
+  verifyFormat("T A::operator()();",NoSpace);
+  verifyFormat("X A::operator++(T);",NoSpace);
 
   FormatStyle Space = getLLVMStyle();
   Space.SpaceBeforeParens = FormatStyle::SBPO_Always;
@@ -8321,6 +8323,8 @@
   verifyFormat("static_assert (sizeof (char) == 1, \"Impossible!\");", Space);
   verifyFormat("int f () throw (Deprecated);", Space);
   verifyFormat("typedef void (*cb) (int);", Space);
+  verifyFormat("T A::operator() ();",Space);
+  verifyFormat("X A::operator++ (T);",Space);
 }
 
 TEST_F(FormatTest, ConfigurableSpacesInParentheses) {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D11957: SpaceBeforeParens (Always) with overloaded operators

2015-08-11 Thread Jon Chesterfield via cfe-commits
JonChesterfield created this revision.
JonChesterfield added a subscriber: cfe-commits.
JonChesterfield set the repository for this revision to rL LLVM.
Herald added a subscriber: klimek.

The clang-format option SpaceBeforeParens "Always" does not insert a space 
before the opening parenthesis of an overloaded operator function. The attached 
patch against trunk resolves this.

Repository:
  rL LLVM

http://reviews.llvm.org/D11957

Files:
  lib/Format/TokenAnnotator.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -8276,6 +8276,8 @@
   verifyFormat("static_assert(sizeof(char) == 1, \"Impossible!\");", NoSpace);
   verifyFormat("int f() throw(Deprecated);", NoSpace);
   verifyFormat("typedef void (*cb)(int);", NoSpace);
+  verifyFormat("T A::operator()();", NoSpace);
+  verifyFormat("X A::operator++(T);", NoSpace);
 
   FormatStyle Space = getLLVMStyle();
   Space.SpaceBeforeParens = FormatStyle::SBPO_Always;
@@ -8321,6 +8323,8 @@
   verifyFormat("static_assert (sizeof (char) == 1, \"Impossible!\");", Space);
   verifyFormat("int f () throw (Deprecated);", Space);
   verifyFormat("typedef void (*cb) (int);", Space);
+  verifyFormat("T A::operator() ();", Space);
+  verifyFormat("X A::operator++ (T);", Space);
 }
 
 TEST_F(FormatTest, ConfigurableSpacesInParentheses) {
Index: lib/Format/TokenAnnotator.cpp
===
--- lib/Format/TokenAnnotator.cpp
+++ lib/Format/TokenAnnotator.cpp
@@ -1997,7 +1997,7 @@
   if (Right.isOneOf(TT_CtorInitializerColon, TT_ObjCBlockLParen))
 return true;
   if (Right.is(TT_OverloadedOperatorLParen))
-return false;
+return Style.SpaceBeforeParens == FormatStyle::SBPO_Always;
   if (Right.is(tok::colon)) {
 if (Line.First->isOneOf(tok::kw_case, tok::kw_default) ||
 !Right.getNextNonComment() || Right.getNextNonComment()->is(tok::semi))


Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -8276,6 +8276,8 @@
   verifyFormat("static_assert(sizeof(char) == 1, \"Impossible!\");", NoSpace);
   verifyFormat("int f() throw(Deprecated);", NoSpace);
   verifyFormat("typedef void (*cb)(int);", NoSpace);
+  verifyFormat("T A::operator()();", NoSpace);
+  verifyFormat("X A::operator++(T);", NoSpace);
 
   FormatStyle Space = getLLVMStyle();
   Space.SpaceBeforeParens = FormatStyle::SBPO_Always;
@@ -8321,6 +8323,8 @@
   verifyFormat("static_assert (sizeof (char) == 1, \"Impossible!\");", Space);
   verifyFormat("int f () throw (Deprecated);", Space);
   verifyFormat("typedef void (*cb) (int);", Space);
+  verifyFormat("T A::operator() ();", Space);
+  verifyFormat("X A::operator++ (T);", Space);
 }
 
 TEST_F(FormatTest, ConfigurableSpacesInParentheses) {
Index: lib/Format/TokenAnnotator.cpp
===
--- lib/Format/TokenAnnotator.cpp
+++ lib/Format/TokenAnnotator.cpp
@@ -1997,7 +1997,7 @@
   if (Right.isOneOf(TT_CtorInitializerColon, TT_ObjCBlockLParen))
 return true;
   if (Right.is(TT_OverloadedOperatorLParen))
-return false;
+return Style.SpaceBeforeParens == FormatStyle::SBPO_Always;
   if (Right.is(tok::colon)) {
 if (Line.First->isOneOf(tok::kw_case, tok::kw_default) ||
 !Right.getNextNonComment() || Right.getNextNonComment()->is(tok::semi))
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D11957: SpaceBeforeParens (Always) with overloaded operators

2015-08-11 Thread Jon Chesterfield via cfe-commits
JonChesterfield added a comment.

Thanks. I have no commit access.


Repository:
  rL LLVM

http://reviews.llvm.org/D11957



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-14 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-14 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

> I don't really like the whole "sufficiently simple function" thing. It seems 
> fragile. You should be able to just take a arbitrary internal varargs 
> function, rewrite its signature to take a va_list argument, rewrite calls to 
> va_start to make a copy of that va_list, and rewrite the callers to construct 
> that va_list. If that function turns out to be inlinable, great; if not, you 
> haven't really lost anything.

Yes, you can and I do. That's patch 2 of the series, numbered 1. in the list

> (Rewriting the signature of a function is complicated in its own way because 
> you need to allocate a new Function, then transplant the original function's 
> body into it. But it's not uncharted territory: we should be able to refactor 
> code out of llvm/lib/Transforms/IPO/ArgumentPromotion.cpp .)
> 
> Do we have a testing plan for this? Messing up calling convention stuff tends 
> to lead to extremely subtle bugs.

And this is why it's a separate patch. 

The rewrite-call-instruction is the target dependent bit, in this patch for an 
initial target.

The rewrite-function is (almost) target agnostic and involves a surprisingly 
large amount of book keeping. We have duplicated code in ArgumentPromotion, 
dead argument removal, the function cloning in attributor and probably 
elsewhere.

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-14 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

> High level question: Does this patch eliminate the variadic call edge, or, 
> does it perform inlining on very special variadic function definitions? I 
> thought the former but `isFunctionInlinable`, sufficiently confused me.

This patch will rewrite calls to a variadic function into calls to a function 
taking a va_list. Later patches expand that to cover the test of the cases. 
Note that calls to variadic function pointers cannot generally be rewritten 
without permitting ABI changes, which is what I plan to do for nvptx and amdgpu.

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-15 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

> Not sure if this means isFunctionInlinable will go away in the followup 
> patch, or if you plan to rewrite functions in a way that satisfies 
> isFunctionInlinable. I think the end state should be that all functions go 
> down the same codepath, not conditionally do something different based on 
> whether they're "simple". I guess I don't have a strong preference for how 
> you get there, though.

The logic I've got at present (which include the ABI rewriting) is

```C++
bool usefulToSplit =
splitFunctions() && (!F->isDeclaration() || rewriteABI());

// F may already be a single basic block calling a known function
// that takes a va_list, in which case it doens't need to be split.
Function *Equivalent = isFunctionInlinable(M, F);

if (usefulToSplit && !Equivalent) {
  Equivalent = DeriveInlinableVariadicFunctionPair(M, *F);
  assert(Equivalent);
  assert(isFunctionInlinable(M, *F)); // branch doesn't do this presently 
but it could do
  changed = true;
  functionToInliningTarget[F] = Equivalent;
}
```

I'm not especially attached to the specific control flow.

The two transforms - inlining/call-rewrite and splitting an entry block off the 
variadic function function so that it can be inlined - are genuinely orthogonal 
and that is a really good property to preserve. It took some thought to get to 
that structure.

If we ignore that design and run functions through the block splitting 
unnecessarily, we win a combinatorial increase in required testing, a decrease 
in emitted code quality (spurious extra functions), an inability to pattern 
match on fprintf->vfprintf style code that happens to be in the application 
already. We would get to delete the isFunctionInlinable predicate.

The independent transform pipeline pattern is more important than the no 
special case branching heuristic. If it helps, view it as two complementary 
transforms where the one is skipped when it would be a no-op.

Related - if there's an objection to landing this as an inactive pass (only 
exercised by test code) we can put it into an optimisation pipeline 
immediately, it'll still remove some real world variadic calls even if the 
later patches don't make it.

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add 'CLANG_ALLOW_IMPLICIT_RPATH' to enable toolchain use of -rpath (PR #82004)

2024-02-16 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

Enable by default without cmake and fedora run their own patch is my preferred 
solution.

The Siemens dev working on gcc amdgpu offloading told me they set rpath on the 
executable at a conference but I haven't checked their implementation. His 
attitude was that programs should be able to run without setting environment 
variables.

https://github.com/llvm/llvm-project/pull/82004
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add 'CLANG_ALLOW_IMPLICIT_RPATH' to enable toolchain use of -rpath (PR #82004)

2024-02-16 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

> IMHO I prefer to ask/request users to do the right thing.

One of the drawbacks to asking users to do the "right thing" is that it goes 
something like:
- you must use global state to tell the compiler where the compiler libraries 
are
- you should do this using clang configuration files which are like commandline 
flags only different
- the flag is called rpath, which means runpath, but used to mean rpath, 
because of this historic context from glibc
- plus you need to work out what the (multiple) openmp libraries are called and 
where they are
- and some of them are bitcode, found using this different mechanism
- and if you don't get it totally right, things won't work

which means we've chosen "right" to mean "trivially convenient for compiler 
developers who don't like changing things", which is not likely to be what the 
user had in mind. They should look at this pile of spurious complexity and 
conclude they don't want to be an openmp user after all.

We really don't have a good answer to "why can't my compiler find it's own 
libraries?", since the best we've got is a reference to a build script Fedora 
deploy for reasons that they don't really go into in their docs, and where I 
still haven't managed to find the script itself to reverse engineer what 
exactly it's rejecting.

I'm still happy with a heuristic that amounts to "if the compiler libs are 
being installed under /usr or /lib, assume the system can find them, and 
otherwise set rpath on the executable", especially if it's tied to a command 
line flag or cmake control which lets the user override whichever default we 
picked for them.

https://github.com/llvm/llvm-project/pull/82004
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-19 Thread Jon Chesterfield via cfe-commits


@@ -50,31 +50,9 @@ function(collect_object_file_deps target result)
   endif()
 endfunction(collect_object_file_deps)
 
-# A rule to build a library from a collection of entrypoint objects.
-# Usage:
-# add_entrypoint_library(
-#   DEPENDS 
-# )
-#
-# NOTE: If one wants an entrypoint to be available in a library, then they will
-# have to list the entrypoint target explicitly in the DEPENDS list. Implicit
-# entrypoint dependencies will not be added to the library.
-function(add_entrypoint_library target_name)
-  cmake_parse_arguments(
-"ENTRYPOINT_LIBRARY"
-"" # No optional arguments
-"" # No single value arguments
-"DEPENDS" # Multi-value arguments
-${ARGN}
-  )
-  if(NOT ENTRYPOINT_LIBRARY_DEPENDS)
-message(FATAL_ERROR "'add_entrypoint_library' target requires a DEPENDS 
list "
-"of 'add_entrypoint_object' targets.")
-  endif()
-
-  get_fq_deps_list(fq_deps_list ${ENTRYPOINT_LIBRARY_DEPENDS})
+function(get_all_object_file_deps result fq_deps_list)

JonChesterfield wrote:

This looks like factoring some existing code into a function - if you landed 
that refactor without changing what the code does, I think it would make this 
diff much more legible as the current and new code could align

https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-19 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

One large patch may be necessary - is it also necessary to interleave 
reordering files with changing the contents? It makes the GUI diff tool we're 
using here essentially useless. If the moving code between files and factoring 
into functions was a separate commit we'd have a much better chance of seeing 
what you've changed vs what you've moved.

https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-20 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

OK, worked through this patch now. The noise is substantial but it's an 
improvement on what we have - the overall impression is that the cmake was 
originally very special cased for GPUs and now treats them very similarly to 
other targets, with some careful footwork around compiling the loaders for the 
host.

This is also a step towards making the GPU parts of the cmake easier to edit 
for non-GPU people which is a strong win.

Thank you Michael and JP. Solid effort working through this Joseph. Let's ship 
it.

https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-20 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield approved this pull request.


https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-20 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield commented:

Stalled on https://github.com/llvm/llvm-project/pull/81557, trying to remove 
the approve mark as otherwise i'll forget about this

https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-20 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield dismissed 
https://github.com/llvm/llvm-project/pull/81921
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-20 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return 

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-20 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

Ah OK, so split every variadic definition and let the inliner sort it out 
afterwards. Yes, I'm good with that. Tests either get messier or add a call to 
the inliner. Will update the PR correspondingly, solid simplification, thanks!

Discard the combinatorial testing comment - I misunderstood the structure you 
had in mind.

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [openmp] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-21 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield approved this pull request.

I like this a lot, thank you.

https://github.com/llvm/llvm-project/pull/80460
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

This PR is invalid.

First, the alignment on the eight byte pointer is supposed to be four. 
Increasing it to 8 makes things worse.

Second, I can't see any support for the claim that the code is incrementing by 
the alignment of the value, as opposed to the size.

The frame is setup as a struct instance with explicit padding by Int8Tys and 
the calculation there is correct.

The va_arg increment is done in CodeGen:emitVoidPtrVAArg, where DirectSize is 
ValueInfo.Width, aligned to the 4 byte slot size, then stored. It does not 
increment the iterator by the alignment of the type.

The lowering pass is doing exactly what was intended.


https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Jon Chesterfield via cfe-commits


@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck 
%s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:[[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:[[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:[[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:[[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:[[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:[[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:[[A:%.*]] = alloca [[STRUCT_ANON:%.*]], align 4
+// CHECK-NEXT:[[V:%.*]] = alloca <4 x i32>, align 16
+// CHECK-NEXT:store i8 1, ptr [[C]], align 1
+// CHECK-NEXT:store i16 1, ptr [[S]], align 2
+// CHECK-NEXT:store i32 1, ptr [[I]], align 4
+// CHECK-NEXT:store i64 1, ptr [[L]], align 8
+// CHECK-NEXT:store float 1.00e+00, ptr [[F]], align 4
+// CHECK-NEXT:store double 1.00e+00, ptr [[D]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load i8, ptr [[C]], align 1
+// CHECK-NEXT:[[CONV:%.*]] = sext i8 [[TMP0]] to i32

JonChesterfield wrote:

C promotes them to i32. C has a lot of rules around vararg type promotion that 
have not aged brilliantly.

If you want a i8 or i16, put it in a struct. C doesn't say anything about 
promoting that and amdgpu will pass it inlined into the struct, i.e. with no 
overhead. I believe nvptx will do likewise.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Jon Chesterfield via cfe-commits


@@ -942,6 +942,36 @@ struct Amdgpu final : public VariadicABIInfo {
   }
 };
 
+struct NVPTX final : public VariadicABIInfo {
+
+  bool enableForTarget() override { return true; }
+
+  bool vaListPassedInSSARegister() override { return true; }
+
+  Type *vaListType(LLVMContext &Ctx) override {
+return PointerType::getUnqual(Ctx);
+  }
+
+  Type *vaListParameterType(Module &M) override {
+return PointerType::getUnqual(M.getContext());
+  }
+
+  Value *initializeVaList(Module &M, LLVMContext &Ctx, IRBuilder<> &Builder,
+  AllocaInst *, Value *Buffer) override {
+return Builder.CreateAddrSpaceCast(Buffer, vaListParameterType(M));
+  }
+
+  VAArgSlotInfo slotInfo(const DataLayout &DL, Type *Parameter) override {
+// NVPTX expects natural alignment in all cases. The variadic call ABI will
+// handle promoting types to their appropriate size and alignment.
+const unsigned MinAlign = 1;
+Align A = DL.getABITypeAlign(Parameter);

JonChesterfield wrote:

can getABITypeAlign return 0?

Does nvptx actually expect natural alignment? That would be inconsistent with 
the slot size of four which strongly suggests everything is passed with at 
least four byte alignment

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield requested changes to this pull request.

The amdgpu patch is incorrect, see 
https://github.com/llvm/llvm-project/pull/96370/

The nvptx lowering looks dubious - values smaller than slot size should be 
passed with the same alignment as the slot and presently aren't. A struct 
containing i8, i16 or half should be miscompiled on nvptx as written.

No comment on the libc part.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield requested changes to this pull request.

Patch should not land. Need to know what bug this was trying to address to 
guess at what the right fix would be.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

Ah yes, libc code doing the equivalent of va_arg assuming natural alignment 
when the underlying buffer is a packed struct with fields padded to four bytes 
would not work. That would be "fixed" by changing the compiler to match the 
assumption made by libc, but it seems much better for libc to do the misaligned 
load instead. Then there's no wavesize*align-padding stack space burned at 
runtime.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

I've passed some types to nvcc on godbolt and tried to decode the results. It 
looks like it's passing everything with natural alignment, flattened, with 
total disregard to the minimum slot size premise clang uses.

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Jon Chesterfield via cfe-commits


@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo &FI) const {
 
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction &CGF, Address VAListAddr,
QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+  getContext().getTypeInfoInChars(Ty),
+  CharUnits::fromQuantity(4),

JonChesterfield wrote:

Error is here - this says slots shall be at least four bytes in size, but nvcc 
looks happy to pass struct {char} right next to other things, so we're looking 
for CharUnits::fromQuantity(1),

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Jon Chesterfield via cfe-commits


@@ -942,6 +942,36 @@ struct Amdgpu final : public VariadicABIInfo {
   }
 };
 
+struct NVPTX final : public VariadicABIInfo {
+
+  bool enableForTarget() override { return true; }
+
+  bool vaListPassedInSSARegister() override { return true; }
+
+  Type *vaListType(LLVMContext &Ctx) override {
+return PointerType::getUnqual(Ctx);
+  }
+
+  Type *vaListParameterType(Module &M) override {
+return PointerType::getUnqual(M.getContext());
+  }
+
+  Value *initializeVaList(Module &M, LLVMContext &Ctx, IRBuilder<> &Builder,
+  AllocaInst *, Value *Buffer) override {
+return Builder.CreateAddrSpaceCast(Buffer, vaListParameterType(M));
+  }
+
+  VAArgSlotInfo slotInfo(const DataLayout &DL, Type *Parameter) override {
+// NVPTX expects natural alignment in all cases. The variadic call ABI will
+// handle promoting types to their appropriate size and alignment.
+const unsigned MinAlign = 1;
+Align A = DL.getABITypeAlign(Parameter);

JonChesterfield wrote:

I suspect Matt means `Align MinAlign = 1;` instead of `unsigned MinAlign = `

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield edited 
https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Jon Chesterfield via cfe-commits


@@ -942,6 +942,36 @@ struct Amdgpu final : public VariadicABIInfo {
   }
 };
 
+struct NVPTX final : public VariadicABIInfo {
+
+  bool enableForTarget() override { return true; }
+
+  bool vaListPassedInSSARegister() override { return true; }
+
+  Type *vaListType(LLVMContext &Ctx) override {
+return PointerType::getUnqual(Ctx);
+  }
+
+  Type *vaListParameterType(Module &M) override {
+return PointerType::getUnqual(M.getContext());
+  }
+
+  Value *initializeVaList(Module &M, LLVMContext &Ctx, IRBuilder<> &Builder,
+  AllocaInst *, Value *Buffer) override {
+return Builder.CreateAddrSpaceCast(Buffer, vaListParameterType(M));
+  }
+
+  VAArgSlotInfo slotInfo(const DataLayout &DL, Type *Parameter) override {
+// NVPTX expects natural alignment in all cases. The variadic call ABI will
+// handle promoting types to their appropriate size and alignment.
+const unsigned MinAlign = 1;
+Align A = DL.getABITypeAlign(Parameter);
+if (A < MinAlign)
+  A = Align(MinAlign);
+return {A, false};
+  }

JonChesterfield wrote:

Are you really looking for `if (A < 1) A = 1;` here? Align has max/min 
functions which would be clearer

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Jon Chesterfield via cfe-commits


@@ -54,7 +54,34 @@ class MockArgList {
   }
 
   template  LIBC_INLINE T next_var() {
-++arg_counter;
+arg_counter++;
+return T(arg_counter);
+  }
+
+  size_t read_count() const { return arg_counter; }
+};
+
+// Used by the GPU implementation to parse how many bytes need to be read from
+// the variadic argument buffer.
+class DummyArgList {
+  size_t arg_counter = 0;
+
+public:
+  LIBC_INLINE DummyArgList() = default;
+  LIBC_INLINE DummyArgList(va_list) { ; }
+  LIBC_INLINE DummyArgList(DummyArgList &other) {
+arg_counter = other.arg_counter;
+  }
+  LIBC_INLINE ~DummyArgList() = default;
+
+  LIBC_INLINE DummyArgList &operator=(DummyArgList &rhs) {
+arg_counter = rhs.arg_counter;
+return *this;
+  }
+
+  template  LIBC_INLINE T next_var() {
+arg_counter =
+((arg_counter + alignof(T) - 1) / alignof(T)) * alignof(T) + sizeof(T);
 return T(arg_counter);

JonChesterfield wrote:

maybe split this into separate increment by size and round for the alignment 
operations. There might be helper functions for the rounding which would be 
clearer than reading the division / multiplication and trying to pattern match 
on what alignment code usually looks like (e.g. I expected to see masking off 
bits so tripped over this)

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Jon Chesterfield via cfe-commits


@@ -116,8 +116,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public 
TargetInfo {
   }
 
   BuiltinVaListKind getBuiltinVaListKind() const override {
-// FIXME: implement
-return TargetInfo::CharPtrBuiltinVaList;
+return TargetInfo::VoidPtrBuiltinVaList;

JonChesterfield wrote:

These should be the same as far as codegen is concerned I think, in which case 
we should probably leave it unchanged. Or is there a reason to change it I'm 
missing?

https://github.com/llvm/llvm-project/pull/96369
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Sanitizer] Make sanitizer passes idempotent (PR #99439)

2024-08-13 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

Sanizer passes setting a "no sanitizer" magic variable is backwards.

If this behaviour is the way to go, have clang set a "needs_asan_lowering" or 
whatever and have the corresponding pass remove it.

It shouldn't be necessary to emit ever increasing lists of pass and target 
specific cruft in the IR to avoid miscompilation. The opposite way round is 
much better - compile correctly when the flag is missing, and add the ad hoc 
metadata to switch on non-default behaviour

https://github.com/llvm/llvm-project/pull/99439
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   >