[clang] 679158e - Make hip math headers easier to use from C
Author: Jon Chesterfield Date: 2020-07-24T20:50:46+01:00 New Revision: 679158e662aa247282b8eea4c2d60b33204171fb URL: https://github.com/llvm/llvm-project/commit/679158e662aa247282b8eea4c2d60b33204171fb DIFF: https://github.com/llvm/llvm-project/commit/679158e662aa247282b8eea4c2d60b33204171fb.diff LOG: Make hip math headers easier to use from C Summary: Make hip math headers easier to use from C Motivation is a step towards using the hip math headers to implement math.h for openmp, which needs to work with C as well as C++. NFC for C++ code. Reviewers: yaxunl, jdoerfert Reviewed By: yaxunl Subscribers: sstefan1, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D84476 Added: Modified: clang/lib/Headers/__clang_hip_libdevice_declares.h clang/lib/Headers/__clang_hip_math.h Removed: diff --git a/clang/lib/Headers/__clang_hip_libdevice_declares.h b/clang/lib/Headers/__clang_hip_libdevice_declares.h index 711040443440..2cf9cc7f1eb6 100644 --- a/clang/lib/Headers/__clang_hip_libdevice_declares.h +++ b/clang/lib/Headers/__clang_hip_libdevice_declares.h @@ -10,7 +10,9 @@ #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__ #define __CLANG_HIP_LIBDEVICE_DECLARES_H__ +#ifdef __cplusplus extern "C" { +#endif // BEGIN FLOAT __device__ __attribute__((const)) float __ocml_acos_f32(float); @@ -316,7 +318,7 @@ __device__ __attribute__((pure)) __2f16 __ocml_log2_2f16(__2f16); __device__ inline __2f16 __llvm_amdgcn_rcp_2f16(__2f16 __x) // Not currently exposed by ROCDL. 
{ - return __2f16{__llvm_amdgcn_rcp_f16(__x.x), __llvm_amdgcn_rcp_f16(__x.y)}; + return (__2f16){__llvm_amdgcn_rcp_f16(__x.x), __llvm_amdgcn_rcp_f16(__x.y)}; } __device__ __attribute__((const)) __2f16 __ocml_rint_2f16(__2f16); __device__ __attribute__((const)) __2f16 __ocml_rsqrt_2f16(__2f16); @@ -325,6 +327,8 @@ __device__ __attribute__((const)) __2f16 __ocml_sqrt_2f16(__2f16); __device__ __attribute__((const)) __2f16 __ocml_trunc_2f16(__2f16); __device__ __attribute__((const)) __2f16 __ocml_pown_2f16(__2f16, __2i16); +#ifdef __cplusplus } // extern "C" +#endif #endif // __CLANG_HIP_LIBDEVICE_DECLARES_H__ diff --git a/clang/lib/Headers/__clang_hip_math.h b/clang/lib/Headers/__clang_hip_math.h index 47d3c1717559..f9ca9bf606fb 100644 --- a/clang/lib/Headers/__clang_hip_math.h +++ b/clang/lib/Headers/__clang_hip_math.h @@ -95,8 +95,10 @@ inline uint64_t __make_mantissa(const char *__tagp) { } // BEGIN FLOAT +#ifdef __cplusplus __DEVICE__ inline float abs(float __x) { return __ocml_fabs_f32(__x); } +#endif __DEVICE__ inline float acosf(float __x) { return __ocml_acos_f32(__x); } __DEVICE__ @@ -251,7 +253,7 @@ inline float nanf(const char *__tagp) { uint32_t sign : 1; } bits; -static_assert(sizeof(float) == sizeof(ieee_float), ""); +static_assert(sizeof(float) == sizeof(struct ieee_float), ""); } __tmp; __tmp.bits.sign = 0u; @@ -553,8 +555,10 @@ inline float __tanf(float __x) { return __ocml_tan_f32(__x); } // END FLOAT // BEGIN DOUBLE +#ifdef __cplusplus __DEVICE__ inline double abs(double __x) { return __ocml_fabs_f64(__x); } +#endif __DEVICE__ inline double acos(double __x) { return __ocml_acos_f64(__x); } __DEVICE__ @@ -712,7 +716,7 @@ inline double nan(const char *__tagp) { uint32_t exponent : 11; uint32_t sign : 1; } bits; -static_assert(sizeof(double) == sizeof(ieee_double), ""); +static_assert(sizeof(double) == sizeof(struct ieee_double), ""); } __tmp; __tmp.bits.sign = 0u; @@ -1178,6 +1182,7 @@ __host__ inline static int max(int __arg1, int __arg2) { return 
std::max(__arg1, __arg2); } +#ifdef __cplusplus __DEVICE__ inline float pow(float __base, int __iexp) { return powif(__base, __iexp); } @@ -1188,6 +1193,7 @@ __DEVICE__ inline _Float16 pow(_Float16 __base, int __iexp) { return __ocml_pown_f16(__base, __iexp); } +#endif #pragma pop_macro("__DEF_FUN1") #pragma pop_macro("__DEF_FUN2") ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
https://github.com/JonChesterfield edited https://github.com/llvm/llvm-project/pull/72280
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
JonChesterfield wrote: Looks solid to me. The patch to clang is long but straightforward and the tests look reassuringly exhaustive. Probably good that you ignored my name suggestion of integers 0 through N. This patch is partly motivated by us wanting device scope atomics in libc. It removes one of the remaining stumbling blocks for people who like freestanding C++ as a GPU programming language. Hopefully the clang people consider the extension acceptable. https://github.com/llvm/llvm-project/pull/72280
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
https://github.com/JonChesterfield approved this pull request. This is functionally correct and useful as is - if gcc decide to do something divergent we can change it later, it's basically an internal interface anyway. Let's have it now and change the names if we come up with better ideas later. https://github.com/llvm/llvm-project/pull/72280
[llvm] [clang] [Offloading][NFC] Refactor handling of offloading entries (PR #72544)
https://github.com/JonChesterfield approved this pull request. Test change is suspect for a patch claiming NFC but it looks like the change is harmless. Thanks for separating refactor from functional change https://github.com/llvm/llvm-project/pull/72544
[llvm] [lld] [mlir] [clang] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #73000)
@@ -75,8 +75,8 @@ bb.2: store volatile i32 0, ptr addrspace(1) undef ret void } -; DEFAULTSIZE: .amdhsa_private_segment_fixed_size 4112 -; DEFAULTSIZE: ; ScratchSize: 4112 +; DEFAULTSIZE: .amdhsa_private_segment_fixed_size 16 JonChesterfield wrote: This seems a bit suspect. It used to be about 4k and is now 16. Are we out by a factor of 1024 somewhere? https://github.com/llvm/llvm-project/pull/73000
[clang] [llvm] [mlir] [lld] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #73000)
JonChesterfield wrote: This is a wild amount of code churn from a trivial change. 10k lines of almost all noise. Means the chances of us noticing breakage in a code review tool are pretty low. How about as a first patch we pass `-code-object=v4` or whatever syntax to essentially all the tests, then rebase this, so that we can get something approximating "this is the functional change, with the codegen change visible in these tests"? In general it seems likely that a lot of tests are checking things they don't actually care about, probably because they're frequently generated by the python thing. Maybe some of the noise can be removed by tweaking the test generator script to emit checks that are insensitive to ABI version? https://github.com/llvm/llvm-project/pull/73000
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
JonChesterfield wrote: The capability is more important than the naming. `__llvm_atomic_scoped_load` would be fine, with string literals or enum or macro controlling the scope. I also don't mind if it's a scoped argument or if we end up with `__llvm_atomic_seqcst_device_load`, embedding all of it in the symbol name. Clang can't currently instantiate IR atomics with scopes and it would be useful to do so. If GCC picks a different set of names - maybe they go with defines for scope and we go with strings, and the names differ - we get to pick between renaming ours, adding aliases, ignoring the divergence. https://github.com/llvm/llvm-project/pull/72280
[clang] 3e649f8 - [openmp][nfc] Simplify macros guarding math complex headers
Author: Jon Chesterfield Date: 2021-07-18T23:30:35+01:00 New Revision: 3e649f8ef1875f943537b5fcecdb132c9442cb7d URL: https://github.com/llvm/llvm-project/commit/3e649f8ef1875f943537b5fcecdb132c9442cb7d DIFF: https://github.com/llvm/llvm-project/commit/3e649f8ef1875f943537b5fcecdb132c9442cb7d.diff LOG: [openmp][nfc] Simplify macros guarding math complex headers The `__CUDA__` macro is already defined for openmp/nvptx and is not used by `__clang_cuda_complex_builtins.h`, so dropping that macro slightly simplifies nvptx and avoids defining it on amdgcn (where it is likely to be harmful). Also dropped a cplusplus test from a C++ header as compilation will have failed on cmath earlier if it was included from C. Reviewed By: jdoerfert, fodinabor Differential Revision: https://reviews.llvm.org/D105221 Added: Modified: clang/lib/Headers/openmp_wrappers/complex clang/lib/Headers/openmp_wrappers/complex.h Removed: diff --git a/clang/lib/Headers/openmp_wrappers/complex b/clang/lib/Headers/openmp_wrappers/complex index 142e526b81b35..dfd6193c97cbd 100644 --- a/clang/lib/Headers/openmp_wrappers/complex +++ b/clang/lib/Headers/openmp_wrappers/complex @@ -17,7 +17,6 @@ // We require std::math functions in the complex builtins below. #include -#define __CUDA__ #define __OPENMP_NVPTX__ #include <__clang_cuda_complex_builtins.h> #undef __OPENMP_NVPTX__ @@ -26,9 +25,6 @@ // Grab the host header too. #include_next - -#ifdef __cplusplus - // If we are compiling against libc++, the macro _LIBCPP_STD_VER should be set // after including above. Since the complex header we use is a // simplified version of the libc++, we don't need it in this case. 
If we @@ -48,5 +44,3 @@ #pragma omp end declare variant #endif - -#endif diff --git a/clang/lib/Headers/openmp_wrappers/complex.h b/clang/lib/Headers/openmp_wrappers/complex.h index 00d278548f826..15dc415b8126d 100644 --- a/clang/lib/Headers/openmp_wrappers/complex.h +++ b/clang/lib/Headers/openmp_wrappers/complex.h @@ -17,7 +17,6 @@ // We require math functions in the complex builtins below. #include -#define __CUDA__ #define __OPENMP_NVPTX__ #include <__clang_cuda_complex_builtins.h> #undef __OPENMP_NVPTX__ ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 968899a - [OpenMP][AMDGCN] Initial math headers support
Author: Pushpinder Singh Date: 2021-07-21T16:15:39+01:00 New Revision: 968899ad9cf17579f9867dafb35c4d97bad0863f URL: https://github.com/llvm/llvm-project/commit/968899ad9cf17579f9867dafb35c4d97bad0863f DIFF: https://github.com/llvm/llvm-project/commit/968899ad9cf17579f9867dafb35c4d97bad0863f.diff LOG: [OpenMP][AMDGCN] Initial math headers support With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next patch. Reviewed By: JonChesterfield, jdoerfert, scchan Differential Revision: https://reviews.llvm.org/D104904 Added: clang/test/Headers/Inputs/include/algorithm clang/test/Headers/Inputs/include/utility clang/test/Headers/amdgcn_openmp_device_math.c Modified: clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Headers/__clang_hip_cmath.h clang/lib/Headers/__clang_hip_math.h clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h clang/lib/Headers/openmp_wrappers/cmath clang/lib/Headers/openmp_wrappers/math.h clang/test/Headers/Inputs/include/cstdlib clang/test/Headers/openmp_device_math_isnan.cpp Removed: diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 837ead86d6202..cf8e209ba1af1 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -1255,7 +1255,8 @@ void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA, // If we are offloading to a target via OpenMP we need to include the // openmp_wrappers folder which contains alternative system headers. if (JA.isDeviceOffloading(Action::OFK_OpenMP) && - getToolChain().getTriple().isNVPTX()){ + (getToolChain().getTriple().isNVPTX() || + getToolChain().getTriple().isAMDGCN())) { if (!Args.hasArg(options::OPT_nobuiltininc)) { // Add openmp_wrappers/* to our system include path. This lets us wrap // standard library headers. 
diff --git a/clang/lib/Headers/__clang_hip_cmath.h b/clang/lib/Headers/__clang_hip_cmath.h index 7342705434e6b..6f7cbde38dd20 100644 --- a/clang/lib/Headers/__clang_hip_cmath.h +++ b/clang/lib/Headers/__clang_hip_cmath.h @@ -10,7 +10,7 @@ #ifndef __CLANG_HIP_CMATH_H__ #define __CLANG_HIP_CMATH_H__ -#if !defined(__HIP__) +#if !defined(__HIP__) && !defined(__OPENMP_AMDGCN__) #error "This file is for HIP and OpenMP AMDGCN device compilation only." #endif @@ -25,31 +25,38 @@ #endif // !defined(__HIPCC_RTC__) #pragma push_macro("__DEVICE__") +#pragma push_macro("__CONSTEXPR__") +#ifdef __OPENMP_AMDGCN__ +#define __DEVICE__ static __attribute__((always_inline, nothrow)) +#define __CONSTEXPR__ constexpr +#else #define __DEVICE__ static __device__ inline __attribute__((always_inline)) +#define __CONSTEXPR__ +#endif // __OPENMP_AMDGCN__ // Start with functions that cannot be defined by DEF macros below. #if defined(__cplusplus) -__DEVICE__ double abs(double __x) { return ::fabs(__x); } -__DEVICE__ float abs(float __x) { return ::fabsf(__x); } -__DEVICE__ long long abs(long long __n) { return ::llabs(__n); } -__DEVICE__ long abs(long __n) { return ::labs(__n); } -__DEVICE__ float fma(float __x, float __y, float __z) { +__DEVICE__ __CONSTEXPR__ double abs(double __x) { return ::fabs(__x); } +__DEVICE__ __CONSTEXPR__ float abs(float __x) { return ::fabsf(__x); } +__DEVICE__ __CONSTEXPR__ long long abs(long long __n) { return ::llabs(__n); } +__DEVICE__ __CONSTEXPR__ long abs(long __n) { return ::labs(__n); } +__DEVICE__ __CONSTEXPR__ float fma(float __x, float __y, float __z) { return ::fmaf(__x, __y, __z); } #if !defined(__HIPCC_RTC__) // The value returned by fpclassify is platform dependent, therefore it is not // supported by hipRTC. 
-__DEVICE__ int fpclassify(float __x) { +__DEVICE__ __CONSTEXPR__ int fpclassify(float __x) { return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, __x); } -__DEVICE__ int fpclassify(double __x) { +__DEVICE__ __CONSTEXPR__ int fpclassify(double __x) { return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, __x); } #endif // !defined(__HIPCC_RTC__) -__DEVICE__ float frexp(float __arg, int *__exp) { +__DEVICE__ __CONSTEXPR__ float frexp(float __arg, int *__exp) { return ::frexpf(__arg, __exp); } @@ -71,90 +78,97 @@ __DEVICE__ float frexp(float __arg, int *__exp) { //of the variants inside the inner region and avoid the clash. #pragma omp begin declare variant match(implementation = {vendor(llvm)}) -__DEVICE__ int isinf(float __x) { return ::__isinff(__x); } -__DEVICE__ int isinf(double __x) { return ::__isinf(__x); } -__DEVICE__ int isfinite(float __x) { return ::__finitef(__x); } -__DEVICE__ int isfinite(double __x) { return ::__finite(__x); } -__DEVICE__ int is
[clang] d71062f - Revert "[OpenMP][AMDGCN] Initial math headers support"
Author: Jon Chesterfield Date: 2021-07-21T17:35:40+01:00 New Revision: d71062fbdab26fcc1c7e25ccdae410e1c61ed7f9 URL: https://github.com/llvm/llvm-project/commit/d71062fbdab26fcc1c7e25ccdae410e1c61ed7f9 DIFF: https://github.com/llvm/llvm-project/commit/d71062fbdab26fcc1c7e25ccdae410e1c61ed7f9.diff LOG: Revert "[OpenMP][AMDGCN] Initial math headers support" This reverts commit 968899ad9cf17579f9867dafb35c4d97bad0863f. Added: Modified: clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Headers/__clang_hip_cmath.h clang/lib/Headers/__clang_hip_math.h clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h clang/lib/Headers/openmp_wrappers/cmath clang/lib/Headers/openmp_wrappers/math.h clang/test/Headers/Inputs/include/cstdlib clang/test/Headers/openmp_device_math_isnan.cpp Removed: clang/test/Headers/Inputs/include/algorithm clang/test/Headers/Inputs/include/utility clang/test/Headers/amdgcn_openmp_device_math.c diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index cf8e209ba1af1..837ead86d6202 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -1255,8 +1255,7 @@ void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA, // If we are offloading to a target via OpenMP we need to include the // openmp_wrappers folder which contains alternative system headers. if (JA.isDeviceOffloading(Action::OFK_OpenMP) && - (getToolChain().getTriple().isNVPTX() || - getToolChain().getTriple().isAMDGCN())) { + getToolChain().getTriple().isNVPTX()){ if (!Args.hasArg(options::OPT_nobuiltininc)) { // Add openmp_wrappers/* to our system include path. This lets us wrap // standard library headers. 
diff --git a/clang/lib/Headers/__clang_hip_cmath.h b/clang/lib/Headers/__clang_hip_cmath.h index 6f7cbde38dd20..7342705434e6b 100644 --- a/clang/lib/Headers/__clang_hip_cmath.h +++ b/clang/lib/Headers/__clang_hip_cmath.h @@ -10,7 +10,7 @@ #ifndef __CLANG_HIP_CMATH_H__ #define __CLANG_HIP_CMATH_H__ -#if !defined(__HIP__) && !defined(__OPENMP_AMDGCN__) +#if !defined(__HIP__) #error "This file is for HIP and OpenMP AMDGCN device compilation only." #endif @@ -25,38 +25,31 @@ #endif // !defined(__HIPCC_RTC__) #pragma push_macro("__DEVICE__") -#pragma push_macro("__CONSTEXPR__") -#ifdef __OPENMP_AMDGCN__ -#define __DEVICE__ static __attribute__((always_inline, nothrow)) -#define __CONSTEXPR__ constexpr -#else #define __DEVICE__ static __device__ inline __attribute__((always_inline)) -#define __CONSTEXPR__ -#endif // __OPENMP_AMDGCN__ // Start with functions that cannot be defined by DEF macros below. #if defined(__cplusplus) -__DEVICE__ __CONSTEXPR__ double abs(double __x) { return ::fabs(__x); } -__DEVICE__ __CONSTEXPR__ float abs(float __x) { return ::fabsf(__x); } -__DEVICE__ __CONSTEXPR__ long long abs(long long __n) { return ::llabs(__n); } -__DEVICE__ __CONSTEXPR__ long abs(long __n) { return ::labs(__n); } -__DEVICE__ __CONSTEXPR__ float fma(float __x, float __y, float __z) { +__DEVICE__ double abs(double __x) { return ::fabs(__x); } +__DEVICE__ float abs(float __x) { return ::fabsf(__x); } +__DEVICE__ long long abs(long long __n) { return ::llabs(__n); } +__DEVICE__ long abs(long __n) { return ::labs(__n); } +__DEVICE__ float fma(float __x, float __y, float __z) { return ::fmaf(__x, __y, __z); } #if !defined(__HIPCC_RTC__) // The value returned by fpclassify is platform dependent, therefore it is not // supported by hipRTC. 
-__DEVICE__ __CONSTEXPR__ int fpclassify(float __x) { +__DEVICE__ int fpclassify(float __x) { return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, __x); } -__DEVICE__ __CONSTEXPR__ int fpclassify(double __x) { +__DEVICE__ int fpclassify(double __x) { return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, __x); } #endif // !defined(__HIPCC_RTC__) -__DEVICE__ __CONSTEXPR__ float frexp(float __arg, int *__exp) { +__DEVICE__ float frexp(float __arg, int *__exp) { return ::frexpf(__arg, __exp); } @@ -78,97 +71,90 @@ __DEVICE__ __CONSTEXPR__ float frexp(float __arg, int *__exp) { //of the variants inside the inner region and avoid the clash. #pragma omp begin declare variant match(implementation = {vendor(llvm)}) -__DEVICE__ __CONSTEXPR__ int isinf(float __x) { return ::__isinff(__x); } -__DEVICE__ __CONSTEXPR__ int isinf(double __x) { return ::__isinf(__x); } -__DEVICE__ __CONSTEXPR__ int isfinite(float __x) { return ::__finitef(__x); } -__DEVICE__ __CONSTEXPR__ int isfinite(double __x) { return ::__finite(__x); } -__DEVICE__ __CONSTEXPR__ int isnan(float __x) { return ::__isnanf(__x); } -__DEVICE__ __CONSTEXPR__ int isnan(double __x) { return ::__isn
[clang] 83c431f - [amdgpu] Add amdgpu_kernel calling conv attribute to clang
Author: Jon Chesterfield Date: 2022-05-20T08:50:37+01:00 New Revision: 83c431fb9e72abbd2eddf26388245eb4963370e2 URL: https://github.com/llvm/llvm-project/commit/83c431fb9e72abbd2eddf26388245eb4963370e2 DIFF: https://github.com/llvm/llvm-project/commit/83c431fb9e72abbd2eddf26388245eb4963370e2.diff LOG: [amdgpu] Add amdgpu_kernel calling conv attribute to clang Allows emitting define amdgpu_kernel void @func() IR from C or C++. This replaces the current workflow which is to write a stub in opencl that calls an external C function implemented in C++ combined through llvm-link. Calling the resulting function still requires a manual implementation of the ABI from the host side. The primary application is for more rapid debugging of the amdgpu backend by permuting a C or C++ test file instead of manually updating an IR file. Implementation closely follows D54425. Non-amd reviewers from there. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D125970 Added: clang/test/CodeGenCXX/amdgpu-kernel-arg-pointer-type.cpp Modified: clang/include/clang/Basic/Attr.td clang/include/clang/Basic/Specifiers.h clang/lib/AST/ItaniumMangle.cpp clang/lib/AST/Type.cpp clang/lib/AST/TypePrinter.cpp clang/lib/Basic/Targets/AMDGPU.h clang/lib/CodeGen/CGCall.cpp clang/lib/CodeGen/CGDebugInfo.cpp clang/lib/Sema/SemaDeclAttr.cpp clang/lib/Sema/SemaType.cpp clang/test/Sema/callingconv.c clang/tools/libclang/CXType.cpp Removed: diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td index 0c95adfa237d7..fed29b03a8b14 100644 --- a/clang/include/clang/Basic/Attr.td +++ b/clang/include/clang/Basic/Attr.td @@ -1857,6 +1857,11 @@ def AMDGPUNumVGPR : InheritableAttr { let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">; } +def AMDGPUKernelCall : DeclOrTypeAttr { + let Spellings = [Clang<"amdgpu_kernel">]; + let Documentation = [Undocumented]; +} + def BPFPreserveAccessIndex : InheritableAttr, TargetSpecificAttr { let Spellings = 
[Clang<"preserve_access_index">]; diff --git a/clang/include/clang/Basic/Specifiers.h b/clang/include/clang/Basic/Specifiers.h index 7a727e7088deb..7657ae36d21bb 100644 --- a/clang/include/clang/Basic/Specifiers.h +++ b/clang/include/clang/Basic/Specifiers.h @@ -281,6 +281,7 @@ namespace clang { CC_PreserveAll, // __attribute__((preserve_all)) CC_AArch64VectorCall, // __attribute__((aarch64_vector_pcs)) CC_AArch64SVEPCS, // __attribute__((aarch64_sve_pcs)) +CC_AMDGPUKernelCall, // __attribute__((amdgpu_kernel)) }; /// Checks whether the given calling convention supports variadic diff --git a/clang/lib/AST/ItaniumMangle.cpp b/clang/lib/AST/ItaniumMangle.cpp index 1be70487c1b4e..b380e02fc8f7d 100644 --- a/clang/lib/AST/ItaniumMangle.cpp +++ b/clang/lib/AST/ItaniumMangle.cpp @@ -3150,6 +3150,7 @@ StringRef CXXNameMangler::getCallingConvQualifierName(CallingConv CC) { case CC_AAPCS_VFP: case CC_AArch64VectorCall: case CC_AArch64SVEPCS: + case CC_AMDGPUKernelCall: case CC_IntelOclBicc: case CC_SpirFunction: case CC_OpenCLKernel: diff --git a/clang/lib/AST/Type.cpp b/clang/lib/AST/Type.cpp index 200a129437ed5..ece4165c51f53 100644 --- a/clang/lib/AST/Type.cpp +++ b/clang/lib/AST/Type.cpp @@ -3186,6 +3186,7 @@ StringRef FunctionType::getNameForCallConv(CallingConv CC) { case CC_AAPCS_VFP: return "aapcs-vfp"; case CC_AArch64VectorCall: return "aarch64_vector_pcs"; case CC_AArch64SVEPCS: return "aarch64_sve_pcs"; + case CC_AMDGPUKernelCall: return "amdgpu_kernel"; case CC_IntelOclBicc: return "intel_ocl_bicc"; case CC_SpirFunction: return "spir_function"; case CC_OpenCLKernel: return "opencl_kernel"; @@ -3622,6 +3623,7 @@ bool AttributedType::isCallingConv() const { case attr::VectorCall: case attr::AArch64VectorPcs: case attr::AArch64SVEPcs: + case attr::AMDGPUKernelCall: case attr::Pascal: case attr::MSABI: case attr::SysVABI: diff --git a/clang/lib/AST/TypePrinter.cpp b/clang/lib/AST/TypePrinter.cpp index 7bca45b5f5601..b5286de5b1cab 100644 --- 
a/clang/lib/AST/TypePrinter.cpp +++ b/clang/lib/AST/TypePrinter.cpp @@ -964,6 +964,9 @@ void TypePrinter::printFunctionAfter(const FunctionType::ExtInfo &Info, case CC_AArch64SVEPCS: OS << "__attribute__((aarch64_sve_pcs))"; break; +case CC_AMDGPUKernelCall: + OS << "__attribute__((amdgpu_kernel))"; + break; case CC_IntelOclBicc: OS << " __attribute__((intel_ocl_bicc))"; break; @@ -1754,6 +1757,7 @@ void TypePrinter::printAttributedAfter(const AttributedType *T, } case attr::AArch64VectorPcs: OS << "aarch64_vector_pcs"; break; case attr::AArch64SVEPcs: OS << "aarch64_sve_pcs"; break; + case attr::AMDGPUKernelCall: OS << "amdgpu_kernel"; break; case attr::IntelOclBicc:
[libunwind] [libunwind] Compile the asm as well as the C++ source (PR #86351)
https://github.com/JonChesterfield created https://github.com/llvm/llvm-project/pull/86351 When a CMakeLists.txt is missing a 'project' statement you get the default supported languages of C and CXX. https://cmake.org/cmake/help/latest/command/project.html. The help says ASM should be listed last. CMake doesn't raise an error about the .S files it has been told about when project is missing. It silently ignores them. In this case, the symptom is an undefined symbol *jumpto in the library. Working theory for why this isn't more obviously broken everywhere is the 'runtimes' CMakeLists.txt does contain a 'project' statement which lists ASM and/or by default linking shared libraries with undefined symbols succeeds. The string immediately after project appears to be arbitrary, chosen 'Unwind' to match the capitalization of 'Runtimes'. For completeness, this also removes the following warning when building libunwind by itself: >CMake Warning (dev) in CMakeLists.txt: > No project() command is present. The top-level CMakeLists.txt file must > contain a literal, direct call to the project() command. Add a line of > code such as > >project(ProjectName) > > near the top of the file, but after cmake_minimum_required(). > > CMake is pretending there is a "project(Project)" command on the first > line. > This warning is for project developers. Use -Wno-dev to suppress it. This gives no hint that the consequence of ignoring this warning is cmake will ignore your assembly. 
>From 3cacdbc0585d00c8820dc7baa0cba378beeff339 Mon Sep 17 00:00:00 2001 From: Jon Chesterfield Date: Fri, 22 Mar 2024 22:02:06 + Subject: [PATCH] [libunwind] Compile the asm as well as the C source --- libunwind/CMakeLists.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/libunwind/CMakeLists.txt b/libunwind/CMakeLists.txt index 806d5a783ec39c..01d3b72b73e842 100644 --- a/libunwind/CMakeLists.txt +++ b/libunwind/CMakeLists.txt @@ -3,6 +3,7 @@ #=== cmake_minimum_required(VERSION 3.20.0) +project(Unwind LANGUAGES C CXX ASM) set(LLVM_COMMON_CMAKE_UTILS "${CMAKE_CURRENT_SOURCE_DIR}/../cmake") ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libunwind] [libunwind] Compile the asm as well as the C++ source (PR #86351)
JonChesterfield wrote: I'm sorry to hear that. I've only used the ENABLE_RUNTIMES in the context of compiling clang first, and then compiling the libraries under runtime with that clang. The recursive invocation drops (most) arguments passed to cmake which has been obstructive in the past. With standalone build (presumably ENABLE_PROJECTS) removed, how does one build the libraries using an existing compiler? https://github.com/llvm/llvm-project/pull/86351
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
https://github.com/JonChesterfield edited https://github.com/llvm/llvm-project/pull/81058
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,716 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. 
AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. 
+// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class valistCC { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface() {} + +public: + virtual ~Interface() {} + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // How the vaListType is passed + virtual valistCC vaListCC() { return valistCC::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst *va_list, Value * /*buffer*/) { JonChesterfield wrote: Expressing the class invariants through the vtable seems harder than it should be. This function is only called if valistOnStack returns true, changing the base to a builtin_unreachable. https://github.com/llvm/llvm-project/pull/81058 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,716 @@
+//===-- ExpandVariadicsPass.cpp --------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it is, replace calls to it with calls to a different function
+// 3. If it isn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the variadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite calls to the variadic function into
+// some code to construct a target-specific value to use for the va_list and
+// a call into the non-variadic implementation function. There's a test for
+// that.
+//
+// Most other variadic functions whose definition is known can be converted
+// into that form. Create a new internal function taking a va_list where the
+// original took a ... parameter. Move the blocks across. Create a new block
+// containing a va_start that calls into the new function. This is nearly
+// target independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang,
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular
+// target.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// Calling convention for passing the va_list object, same as it would be in C.
+// aarch64 uses byval.
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface() {}
+
+public:
+  virtual ~Interface() {}
+
+  // Most ABIs use a void* or char* for va_list; others can specialise.
+  virtual Type *vaListType(LLVMContext &Ctx) {
+    return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed.
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The va_list might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+                                AllocaInst *va_list, Value * /*buffer*/) {
+    // Function needs to be implemented if the va_list is on the stack.
+    assert(!valistOnStack());
+    assert(va_list == nullptr);
+  }
+
+  virtual uint32_t minimum_slot_align() = 0;
+  virtual uint32_t maximum_slot_align() = 0;
+
+  // Could make these virtual; fair chance that's free since all
+  // classes choose not to override them at present.
+
+  // All targets currently implemented use a ptr for the va_list parameter.
+  Type *vaListParameterType(LLVMContext &Ctx) {
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp --------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it is, replace calls to it with calls to a different function
+// 3. If it isn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the variadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite calls to the variadic function into
+// some code to construct a target-specific value to use for the va_list and
+// a call into the non-variadic implementation function. There's a test for
+// that.
+//
+// Most other variadic functions whose definition is known can be converted
+// into that form. Create a new internal function taking a va_list where the
+// original took a ... parameter. Move the blocks across. Create a new block
+// containing a va_start that calls into the new function. This is nearly
+// target independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang,
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular
+// target.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// Calling convention for passing the va_list object, same as it would be in C.
+// aarch64 uses byval.
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+      : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list; others can specialise.
+  virtual Type *vaListType(LLVMContext &Ctx) {
+    return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed.
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The va_list might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+                                AllocaInst * /*va_list*/, Value * /*buffer*/) {
+    // Function needs to be implemented if the va_list is on the stack.
+    assert(!valistOnStack());
+    __builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the va_list parameter.
+  Type *vaListParameterType(LLVMContext &Ctx) {
+    return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
https://github.com/JonChesterfield edited https://github.com/llvm/llvm-project/pull/81058
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. 
AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. 
+// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class valistCC { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // How the vaListType is passed + virtual valistCC vaListCC() { return valistCC::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented if valist is on the stack +assert(!valistOnStack()); +__builtin_unreachable(); + } + + // All targets currently implemented use a ptr for the valist parameter + Type *vaListParameterType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + bool VAEndIsNop() { return
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp --------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the variadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite calls to the variadic function into
+// some code to construct a target-specific value to use for the va_list and
+// a call into the non-variadic implementation function. There's a test for
+// that.
+//
+// Most other variadic functions whose definition is known can be converted
+// into that form. Create a new internal function taking a va_list where the
+// original took a ... parameter. Move the blocks across. Create a new block
+// containing a va_start that calls into the new function. This is nearly
+// target independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular
+// target.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// Calling convention for passing the va_list object, same as it would be in C.
+// AArch64 uses byval.
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+      : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+    return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+                                AllocaInst * /*va_list*/, Value * /*buffer*/) {
+    // Function needs to be implemented if valist is on the stack
+    assert(!valistOnStack());
+    __builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+    return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
JonChesterfield wrote: I've gone with more comments, but maybe to make this clear enough I need to separate the field alignment from whether va_list is a void* or something that needs more
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,117 @@
+// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default" -S | FileCheck %s --check-prefixes=CHECK,X86Linux

JonChesterfield wrote: Nope. There's something weird/broken with opt here. Failure reproduces with other passes. Issue https://github.com/llvm/llvm-project/issues/81128

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
JonChesterfield wrote: Patch run through `clang-tidy --checks=readability-identifier-naming` with the config file in tree and the recommendations applied. Some of the choices seem poor, but it's presumably acceptable.

https://github.com/llvm/llvm-project/pull/81058
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,589 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: -p --function-signature
+; RUN: opt -S --passes=expand-variadics < %s | FileCheck %s
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; The types show the call frames
+; CHECK: %single_i32.vararg = type <{ i32 }>
+; CHECK: %single_double.vararg = type <{ double }>
+; CHECK: %single_v4f32.vararg = type <{ <4 x float> }>
+; CHECK: %single_v8f32.vararg = type <{ <8 x float> }>
+; CHECK: %single_v16f32.vararg = type <{ <16 x float> }>
+; CHECK: %single_v32f32.vararg = type <{ <32 x float> }>
+; CHECK: %i32_double.vararg = type <{ i32, [4 x i8], double }>
+; CHECK: %double_i32.vararg = type <{ double, i32 }>
+; CHECK: %i32_v4f32.vararg = type <{ i32, [12 x i8], <4 x float> }>
+; CHECK: %v4f32_i32.vararg = type <{ <4 x float>, i32 }>
+; CHECK: %i32_v8f32.vararg = type <{ i32, [28 x i8], <8 x float> }>
+; CHECK: %v8f32_i32.vararg = type <{ <8 x float>, i32 }>
+; CHECK: %i32_v16f32.vararg = type <{ i32, [60 x i8], <16 x float> }>
+; CHECK: %v16f32_i32.vararg = type <{ <16 x float>, i32 }>
+; CHECK: %i32_v32f32.vararg = type <{ i32, [124 x i8], <32 x float> }>
+; CHECK: %v32f32_i32.vararg = type <{ <32 x float>, i32 }>
+
+%struct.__va_list_tag = type { i32, i32, ptr, ptr }
+%struct.libcS = type { i8, i16, i32, i64, float, double }
+
+define dso_local void @codegen_for_copy(ptr noundef %x) local_unnamed_addr #0 {
+; CHECK-LABEL: define {{[^@]+}}@codegen_for_copy(ptr noundef %x) local_unnamed_addr #0 {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    %cp = alloca [1 x %struct.__va_list_tag], align 16
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %cp) #6
+; CHECK-NEXT:    call void @llvm.va_copy(ptr nonnull %cp, ptr %x)
+; CHECK-NEXT:    call void @wrapped(ptr noundef nonnull %cp) #7
+; CHECK-NEXT:    call void @llvm.va_end(ptr %cp)
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %cp) #6
+; CHECK-NEXT:    ret void
+;
+entry:
+  %cp = alloca [1 x %struct.__va_list_tag], align 16
+  call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %cp) #5
+  call void @llvm.va_copy(ptr nonnull %cp, ptr %x)
+  call void @wrapped(ptr noundef nonnull %cp) #6
+  call void @llvm.va_end(ptr %cp)
+  call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %cp) #5
+  ret void
+}
+
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #1
+
+declare void @llvm.va_copy(ptr, ptr) #2
+
+declare void @wrapped(ptr noundef) local_unnamed_addr #3
+
+declare void @llvm.va_end(ptr) #2
+
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #1
+
+define dso_local void @vararg(...) local_unnamed_addr #0 {
+; CHECK-LABEL: define {{[^@]+}}@vararg(...) local_unnamed_addr #0 {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    %va = alloca [1 x %struct.__va_list_tag], align 16
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %va) #6
+; CHECK-NEXT:    call void @llvm.va_start(ptr nonnull %va)
+; CHECK-NEXT:    call void @wrapped(ptr noundef nonnull %va) #7
+; CHECK-NEXT:    call void @llvm.va_end(ptr %va)
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %va) #6
+; CHECK-NEXT:    ret void
+;
+entry:
+  %va = alloca [1 x %struct.__va_list_tag], align 16
+  call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %va) #5
+  call void @llvm.va_start(ptr nonnull %va)
+  call void @wrapped(ptr noundef nonnull %va) #6
+  call void @llvm.va_end(ptr %va)
+  call void @llvm.lifetime.end.p0(i64 24, ptr nonnull %va) #5
+  ret void
+}
+
+declare void @llvm.va_start(ptr) #2
+
+define dso_local void @single_i32(i32 noundef %x) local_unnamed_addr #0 {
+; CHECK-LABEL: define {{[^@]+}}@single_i32(i32 noundef %x) local_unnamed_addr #0 {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    %vararg_buffer = alloca %single_i32.vararg, align 8
+; CHECK-NEXT:    %0 = getelementptr inbounds %single_i32.vararg, ptr %vararg_buffer, i32 0, i32 0
+; CHECK-NEXT:    store i32 %x, ptr %0, align
4 +; CHECK-NEXT:%va_list = alloca [1 x { i32, i32, ptr, ptr }], align 8 +; CHECK-NEXT:%gp_offset = getelementptr inbounds [1 x { i32, i32, ptr, ptr }], ptr %va_list, i64 0, i32 0, i32 0 +; CHECK-NEXT:store i32 48, ptr %gp_offset, align 4 +; CHECK-NEXT:%fp_offset = getelementptr inbounds [1 x { i32, i32, ptr, ptr }], ptr %va_list, i64 0, i32 0, i32 1 +; CHECK-NEXT:store i32 176, ptr %fp_offset, align 4 +; CHECK-NEXT:%overfow_arg_area = getelementptr inbounds [1 x { i32, i32, ptr, ptr }], ptr %va_list, i64 0, i32 0, i32 2 +; CHECK-NEXT:store ptr %vararg_buffer, ptr %overfow_arg_area, align 8 +; CHECK-NEXT:%reg_save_area = getelementptr inbounds [1 x { i32, i32, ptr, ptr }], ptr %va_list, i64 0, i32 0, i32 3 +; CHECK-NEXT:store ptr null, ptr %reg_save_area, align 8 +; CHECK-NEXT:call void @wrapped(ptr %va_list) #8 +; CHECK-NEXT:ret void +; +entry: + tail cal
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp --------------------------------*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the variadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite calls to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. 
AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. 
+// +//===----------------------------------------------------------------------===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class valistCC { value, pointer, /*byval*/ }; + +struct Interface {
protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // How the vaListType is passed + virtual valistCC vaListCC() { return valistCC::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented if valist is on the stack +assert(!valistOnStack()); +__builtin_unreachable(); + } + + // All targets currently implemented use a ptr for the valist parameter + Type *vaListParameterType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx);
[clang] [llvm] [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (PR #81331)
JonChesterfield wrote: New intrinsic sounds right - a constant frequency counter is a different thing to a variable frequency counter. "Steady" implies unchanging, so I'd agree with `readfixedfreqtimer` or similar. We can't have a ratio between the two counters since one changes frequency and one doesn't. Does x64 have something that maps usefully onto a fixed frequency counter intrinsic? https://github.com/llvm/llvm-project/pull/81331 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #79660)
JonChesterfield wrote: The "generic IR" thing is more emergent behaviour than a documented / intentional design. This patch is fine - we shouldn't set macros to nonsense values - but if this is a step towards building libc like the rocm-device-libs there may be push back on that one. https://github.com/llvm/llvm-project/pull/79660 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #79660)
https://github.com/JonChesterfield approved this pull request. https://github.com/llvm/llvm-project/pull/79660 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP][AMDGPU] Do not include 'ockl' implementations in OpenMP (PR #70462)
https://github.com/JonChesterfield approved this pull request. https://github.com/llvm/llvm-project/pull/70462 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] bcaa806 - [Clang] Fix BZ47169, loader_uninitialized on incomplete types
Author: Jon Chesterfield Date: 2020-08-19T18:11:50+01:00 New Revision: bcaa806a4747595116b538e8b75b12966e6607f6 URL: https://github.com/llvm/llvm-project/commit/bcaa806a4747595116b538e8b75b12966e6607f6 DIFF: https://github.com/llvm/llvm-project/commit/bcaa806a4747595116b538e8b75b12966e6607f6.diff LOG: [Clang] Fix BZ47169, loader_uninitialized on incomplete types [Clang] Fix BZ47169, loader_uninitialized on incomplete types Reported by @erichkeane. Fix proposed by @erichkeane works, tests included. Bug introduced in D74361. Crash was on querying a CXXRecordDecl for hasTrivialDefaultConstructor on an incomplete type. Fixed by calling RequireCompleteType in the right place. Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D85990 Added: Modified: clang/lib/Sema/SemaDecl.cpp clang/test/CodeGenCXX/attr-loader-uninitialized.cpp clang/test/Sema/attr-loader-uninitialized.c clang/test/Sema/attr-loader-uninitialized.cpp Removed: diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp index ab1496337210..566a2f9da681 100644 --- a/clang/lib/Sema/SemaDecl.cpp +++ b/clang/lib/Sema/SemaDecl.cpp @@ -12476,6 +12476,17 @@ void Sema::ActOnUninitializedDecl(Decl *RealDecl) { } if (!Var->isInvalidDecl() && RealDecl->hasAttr()) { + if (Var->getStorageClass() == SC_Extern) { +Diag(Var->getLocation(), diag::err_loader_uninitialized_extern_decl) +<< Var; +Var->setInvalidDecl(); +return; + } + if (RequireCompleteType(Var->getLocation(), Var->getType(), + diag::err_typecheck_decl_incomplete_type)) { +Var->setInvalidDecl(); +return; + } if (CXXRecordDecl *RD = Var->getType()->getAsCXXRecordDecl()) { if (!RD->hasTrivialDefaultConstructor()) { Diag(Var->getLocation(), diag::err_loader_uninitialized_trivial_ctor); @@ -12483,12 +12494,6 @@ void Sema::ActOnUninitializedDecl(Decl *RealDecl) { return; } } - if (Var->getStorageClass() == SC_Extern) { -Diag(Var->getLocation(), diag::err_loader_uninitialized_extern_decl) -<< Var; -Var->setInvalidDecl(); -return; - } 
} VarDecl::DefinitionKind DefKind = Var->isThisDeclarationADefinition(); diff --git a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp index e82ae47e9f16..6501a25bf5bc 100644 --- a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp +++ b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp @@ -28,3 +28,15 @@ double arr[32] __attribute__((loader_uninitialized)); // Defining as arr2[] [[clang..]] raises the error: attribute cannot be applied to types // CHECK: @arr2 = global [4 x double] undef double arr2 [[clang::loader_uninitialized]] [4]; + +template struct templ{T * t;}; + +// CHECK: @templ_int = global %struct.templ undef, align 8 +templ templ_int [[clang::loader_uninitialized]]; + +// CHECK: @templ_trivial = global %struct.templ.0 undef, align 8 +templ templ_trivial [[clang::loader_uninitialized]]; + +// CHECK: @templ_incomplete = global %struct.templ.1 undef, align 8 +struct incomplete; +templ templ_incomplete [[clang::loader_uninitialized]]; diff --git a/clang/test/Sema/attr-loader-uninitialized.c b/clang/test/Sema/attr-loader-uninitialized.c index f2e78d981580..a1edd858e27f 100644 --- a/clang/test/Sema/attr-loader-uninitialized.c +++ b/clang/test/Sema/attr-loader-uninitialized.c @@ -10,6 +10,10 @@ const int can_still_be_const __attribute__((loader_uninitialized)); extern int external_rejected __attribute__((loader_uninitialized)); // expected-error@-1 {{variable 'external_rejected' cannot be declared both 'extern' and with the 'loader_uninitialized' attribute}} +struct S; +extern struct S incomplete_external_rejected __attribute__((loader_uninitialized)); +// expected-error@-1 {{variable 'incomplete_external_rejected' cannot be declared both 'extern' and with the 'loader_uninitialized' attribute}} + int noargs __attribute__((loader_uninitialized(0))); // expected-error@-1 {{'loader_uninitialized' attribute takes no arguments}} @@ -35,3 +39,8 @@ __private_extern__ int initialized_private_extern_rejected 
__attribute__((loader extern __attribute__((visibility("hidden"))) int extern_hidden __attribute__((loader_uninitialized)); // expected-error@-1 {{variable 'extern_hidden' cannot be declared both 'extern' and with the 'loader_uninitialized' attribute}} + +struct Incomplete; +struct Incomplete incomplete __attribute__((loader_uninitialized)); +// expected-error@-1 {{variable has incomplete type 'struct Incomplete'}} +// expected-note@-3 {{forward declaration of 'struct Incomplete'}} diff --git a/clang/test/Sema/attr-loader-uninitialized.cpp b/clang/test/Sema/attr-loader-uninitialized.cpp index 3a330b3d5965..5
[clang] 4b2e7d0 - [amdgpu] Default to code object v3
Author: Jon Chesterfield Date: 2020-12-15T01:11:09Z New Revision: 4b2e7d0215021d0d1df1a6319884b21d33936265 URL: https://github.com/llvm/llvm-project/commit/4b2e7d0215021d0d1df1a6319884b21d33936265 DIFF: https://github.com/llvm/llvm-project/commit/4b2e7d0215021d0d1df1a6319884b21d33936265.diff LOG: [amdgpu] Default to code object v3 [amdgpu] Default to code object v3 v4 is not yet readily available, and doesn't appear to be implemented in the back end Reviewed By: t-tye Differential Revision: https://reviews.llvm.org/D93258 Added: Modified: clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/CommonArgs.cpp llvm/docs/AMDGPUUsage.rst Removed: diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 67d41c3711f5..87c786065fa9 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -2811,7 +2811,7 @@ def mexec_model_EQ : Joined<["-"], "mexec-model=">, Group; def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, - HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">, + HelpText<"Specify code object ABI version. Defaults to 3. (AMDGPU only)">, MetaVarName<"">, Values<"2,3,4">; def mcode_object_v3_legacy : Flag<["-"], "mcode-object-v3">, Group, diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp index 72bedc16846d..04d0e0771f70 100644 --- a/clang/lib/Driver/ToolChains/CommonArgs.cpp +++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp @@ -1549,7 +1549,7 @@ unsigned tools::getOrCheckAMDGPUCodeObjectVersion( const Driver &D, const llvm::opt::ArgList &Args, bool Diagnose) { const unsigned MinCodeObjVer = 2; const unsigned MaxCodeObjVer = 4; - unsigned CodeObjVer = 4; + unsigned CodeObjVer = 3; // Emit warnings for legacy options even if they are overridden. 
if (Diagnose) { diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst index e5d081a37500..95fb164310cc 100644 --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -911,12 +911,12 @@ The AMDGPU backend uses the following ELF header: * ``ELFABIVERSION_AMDGPU_HSA_V3`` is used to specify the version of AMD HSA runtime ABI for code object V3. Specify using the Clang option -``-mcode-object-version=3``. +``-mcode-object-version=3``. This is the default code object +version if not specified. * ``ELFABIVERSION_AMDGPU_HSA_V4`` is used to specify the version of AMD HSA runtime ABI for code object V4. Specify using the Clang option -``-mcode-object-version=4``. This is the default code object -version if not specified. +``-mcode-object-version=4``. * ``ELFABIVERSION_AMDGPU_PAL`` is used to specify the version of AMD PAL runtime ABI. @@ -2871,10 +2871,6 @@ non-AMD key names should be prefixed by "*vendor-name*.". Code Object V3 Metadata +++ -.. warning:: - Code object V3 is not the default code object version emitted by this version - of LLVM. - Code object V3 to V4 metadata is specified by the ``NT_AMDGPU_METADATA`` note record (see :ref:`amdgpu-note-records-v3-v4`). @@ -3279,6 +3275,10 @@ same *vendor-name*. Code Object V4 Metadata +++ +.. warning:: + Code object V4 is not the default code object version emitted by this version + of LLVM. + Code object V4 metadata is the same as :ref:`amdgpu-amdhsa-code-object-metadata-v3` with the changes and additions defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v3`. ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] c0619d3 - [NFC] Use regex for code object version in hip tests
Author: Jon Chesterfield Date: 2020-12-16T17:00:19Z New Revision: c0619d3b21cd420b9faf15f14db0816787c44ded URL: https://github.com/llvm/llvm-project/commit/c0619d3b21cd420b9faf15f14db0816787c44ded DIFF: https://github.com/llvm/llvm-project/commit/c0619d3b21cd420b9faf15f14db0816787c44ded.diff LOG: [NFC] Use regex for code object version in hip tests [NFC] Use regex for code object version in hip tests Extracted from D93258. Makes tests robust to changes in default code object version. Reviewed By: t-tye Differential Revision: https://reviews.llvm.org/D93398 Added: Modified: clang/test/Driver/hip-autolink.hip clang/test/Driver/hip-code-object-version.hip clang/test/Driver/hip-device-compile.hip clang/test/Driver/hip-host-cpu-features.hip clang/test/Driver/hip-rdc-device-only.hip clang/test/Driver/hip-target-id.hip clang/test/Driver/hip-toolchain-mllvm.hip clang/test/Driver/hip-toolchain-no-rdc.hip clang/test/Driver/hip-toolchain-opt.hip clang/test/Driver/hip-toolchain-rdc-separate.hip clang/test/Driver/hip-toolchain-rdc-static-lib.hip clang/test/Driver/hip-toolchain-rdc.hip Removed: diff --git a/clang/test/Driver/hip-autolink.hip b/clang/test/Driver/hip-autolink.hip index 073c6c4d244a6..5f9311d7ba734 100644 --- a/clang/test/Driver/hip-autolink.hip +++ b/clang/test/Driver/hip-autolink.hip @@ -7,7 +7,7 @@ // RUN: %clang --target=i386-pc-windows-msvc --cuda-gpu-arch=gfx906 -nogpulib \ // RUN: --cuda-host-only %s -### 2>&1 | FileCheck --check-prefix=HOST %s -// DEV: "-cc1" "-mllvm" "--amdhsa-code-object-version=4" "-triple" "amdgcn-amd-amdhsa" +// DEV: "-cc1" "-mllvm" "--amdhsa-code-object-version={{[0-9]+}}" "-triple" "amdgcn-amd-amdhsa" // DEV-SAME: "-fno-autolink" // HOST: "-cc1" "-triple" "i386-pc-windows-msvc{{.*}}" diff --git a/clang/test/Driver/hip-code-object-version.hip b/clang/test/Driver/hip-code-object-version.hip index 26ad6f8710cc2..51d9004b0cbf5 100644 --- a/clang/test/Driver/hip-code-object-version.hip +++ b/clang/test/Driver/hip-code-object-version.hip 
@@ -44,12 +44,17 @@ // RUN: --offload-arch=gfx906 -nogpulib \ // RUN: %s 2>&1 | FileCheck -check-prefix=V4 %s +// V4: "-mllvm" "--amdhsa-code-object-version=4" +// V4: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906" + +// Check bundle ID for code object version default + // RUN: %clang -### -target x86_64-linux-gnu \ // RUN: --offload-arch=gfx906 -nogpulib \ -// RUN: %s 2>&1 | FileCheck -check-prefix=V4 %s +// RUN: %s 2>&1 | FileCheck -check-prefix=VD %s -// V4: "-mllvm" "--amdhsa-code-object-version=4" -// V4: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906" +// VD: "-mllvm" "--amdhsa-code-object-version=4" +// VD: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906" // Check invalid code object version option. diff --git a/clang/test/Driver/hip-device-compile.hip b/clang/test/Driver/hip-device-compile.hip index 5fbcbc97bd805..c460ff7e8c67d 100644 --- a/clang/test/Driver/hip-device-compile.hip +++ b/clang/test/Driver/hip-device-compile.hip @@ -26,7 +26,7 @@ // RUN: %S/Inputs/hip_multiple_inputs/a.cu \ // RUN: 2>&1 | FileCheck -check-prefixes=CHECK,ASM %s -// CHECK: {{".*clang.*"}} "-cc1" "-mllvm" "--amdhsa-code-object-version=4" "-triple" "amdgcn-amd-amdhsa" +// CHECK: {{".*clang.*"}} "-cc1" "-mllvm" "--amdhsa-code-object-version={{[0-9]+}}" "-triple" "amdgcn-amd-amdhsa" // CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu" // BC-SAME: "-emit-llvm-bc" // LL-SAME: "-emit-llvm" diff --git a/clang/test/Driver/hip-host-cpu-features.hip b/clang/test/Driver/hip-host-cpu-features.hip index 235f0f1f22c24..8addfb11dc0b6 100644 --- a/clang/test/Driver/hip-host-cpu-features.hip +++ b/clang/test/Driver/hip-host-cpu-features.hip @@ -6,14 +6,14 @@ // RUN: %clang -### -c -target x86_64-linux-gnu -msse3 --cuda-gpu-arch=gfx803 -nogpulib %s 2>&1 | FileCheck %s -check-prefix=HOSTSSE3 // RUN: %clang -### -c -target x86_64-linux-gnu --gpu-use-aux-triple-only -march=znver2 --cuda-gpu-arch=gfx803 -nogpulib %s 2>&1 | FileCheck %s 
-check-prefix=NOHOSTCPU -// HOSTCPU: "-cc1" "-mllvm" "--amdhsa-code-object-version=4" "-triple" "amdgcn-amd-amdhsa" +// HOSTCPU: "-cc1" "-mllvm" "--amdhsa-code-object-version={{[0-9]+}}" "-triple" "amdgcn-amd-amdhsa" // HOSTCPU-SAME: "-aux-triple" "x86_64-unknown-linux-gnu" // HOSTCPU-SAME: "-aux-target-cpu" "znver2" -// HOSTSSE3: "-cc1" "-mllvm" "--amdhsa-code-object-version=4" "-triple" "amdgcn-amd-amdhsa" +// HOSTSSE3: "-cc1" "-mllvm" "--amdhsa-code-object-version={{[0-9]+}}" "-triple" "amdgcn-amd-amdhsa" // HOSTSSE3-SAME: "-aux-triple" "x86_64-unknown-linux-gnu" // HOSTSSE3-SAME: "-aux-target-feature" "+sse3" -// NOHOSTCPU: "-cc1" "-mllvm" "--amdhsa-code-object-version=4" "-triple" "amdgcn-amd-amdhsa" +// NOHOSTCPU: "-cc1" "-mllvm" "--amdhsa-code-object-versio
[clang] daf39e3 - [amdgpu] Default to code object v3
Author: Jon Chesterfield Date: 2020-12-17T16:09:33Z New Revision: daf39e3f2dba18bd39cd89a1c91bae126a31d4fe URL: https://github.com/llvm/llvm-project/commit/daf39e3f2dba18bd39cd89a1c91bae126a31d4fe DIFF: https://github.com/llvm/llvm-project/commit/daf39e3f2dba18bd39cd89a1c91bae126a31d4fe.diff LOG: [amdgpu] Default to code object v3 [amdgpu] Default to code object v3 v4 is not yet readily available, and doesn't appear to be implemented in the back end Reviewed By: t-tye, yaxunl Differential Revision: https://reviews.llvm.org/D93258 Added: Modified: clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/CommonArgs.cpp clang/test/Driver/hip-code-object-version.hip llvm/docs/AMDGPUUsage.rst Removed: diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index f384e0d993c2..07f15add28ec 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -2909,7 +2909,7 @@ def mexec_model_EQ : Joined<["-"], "mexec-model=">, Group; def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, - HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">, + HelpText<"Specify code object ABI version. Defaults to 3. (AMDGPU only)">, MetaVarName<"">, Values<"2,3,4">; def mcode_object_v3_legacy : Flag<["-"], "mcode-object-v3">, Group, diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp index 72bedc16846d..04d0e0771f70 100644 --- a/clang/lib/Driver/ToolChains/CommonArgs.cpp +++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp @@ -1549,7 +1549,7 @@ unsigned tools::getOrCheckAMDGPUCodeObjectVersion( const Driver &D, const llvm::opt::ArgList &Args, bool Diagnose) { const unsigned MinCodeObjVer = 2; const unsigned MaxCodeObjVer = 4; - unsigned CodeObjVer = 4; + unsigned CodeObjVer = 3; // Emit warnings for legacy options even if they are overridden. 
if (Diagnose) { diff --git a/clang/test/Driver/hip-code-object-version.hip b/clang/test/Driver/hip-code-object-version.hip index 51d9004b0cbf..6e4e96688593 100644 --- a/clang/test/Driver/hip-code-object-version.hip +++ b/clang/test/Driver/hip-code-object-version.hip @@ -53,7 +53,7 @@ // RUN: --offload-arch=gfx906 -nogpulib \ // RUN: %s 2>&1 | FileCheck -check-prefix=VD %s -// VD: "-mllvm" "--amdhsa-code-object-version=4" +// VD: "-mllvm" "--amdhsa-code-object-version=3" // VD: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906" // Check invalid code object version option. diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst index 6d3fa7021a7a..c8dda47352ab 100644 --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -911,12 +911,12 @@ The AMDGPU backend uses the following ELF header: * ``ELFABIVERSION_AMDGPU_HSA_V3`` is used to specify the version of AMD HSA runtime ABI for code object V3. Specify using the Clang option -``-mcode-object-version=3``. +``-mcode-object-version=3``. This is the default code object +version if not specified. * ``ELFABIVERSION_AMDGPU_HSA_V4`` is used to specify the version of AMD HSA runtime ABI for code object V4. Specify using the Clang option -``-mcode-object-version=4``. This is the default code object -version if not specified. +``-mcode-object-version=4``. * ``ELFABIVERSION_AMDGPU_PAL`` is used to specify the version of AMD PAL runtime ABI. @@ -2871,10 +2871,6 @@ non-AMD key names should be prefixed by "*vendor-name*.". Code Object V3 Metadata +++ -.. warning:: - Code object V3 is not the default code object version emitted by this version - of LLVM. - Code object V3 to V4 metadata is specified by the ``NT_AMDGPU_METADATA`` note record (see :ref:`amdgpu-note-records-v3-v4`). @@ -3279,6 +3275,10 @@ same *vendor-name*. Code Object V4 Metadata +++ +.. warning:: + Code object V4 is not the default code object version emitted by this version + of LLVM. 
+ Code object V4 metadata is the same as :ref:`amdgpu-amdhsa-code-object-metadata-v3` with the changes and additions defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v3`. ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 5d02ca4 - [libomptarget][nvptx] Undef, weak shared variables
Author: JonChesterfield Date: 2020-10-28T14:25:36Z New Revision: 5d02ca49a294848b533adf7dc1d1275d125ef587 URL: https://github.com/llvm/llvm-project/commit/5d02ca49a294848b533adf7dc1d1275d125ef587 DIFF: https://github.com/llvm/llvm-project/commit/5d02ca49a294848b533adf7dc1d1275d125ef587.diff LOG: [libomptarget][nvptx] Undef, weak shared variables [libomptarget][nvptx] Undef, weak shared variables Shared variables on nvptx, and LDS on amdgcn, are uninitialized at the start of kernel execution. Therefore create the variables with undef instead of zeros, motivated in part by the amdgcn back end rejecting LDS+initializer. Common is zero initialized, which seems incompatible with shared. Thus change them to weak, following the direction of https://reviews.llvm.org/rG7b3eabdcd215 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90248 Added: Modified: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp clang/test/OpenMP/nvptx_data_sharing.cpp clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp clang/test/OpenMP/nvptx_parallel_codegen.cpp clang/test/OpenMP/nvptx_parallel_for_codegen.cpp clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp clang/test/OpenMP/nvptx_teams_codegen.cpp clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index bcabc5398127..08903a1444c2 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -1102,7 +1102,7 @@ void CGOpenMPRuntimeGPU::emitNonSPMDKernel(const OMPExecutableDirective &D, KernelStaticGlobalized = new llvm::GlobalVariable( CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false, llvm::GlobalValue::InternalLinkage, -llvm::ConstantPointerNull::get(CGM.VoidPtrTy), +llvm::UndefValue::get(CGM.VoidPtrTy), 
"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr, llvm::GlobalValue::NotThreadLocal, CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared)); @@ -1234,7 +1234,7 @@ void CGOpenMPRuntimeGPU::emitSPMDKernel(const OMPExecutableDirective &D, KernelStaticGlobalized = new llvm::GlobalVariable( CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false, llvm::GlobalValue::InternalLinkage, -llvm::ConstantPointerNull::get(CGM.VoidPtrTy), +llvm::UndefValue::get(CGM.VoidPtrTy), "_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr, llvm::GlobalValue::NotThreadLocal, CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared)); @@ -2855,8 +2855,8 @@ static llvm::Value *emitInterWarpCopyFunction(CodeGenModule &CGM, auto *Ty = llvm::ArrayType::get(CGM.Int32Ty, WarpSize); unsigned SharedAddressSpace = C.getTargetAddressSpace(LangAS::cuda_shared); TransferMedium = new llvm::GlobalVariable( -M, Ty, /*isConstant=*/false, llvm::GlobalVariable::CommonLinkage, -llvm::Constant::getNullValue(Ty), TransferMediumName, +M, Ty, /*isConstant=*/false, llvm::GlobalVariable::WeakAnyLinkage, +llvm::UndefValue::get(Ty), TransferMediumName, /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal, SharedAddressSpace); CGM.addCompilerUsedGlobal(TransferMedium); @@ -4791,8 +4791,8 @@ void CGOpenMPRuntimeGPU::clear() { llvm::Type *LLVMStaticTy = CGM.getTypes().ConvertTypeForMem(StaticTy); auto *GV = new llvm::GlobalVariable( CGM.getModule(), LLVMStaticTy, - /*isConstant=*/false, llvm::GlobalValue::CommonLinkage, - llvm::Constant::getNullValue(LLVMStaticTy), + /*isConstant=*/false, llvm::GlobalValue::WeakAnyLinkage, + llvm::UndefValue::get(LLVMStaticTy), "_openmp_shared_static_glob_rd_$_", /*InsertBefore=*/nullptr, llvm::GlobalValue::NotThreadLocal, C.getTargetAddressSpace(LangAS::cuda_shared)); diff --git a/clang/test/OpenMP/nvptx_data_sharing.cpp b/clang/test/OpenMP/nvptx_data_sharing.cpp index 1372246c7fc8..b6117d738d2b 100644 --- a/clang/test/OpenMP/nvptx_data_sharing.cpp 
+++ b/clang/test/OpenMP/nvptx_data_sharing.cpp @@ -28,8 +28,8 @@ void test_ds(){ } } // SEQ: [[MEM_TY:%.+]] = type { [128 x i8] } -// SEQ-DAG: [[SHARED_GLOBAL_RD:@.+]] = common addrspace(3) global [[MEM_TY]] zeroinitializer -// SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* null +// SEQ-DAG: [[SHARED_GLOBAL_RD:@.+]] = weak addrspace(3) global [[MEM_TY]] undef +// SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* undef // SEQ-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i64 8 // SEQ-DAG: [[KERNE
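The linkage/initializer change above is easiest to see in textual IR for a shared (addrspace(3)) global. This is an illustrative before/after sketch under an assumed array type, not output copied from the tests:

```llvm
; Before: common linkage requires a zero initializer, which the amdgcn
; back end rejects for LDS, and which over-promises for nvptx shared memory.
@transfer_medium = common addrspace(3) global [32 x i32] zeroinitializer

; After: weak linkage with an undef initializer matches the actual
; "uninitialized at kernel start" semantics of shared/LDS memory.
@transfer_medium = weak addrspace(3) global [32 x i32] undef
```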
[clang] dee7704 - [AMDGPU] Add __builtin_amdgcn_grid_size
Author: Jon Chesterfield Date: 2020-10-29T16:25:13Z New Revision: dee7704829bd421ad3cce4b2132d28f4459b7319 URL: https://github.com/llvm/llvm-project/commit/dee7704829bd421ad3cce4b2132d28f4459b7319 DIFF: https://github.com/llvm/llvm-project/commit/dee7704829bd421ad3cce4b2132d28f4459b7319.diff LOG: [AMDGPU] Add __builtin_amdgcn_grid_size [AMDGPU] Add __builtin_amdgcn_grid_size Similar to D76772, loads the data from the dispatch pointer. Marked invariant. Patch also updates the openmp devicertl to use this builtin. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D90251 Added: Modified: clang/include/clang/Basic/BuiltinsAMDGPU.def clang/lib/CodeGen/CGBuiltin.cpp clang/test/CodeGenOpenCL/builtins-amdgcn.cl openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip Removed: diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 042a86368559..f5901e6f8f3b 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -37,6 +37,10 @@ BUILTIN(__builtin_amdgcn_workgroup_size_x, "Us", "nc") BUILTIN(__builtin_amdgcn_workgroup_size_y, "Us", "nc") BUILTIN(__builtin_amdgcn_workgroup_size_z, "Us", "nc") +BUILTIN(__builtin_amdgcn_grid_size_x, "Ui", "nc") +BUILTIN(__builtin_amdgcn_grid_size_y, "Ui", "nc") +BUILTIN(__builtin_amdgcn_grid_size_z, "Ui", "nc") + BUILTIN(__builtin_amdgcn_mbcnt_hi, "UiUiUi", "nc") BUILTIN(__builtin_amdgcn_mbcnt_lo, "UiUiUi", "nc") diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 6f7505b7b5c2..f933113fa883 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -14750,6 +14750,22 @@ Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) { llvm::MDNode::get(CGF.getLLVMContext(), None)); return LD; } + +// \p Index is 0, 1, and 2 for x, y, and z dimension, respectively. 
+Value *EmitAMDGPUGridSize(CodeGenFunction &CGF, unsigned Index) { + const unsigned XOffset = 12; + auto *DP = EmitAMDGPUDispatchPtr(CGF); + // Indexing the HSA kernel_dispatch_packet struct. + auto *Offset = llvm::ConstantInt::get(CGF.Int32Ty, XOffset + Index * 4); + auto *GEP = CGF.Builder.CreateGEP(DP, Offset); + auto *DstTy = + CGF.Int32Ty->getPointerTo(GEP->getType()->getPointerAddressSpace()); + auto *Cast = CGF.Builder.CreateBitCast(GEP, DstTy); + auto *LD = CGF.Builder.CreateLoad(Address(Cast, CharUnits::fromQuantity(4))); + LD->setMetadata(llvm::LLVMContext::MD_invariant_load, + llvm::MDNode::get(CGF.getLLVMContext(), None)); + return LD; +} } // namespace // For processing memory ordering and memory scope arguments of various @@ -15010,6 +15026,14 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, case AMDGPU::BI__builtin_amdgcn_workgroup_size_z: return EmitAMDGPUWorkGroupSize(*this, 2); + // amdgcn grid size + case AMDGPU::BI__builtin_amdgcn_grid_size_x: +return EmitAMDGPUGridSize(*this, 0); + case AMDGPU::BI__builtin_amdgcn_grid_size_y: +return EmitAMDGPUGridSize(*this, 1); + case AMDGPU::BI__builtin_amdgcn_grid_size_z: +return EmitAMDGPUGridSize(*this, 2); + // r600 intrinsics case AMDGPU::BI__builtin_r600_recipsqrt_ieee: case AMDGPU::BI__builtin_r600_recipsqrt_ieeef: diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl index 56c83df6b6b4..20edaf2aae3f 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl @@ -559,6 +559,24 @@ void test_get_workgroup_size(int d, global int *out) } } +// CHECK-LABEL: @test_get_grid_size( +// CHECK: call align 4 dereferenceable(64) i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr() +// CHECK: getelementptr i8, i8 addrspace(4)* %{{.*}}, i64 12 +// CHECK: load i32, i32 addrspace(4)* %{{.*}}, align 4, !invariant.load +// CHECK: getelementptr i8, i8 addrspace(4)* %{{.*}}, i64 16 +// CHECK: load i32, i32 
addrspace(4)* %{{.*}}, align 4, !invariant.load +// CHECK: getelementptr i8, i8 addrspace(4)* %{{.*}}, i64 20 +// CHECK: load i32, i32 addrspace(4)* %{{.*}}, align 4, !invariant.load +void test_get_grid_size(int d, global int *out) +{ + switch (d) { + case 0: *out = __builtin_amdgcn_grid_size_x(); break; + case 1: *out = __builtin_amdgcn_grid_size_y(); break; + case 2: *out = __builtin_amdgcn_grid_size_z(); break; + default: *out = 0; + } +} + // CHECK-LABEL: @test_fmed3_f32 // CHECK: call float @llvm.amdgcn.fmed3.f32( void test_fmed3_f32(global float* out, float a, float b, float c) diff --git a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip b/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip index 8c53d99b9fb6..9fbdc67b56ab 100644 --- a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.
[clang] 5dfdc18 - [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn.
Author: Ethan Stewart Date: 2021-06-23T15:26:09+01:00 New Revision: 5dfdc1812d9b9c043204d39318f6446424d8f2d7 URL: https://github.com/llvm/llvm-project/commit/5dfdc1812d9b9c043204d39318f6446424d8f2d7 DIFF: https://github.com/llvm/llvm-project/commit/5dfdc1812d9b9c043204d39318f6446424d8f2d7.diff LOG: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn. This fixes issues with various return types (bool/int) and was already in place for nvptx headers, adjusted to work for amdgcn. This does not affect hip as the change is guarded with OPENMP_AMDGCN. Similar to D85879. Reviewed By: jdoerfert, JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D104677 Added: Modified: clang/lib/Headers/__clang_hip_cmath.h clang/test/Headers/hip-header.hip clang/test/Headers/openmp_device_math_isnan.cpp Removed: diff --git a/clang/lib/Headers/__clang_hip_cmath.h b/clang/lib/Headers/__clang_hip_cmath.h index b5d7c16ac5e41..7342705434e6b 100644 --- a/clang/lib/Headers/__clang_hip_cmath.h +++ b/clang/lib/Headers/__clang_hip_cmath.h @@ -52,8 +52,46 @@ __DEVICE__ int fpclassify(double __x) { __DEVICE__ float frexp(float __arg, int *__exp) { return ::frexpf(__arg, __exp); } + +#if defined(__OPENMP_AMDGCN__) +// For OpenMP we work around some old system headers that have non-conforming +// `isinf(float)` and `isnan(float)` implementations that return an `int`. We do +// this by providing two versions of these functions, differing only in the +// return type. To avoid conflicting definitions we disable implicit base +// function generation. That means we will end up with two specializations, one +// per type, but only one has a base function defined by the system header. +#pragma omp begin declare variant match( \ +implementation = {extension(disable_implicit_base)}) + +// FIXME: We lack an extension to customize the mangling of the variants, e.g.,
+//add a suffix.
This means we would clash with the names of the variants +//(note that we do not create implicit base functions here). To avoid +//this clash we add a new trait to some of them that is always true +//(this is LLVM after all ;)). It will only influence the mangled name +//of the variants inside the inner region and avoid the clash. +#pragma omp begin declare variant match(implementation = {vendor(llvm)}) + +__DEVICE__ int isinf(float __x) { return ::__isinff(__x); } +__DEVICE__ int isinf(double __x) { return ::__isinf(__x); } +__DEVICE__ int isfinite(float __x) { return ::__finitef(__x); } +__DEVICE__ int isfinite(double __x) { return ::__finite(__x); } +__DEVICE__ int isnan(float __x) { return ::__isnanf(__x); } +__DEVICE__ int isnan(double __x) { return ::__isnan(__x); } + +#pragma omp end declare variant +#endif // defined(__OPENMP_AMDGCN__) + +__DEVICE__ bool isinf(float __x) { return ::__isinff(__x); } +__DEVICE__ bool isinf(double __x) { return ::__isinf(__x); } __DEVICE__ bool isfinite(float __x) { return ::__finitef(__x); } __DEVICE__ bool isfinite(double __x) { return ::__finite(__x); } +__DEVICE__ bool isnan(float __x) { return ::__isnanf(__x); } +__DEVICE__ bool isnan(double __x) { return ::__isnan(__x); } + +#if defined(__OPENMP_AMDGCN__) +#pragma omp end declare variant +#endif // defined(__OPENMP_AMDGCN__) + __DEVICE__ bool isgreater(float __x, float __y) { return __builtin_isgreater(__x, __y); } @@ -66,8 +104,6 @@ __DEVICE__ bool isgreaterequal(float __x, float __y) { __DEVICE__ bool isgreaterequal(double __x, double __y) { return __builtin_isgreaterequal(__x, __y); } -__DEVICE__ bool isinf(float __x) { return ::__isinff(__x); } -__DEVICE__ bool isinf(double __x) { return ::__isinf(__x); } __DEVICE__ bool isless(float __x, float __y) { return __builtin_isless(__x, __y); } @@ -86,8 +122,6 @@ __DEVICE__ bool islessgreater(float __x, float __y) { __DEVICE__ bool islessgreater(double __x, double __y) { return __builtin_islessgreater(__x, __y); } 
-__DEVICE__ bool isnan(float __x) { return ::__isnanf(__x); } -__DEVICE__ bool isnan(double __x) { return ::__isnan(__x); } __DEVICE__ bool isnormal(float __x) { return __builtin_isnormal(__x); } __DEVICE__ bool isnormal(double __x) { return __builtin_isnormal(__x); } __DEVICE__ bool isunordered(float __x, float __y) { diff --git a/clang/test/Headers/hip-header.hip b/clang/test/Headers/hip-header.hip index 5bba1de2ce6c4..0e95d58d55700 100644 --- a/clang/test/Headers/hip-header.hip +++ b/clang/test/Headers/hip-header.hip @@ -8,6 +8,20 @@ // RUN: %clang_cc1 -include __clang_hip_runtime_wrapper.h \ // RUN: -internal-isystem %S/../../lib/Headers/cuda_wrappers \ // RUN: -internal-isystem %S/Inputs/include \ +// RUN: -include cmath \ +// RUN: -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown \ +// RUN: -target-cpu gfx906 -emit-llvm %s -f
[clang] 7f97dda - Revert "[OpenMP][AMDGCN] Initial math headers support"
Author: Jon Chesterfield Date: 2021-07-30T22:07:00+01:00 New Revision: 7f97ddaf8aa0062393e866b63e68c9f74da375fb URL: https://github.com/llvm/llvm-project/commit/7f97ddaf8aa0062393e866b63e68c9f74da375fb DIFF: https://github.com/llvm/llvm-project/commit/7f97ddaf8aa0062393e866b63e68c9f74da375fb.diff LOG: Revert "[OpenMP][AMDGCN] Initial math headers support" Broke nvptx compilation on files including This reverts commit 12da97ea10a941f0123340831300d09a2121e173. Added: Modified: clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Headers/__clang_hip_cmath.h clang/lib/Headers/__clang_hip_math.h clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h clang/lib/Headers/openmp_wrappers/cmath clang/lib/Headers/openmp_wrappers/math.h clang/test/Headers/Inputs/include/cstdlib clang/test/Headers/openmp_device_math_isnan.cpp Removed: clang/test/Headers/Inputs/include/algorithm clang/test/Headers/Inputs/include/utility clang/test/Headers/amdgcn_openmp_device_math.c diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 278ae118563d6..e13302528cbd1 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -1256,8 +1256,7 @@ void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA, // If we are offloading to a target via OpenMP we need to include the // openmp_wrappers folder which contains alternative system headers. if (JA.isDeviceOffloading(Action::OFK_OpenMP) && - (getToolChain().getTriple().isNVPTX() || - getToolChain().getTriple().isAMDGCN())) { + getToolChain().getTriple().isNVPTX()){ if (!Args.hasArg(options::OPT_nobuiltininc)) { // Add openmp_wrappers/* to our system include path. This lets us wrap // standard library headers. 
diff --git a/clang/lib/Headers/__clang_hip_cmath.h b/clang/lib/Headers/__clang_hip_cmath.h index d488db0a94d9d..7342705434e6b 100644 --- a/clang/lib/Headers/__clang_hip_cmath.h +++ b/clang/lib/Headers/__clang_hip_cmath.h @@ -10,7 +10,7 @@ #ifndef __CLANG_HIP_CMATH_H__ #define __CLANG_HIP_CMATH_H__ -#if !defined(__HIP__) && !defined(__OPENMP_AMDGCN__) +#if !defined(__HIP__) #error "This file is for HIP and OpenMP AMDGCN device compilation only." #endif @@ -25,43 +25,31 @@ #endif // !defined(__HIPCC_RTC__) #pragma push_macro("__DEVICE__") -#pragma push_macro("__CONSTEXPR__") -#ifdef __OPENMP_AMDGCN__ -#define __DEVICE__ static __attribute__((always_inline, nothrow)) -#define __CONSTEXPR__ constexpr -#else #define __DEVICE__ static __device__ inline __attribute__((always_inline)) -#define __CONSTEXPR__ -#endif // __OPENMP_AMDGCN__ // Start with functions that cannot be defined by DEF macros below. #if defined(__cplusplus) -#if defined __OPENMP_AMDGCN__ -__DEVICE__ __CONSTEXPR__ float fabs(float __x) { return ::fabsf(__x); } -__DEVICE__ __CONSTEXPR__ float sin(float __x) { return ::sinf(__x); } -__DEVICE__ __CONSTEXPR__ float cos(float __x) { return ::cosf(__x); } -#endif -__DEVICE__ __CONSTEXPR__ double abs(double __x) { return ::fabs(__x); } -__DEVICE__ __CONSTEXPR__ float abs(float __x) { return ::fabsf(__x); } -__DEVICE__ __CONSTEXPR__ long long abs(long long __n) { return ::llabs(__n); } -__DEVICE__ __CONSTEXPR__ long abs(long __n) { return ::labs(__n); } -__DEVICE__ __CONSTEXPR__ float fma(float __x, float __y, float __z) { +__DEVICE__ double abs(double __x) { return ::fabs(__x); } +__DEVICE__ float abs(float __x) { return ::fabsf(__x); } +__DEVICE__ long long abs(long long __n) { return ::llabs(__n); } +__DEVICE__ long abs(long __n) { return ::labs(__n); } +__DEVICE__ float fma(float __x, float __y, float __z) { return ::fmaf(__x, __y, __z); } #if !defined(__HIPCC_RTC__) // The value returned by fpclassify is platform dependent, therefore it is not // supported 
by hipRTC. -__DEVICE__ __CONSTEXPR__ int fpclassify(float __x) { +__DEVICE__ int fpclassify(float __x) { return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, __x); } -__DEVICE__ __CONSTEXPR__ int fpclassify(double __x) { +__DEVICE__ int fpclassify(double __x) { return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, __x); } #endif // !defined(__HIPCC_RTC__) -__DEVICE__ __CONSTEXPR__ float frexp(float __arg, int *__exp) { +__DEVICE__ float frexp(float __arg, int *__exp) { return ::frexpf(__arg, __exp); } @@ -83,101 +71,93 @@ __DEVICE__ __CONSTEXPR__ float frexp(float __arg, int *__exp) { //of the variants inside the inner region and avoid the clash. #pragma omp begin declare variant match(implementation = {vendor(llvm)}) -__DEVICE__ __CONSTEXPR__ int isinf(float __x) { return ::__isinff(__x); } -__DEVICE__ __CONSTEXPR__ int isinf(double __x) { return ::__isinf(__x);
[clang] 509854b - [clang] Replace asm with __asm__ in cuda header
Author: Jon Chesterfield Date: 2021-08-05T18:46:57+01:00 New Revision: 509854b69cea0c9261ac21ceb22012a53e7a800b URL: https://github.com/llvm/llvm-project/commit/509854b69cea0c9261ac21ceb22012a53e7a800b DIFF: https://github.com/llvm/llvm-project/commit/509854b69cea0c9261ac21ceb22012a53e7a800b.diff LOG: [clang] Replace asm with __asm__ in cuda header Asm is a gnu extension for C, so at present -fopenmp -std=c99 and similar fail to compile on nvptx, bug 51344 Changing to `__asm__` or `__asm` works for openmp, all three appear to work for cuda. Suggesting `__asm__` here as `__asm` is used by MSVC with different syntax, so this should make for better error diagnostics if the header is passed to a compiler other than clang. Reviewed By: tra, emankov Differential Revision: https://reviews.llvm.org/D107492 Added: Modified: clang/lib/Headers/__clang_cuda_device_functions.h Removed: diff --git a/clang/lib/Headers/__clang_cuda_device_functions.h b/clang/lib/Headers/__clang_cuda_device_functions.h index f801e5426aa43..cc4e1a4dd96ad 100644 --- a/clang/lib/Headers/__clang_cuda_device_functions.h +++ b/clang/lib/Headers/__clang_cuda_device_functions.h @@ -34,10 +34,12 @@ __DEVICE__ unsigned long long __brevll(unsigned long long __a) { return __nv_brevll(__a); } #if defined(__cplusplus) -__DEVICE__ void __brkpt() { asm volatile("brkpt;"); } +__DEVICE__ void __brkpt() { __asm__ __volatile__("brkpt;"); } __DEVICE__ void __brkpt(int __a) { __brkpt(); } #else -__DEVICE__ void __attribute__((overloadable)) __brkpt(void) { asm volatile("brkpt;"); } +__DEVICE__ void __attribute__((overloadable)) __brkpt(void) { + __asm__ __volatile__("brkpt;"); +} __DEVICE__ void __attribute__((overloadable)) __brkpt(int __a) { __brkpt(); } #endif __DEVICE__ unsigned int __byte_perm(unsigned int __a, unsigned int __b, @@ -507,7 +509,7 @@ __DEVICE__ float __powf(float __a, float __b) { } // Parameter must have a known integer value. 
-#define __prof_trigger(__a) asm __volatile__("pmevent \t%0;" ::"i"(__a)) +#define __prof_trigger(__a) __asm__ __volatile__("pmevent \t%0;" ::"i"(__a)) __DEVICE__ int __rhadd(int __a, int __b) { return __nv_rhadd(__a, __b); } __DEVICE__ unsigned int __sad(int __a, int __b, unsigned int __c) { return __nv_sad(__a, __b, __c); @@ -526,7 +528,7 @@ __DEVICE__ float __tanf(float __a) { return __nv_fast_tanf(__a); } __DEVICE__ void __threadfence(void) { __nvvm_membar_gl(); } __DEVICE__ void __threadfence_block(void) { __nvvm_membar_cta(); }; __DEVICE__ void __threadfence_system(void) { __nvvm_membar_sys(); }; -__DEVICE__ void __trap(void) { asm volatile("trap;"); } +__DEVICE__ void __trap(void) { __asm__ __volatile__("trap;"); } __DEVICE__ unsigned int __uAtomicAdd(unsigned int *__p, unsigned int __v) { return __nvvm_atom_add_gen_i((int *)__p, __v); } @@ -1051,122 +1053,136 @@ __DEVICE__ unsigned int __bool2mask(unsigned int __a, int shift) { } __DEVICE__ unsigned int __vabs2(unsigned int __a) { unsigned int r; - asm("vabsdiff2.s32.s32.s32 %0,%1,%2,%3;" - : "=r"(r) - : "r"(__a), "r"(0), "r"(0)); + __asm__("vabsdiff2.s32.s32.s32 %0,%1,%2,%3;" + : "=r"(r) + : "r"(__a), "r"(0), "r"(0)); return r; } __DEVICE__ unsigned int __vabs4(unsigned int __a) { unsigned int r; - asm("vabsdiff4.s32.s32.s32 %0,%1,%2,%3;" - : "=r"(r) - : "r"(__a), "r"(0), "r"(0)); + __asm__("vabsdiff4.s32.s32.s32 %0,%1,%2,%3;" + : "=r"(r) + : "r"(__a), "r"(0), "r"(0)); return r; } __DEVICE__ unsigned int __vabsdiffs2(unsigned int __a, unsigned int __b) { unsigned int r; - asm("vabsdiff2.s32.s32.s32 %0,%1,%2,%3;" - : "=r"(r) - : "r"(__a), "r"(__b), "r"(0)); + __asm__("vabsdiff2.s32.s32.s32 %0,%1,%2,%3;" + : "=r"(r) + : "r"(__a), "r"(__b), "r"(0)); return r; } __DEVICE__ unsigned int __vabsdiffs4(unsigned int __a, unsigned int __b) { unsigned int r; - asm("vabsdiff4.s32.s32.s32 %0,%1,%2,%3;" - : "=r"(r) - : "r"(__a), "r"(__b), "r"(0)); + __asm__("vabsdiff4.s32.s32.s32 %0,%1,%2,%3;" + : "=r"(r) + : "r"(__a), "r"(__b), "r"(0)); return r; } __DEVICE__ unsigned int __vabsdiffu2(unsigned int __a, unsigned int __b) { unsigned int r; - asm("vabsdiff2.u32.u32.u32 %0,%1,%2,%3;" - : "=r"(r) - : "r"(__a), "r"(__b), "r"(0)); + __asm__("vabsdiff2.u32.u32.u32 %0,%1,%2,%3;" + : "=r"(r) + : "r"(__a), "r"(__b), "r"(0)); return r; } __DEVICE__ unsigned int __vabsdiffu4(unsigned int __a, unsigned int __b) { unsigned int r; - asm("vabsdiff4.u32.u32.u32 %0,%1,%2,%3;" - : "=r"(r) - : "r"(__a), "r"(__b), "r"(0)); + __asm__("vabsdiff4.u32.u32.u32 %0,%1,%2,%3;" + : "=r"(r) + : "r"(__a), "r"(__b), "r"(0)); return r; } __DEVICE__ unsigned int __vabsss2(unsigned int __a) { unsigned i
[clang] b611354 - [openmp] Annotate tmp variables with omp_thread_mem_alloc
Author: Jon Chesterfield Date: 2021-08-12T17:30:22+01:00 New Revision: b6113548c9217fb8a6d0e9ac5bef5584c1aa614d URL: https://github.com/llvm/llvm-project/commit/b6113548c9217fb8a6d0e9ac5bef5584c1aa614d DIFF: https://github.com/llvm/llvm-project/commit/b6113548c9217fb8a6d0e9ac5bef5584c1aa614d.diff LOG: [openmp] Annotate tmp variables with omp_thread_mem_alloc Fixes miscompile of calls into ocml. Bug 51445. The stack variable `double __tmp` is moved to dynamically allocated shared memory by CGOpenMPRuntimeGPU. This is usually fine, but when the variable is passed to a function that is explicitly annotated address_space(5) then allocating the variable off-stack leads to a miscompile in the back end, which cannot decide to move the variable back to the stack from shared. This could be fixed by removing the AS(5) annotation from the math library or by explicitly marking the variables as thread_mem_alloc. The cast to AS(5) is still a no-op once IR is reached. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107971 Added: Modified: clang/lib/Headers/__clang_hip_math.h Removed: diff --git a/clang/lib/Headers/__clang_hip_math.h b/clang/lib/Headers/__clang_hip_math.h index 9effaa18d3e8..ef7e087b832c 100644 --- a/clang/lib/Headers/__clang_hip_math.h +++ b/clang/lib/Headers/__clang_hip_math.h @@ -19,6 +19,9 @@ #endif #include #include +#ifdef __OPENMP_AMDGCN__ +#include +#endif #endif // !defined(__HIPCC_RTC__) #pragma push_macro("__DEVICE__") @@ -258,6 +261,9 @@ float fmodf(float __x, float __y) { return __ocml_fmod_f32(__x, __y); } __DEVICE__ float frexpf(float __x, int *__nptr) { int __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif float __r = __ocml_frexp_f32(__x, (__attribute__((address_space(5))) int *)&__tmp); *__nptr = __tmp; @@ -343,6 +349,9 @@ long int lroundf(float __x) { return __ocml_round_f32(__x); } __DEVICE__ float modff(float __x, float *__iptr) { float __tmp; +#ifdef __OPENMP_AMDGCN__ 
+#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif float __r = __ocml_modf_f32(__x, (__attribute__((address_space(5))) float *)&__tmp); *__iptr = __tmp; @@ -423,6 +432,9 @@ float remainderf(float __x, float __y) { __DEVICE__ float remquof(float __x, float __y, int *__quo) { int __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif float __r = __ocml_remquo_f32( __x, __y, (__attribute__((address_space(5))) int *)&__tmp); *__quo = __tmp; @@ -479,6 +491,9 @@ __RETURN_TYPE __signbitf(float __x) { return __ocml_signbit_f32(__x); } __DEVICE__ void sincosf(float __x, float *__sinptr, float *__cosptr) { float __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif *__sinptr = __ocml_sincos_f32(__x, (__attribute__((address_space(5))) float *)&__tmp); *__cosptr = __tmp; @@ -487,6 +502,9 @@ void sincosf(float __x, float *__sinptr, float *__cosptr) { __DEVICE__ void sincospif(float __x, float *__sinptr, float *__cosptr) { float __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif *__sinptr = __ocml_sincospi_f32( __x, (__attribute__((address_space(5))) float *)&__tmp); *__cosptr = __tmp; @@ -799,6 +817,9 @@ double fmod(double __x, double __y) { return __ocml_fmod_f64(__x, __y); } __DEVICE__ double frexp(double __x, int *__nptr) { int __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif double __r = __ocml_frexp_f64(__x, (__attribute__((address_space(5))) int *)&__tmp); *__nptr = __tmp; @@ -883,6 +904,9 @@ long int lround(double __x) { return __ocml_round_f64(__x); } __DEVICE__ double modf(double __x, double *__iptr) { double __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif double __r = __ocml_modf_f64(__x, (__attribute__((address_space(5))) double *)&__tmp); *__iptr = __tmp; @@ -971,6 +995,9 @@ double remainder(double 
__x, double __y) { __DEVICE__ double remquo(double __x, double __y, int *__quo) { int __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif double __r = __ocml_remquo_f64( __x, __y, (__attribute__((address_space(5))) int *)&__tmp); *__quo = __tmp; @@ -1029,6 +1056,9 @@ double sin(double __x) { return __ocml_sin_f64(__x); } __DEVICE__ void sincos(double __x, double *__sinptr, double *__cosptr) { double __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif *__sinptr = __ocml_sincos_f64( __x, (__attribute__((address_space(5))) double *)&__tmp); *__cosptr = __tmp; @@ -1037,6 +1067,9 @@ void sincos(double __x,
[clang] 6a8e512 - Revert "[openmp] Annotate tmp variables with omp_thread_mem_alloc"
Author: Jon Chesterfield Date: 2021-08-12T17:44:36+01:00 New Revision: 6a8e5120abacdfe0f05c9670782e59e2b729a318 URL: https://github.com/llvm/llvm-project/commit/6a8e5120abacdfe0f05c9670782e59e2b729a318 DIFF: https://github.com/llvm/llvm-project/commit/6a8e5120abacdfe0f05c9670782e59e2b729a318.diff LOG: Revert "[openmp] Annotate tmp variables with omp_thread_mem_alloc" This reverts commit b6113548c9217fb8a6d0e9ac5bef5584c1aa614d. Added: Modified: clang/lib/Headers/__clang_hip_math.h Removed: diff --git a/clang/lib/Headers/__clang_hip_math.h b/clang/lib/Headers/__clang_hip_math.h index ef7e087b832c..9effaa18d3e8 100644 --- a/clang/lib/Headers/__clang_hip_math.h +++ b/clang/lib/Headers/__clang_hip_math.h @@ -19,9 +19,6 @@ #endif #include #include -#ifdef __OPENMP_AMDGCN__ -#include -#endif #endif // !defined(__HIPCC_RTC__) #pragma push_macro("__DEVICE__") @@ -261,9 +258,6 @@ float fmodf(float __x, float __y) { return __ocml_fmod_f32(__x, __y); } __DEVICE__ float frexpf(float __x, int *__nptr) { int __tmp; -#ifdef __OPENMP_AMDGCN__ -#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) -#endif float __r = __ocml_frexp_f32(__x, (__attribute__((address_space(5))) int *)&__tmp); *__nptr = __tmp; @@ -349,9 +343,6 @@ long int lroundf(float __x) { return __ocml_round_f32(__x); } __DEVICE__ float modff(float __x, float *__iptr) { float __tmp; -#ifdef __OPENMP_AMDGCN__ -#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) -#endif float __r = __ocml_modf_f32(__x, (__attribute__((address_space(5))) float *)&__tmp); *__iptr = __tmp; @@ -432,9 +423,6 @@ float remainderf(float __x, float __y) { __DEVICE__ float remquof(float __x, float __y, int *__quo) { int __tmp; -#ifdef __OPENMP_AMDGCN__ -#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) -#endif float __r = __ocml_remquo_f32( __x, __y, (__attribute__((address_space(5))) int *)&__tmp); *__quo = __tmp; @@ -491,9 +479,6 @@ __RETURN_TYPE __signbitf(float __x) { return __ocml_signbit_f32(__x); } __DEVICE__ void 
sincosf(float __x, float *__sinptr, float *__cosptr) { float __tmp; -#ifdef __OPENMP_AMDGCN__ -#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) -#endif *__sinptr = __ocml_sincos_f32(__x, (__attribute__((address_space(5))) float *)&__tmp); *__cosptr = __tmp; @@ -502,9 +487,6 @@ void sincosf(float __x, float *__sinptr, float *__cosptr) { __DEVICE__ void sincospif(float __x, float *__sinptr, float *__cosptr) { float __tmp; -#ifdef __OPENMP_AMDGCN__ -#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) -#endif *__sinptr = __ocml_sincospi_f32( __x, (__attribute__((address_space(5))) float *)&__tmp); *__cosptr = __tmp; @@ -817,9 +799,6 @@ double fmod(double __x, double __y) { return __ocml_fmod_f64(__x, __y); } __DEVICE__ double frexp(double __x, int *__nptr) { int __tmp; -#ifdef __OPENMP_AMDGCN__ -#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) -#endif double __r = __ocml_frexp_f64(__x, (__attribute__((address_space(5))) int *)&__tmp); *__nptr = __tmp; @@ -904,9 +883,6 @@ long int lround(double __x) { return __ocml_round_f64(__x); } __DEVICE__ double modf(double __x, double *__iptr) { double __tmp; -#ifdef __OPENMP_AMDGCN__ -#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) -#endif double __r = __ocml_modf_f64(__x, (__attribute__((address_space(5))) double *)&__tmp); *__iptr = __tmp; @@ -995,9 +971,6 @@ double remainder(double __x, double __y) { __DEVICE__ double remquo(double __x, double __y, int *__quo) { int __tmp; -#ifdef __OPENMP_AMDGCN__ -#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) -#endif double __r = __ocml_remquo_f64( __x, __y, (__attribute__((address_space(5))) int *)&__tmp); *__quo = __tmp; @@ -1056,9 +1029,6 @@ double sin(double __x) { return __ocml_sin_f64(__x); } __DEVICE__ void sincos(double __x, double *__sinptr, double *__cosptr) { double __tmp; -#ifdef __OPENMP_AMDGCN__ -#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) -#endif *__sinptr = __ocml_sincos_f64( __x, 
(__attribute__((address_space(5))) double *)&__tmp); *__cosptr = __tmp; @@ -1067,9 +1037,6 @@ void sincos(double __x, double *__sinptr, double *__cosptr) { __DEVICE__ void sincospi(double __x, double *__sinptr, double *__cosptr) { double __tmp; -#ifdef __OPENMP_AMDGCN__ -#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) -#endif *__sinptr = __ocml_sincospi_f64( __x, (__attribute__((address_space(5))) double *)&__tmp); *__cosptr = __tmp; ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 21d91a8 - [libomptarget][devicertl] Replace lanemask with uint64 at interface
Author: Jon Chesterfield Date: 2021-08-18T20:47:33+01:00 New Revision: 21d91a8ef319eec9c2c272e19beee726429524aa URL: https://github.com/llvm/llvm-project/commit/21d91a8ef319eec9c2c272e19beee726429524aa DIFF: https://github.com/llvm/llvm-project/commit/21d91a8ef319eec9c2c272e19beee726429524aa.diff LOG: [libomptarget][devicertl] Replace lanemask with uint64 at interface Use uint64_t for lanemask on all GPU architectures at the interface with clang. Updates tests. The deviceRTL is always linked as IR so the zext and trunc introduced for wave32 architectures will fold after inlining. Simplification partly motivated by amdgpu gfx10 which will be wave32 and is awkward to express in the current arch-dependant typedef interface. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108317 Added: Modified: clang/test/OpenMP/nvptx_parallel_codegen.cpp llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp llvm/test/Transforms/OpenMP/add_attributes.ll openmp/libomptarget/DeviceRTL/include/Interface.h openmp/libomptarget/DeviceRTL/src/Synchronization.cpp openmp/libomptarget/deviceRTLs/common/src/sync.cu openmp/libomptarget/deviceRTLs/interface.h Removed: diff --git a/clang/test/OpenMP/nvptx_parallel_codegen.cpp b/clang/test/OpenMP/nvptx_parallel_codegen.cpp index 7cb86b80e158f..712c5a41c573d 100644 --- a/clang/test/OpenMP/nvptx_parallel_codegen.cpp +++ b/clang/test/OpenMP/nvptx_parallel_codegen.cpp @@ -485,7 +485,7 @@ int bar(int n){ // CHECK3-NEXT:store i32* [[DOTBOUND_TID_]], i32** [[DOTBOUND_TID__ADDR]], align 4 // CHECK3-NEXT:store i32* [[A]], i32** [[A_ADDR]], align 4 // CHECK3-NEXT:[[TMP0:%.*]] = load i32*, i32** [[A_ADDR]], align 4 -// CHECK3-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_warp_active_thread_mask() +// CHECK3-NEXT:[[TMP1:%.*]] = call i64 @__kmpc_warp_active_thread_mask() // CHECK3-NEXT:[[NVPTX_TID:%.*]] = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() // 
CHECK3-NEXT:[[NVPTX_NUM_THREADS:%.*]] = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x() // CHECK3-NEXT:store i32 0, i32* [[CRITICAL_COUNTER]], align 4 @@ -508,7 +508,7 @@ int bar(int n){ // CHECK3-NEXT:call void @__kmpc_end_critical(%struct.ident_t* @[[GLOB1]], i32 [[TMP7]], [8 x i32]* @"_gomp_critical_user_$var") // CHECK3-NEXT:br label [[OMP_CRITICAL_SYNC]] // CHECK3: omp.critical.sync: -// CHECK3-NEXT:call void @__kmpc_syncwarp(i32 [[TMP1]]) +// CHECK3-NEXT:call void @__kmpc_syncwarp(i64 [[TMP1]]) // CHECK3-NEXT:[[TMP9:%.*]] = add nsw i32 [[TMP4]], 1 // CHECK3-NEXT:store i32 [[TMP9]], i32* [[CRITICAL_COUNTER]], align 4 // CHECK3-NEXT:br label [[OMP_CRITICAL_LOOP]] @@ -938,7 +938,7 @@ int bar(int n){ // CHECK4-NEXT:store i32* [[DOTBOUND_TID_]], i32** [[DOTBOUND_TID__ADDR]], align 4 // CHECK4-NEXT:store i32* [[A]], i32** [[A_ADDR]], align 4 // CHECK4-NEXT:[[TMP0:%.*]] = load i32*, i32** [[A_ADDR]], align 4 -// CHECK4-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_warp_active_thread_mask() +// CHECK4-NEXT:[[TMP1:%.*]] = call i64 @__kmpc_warp_active_thread_mask() // CHECK4-NEXT:[[NVPTX_TID:%.*]] = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() // CHECK4-NEXT:[[NVPTX_NUM_THREADS:%.*]] = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x() // CHECK4-NEXT:store i32 0, i32* [[CRITICAL_COUNTER]], align 4 @@ -961,7 +961,7 @@ int bar(int n){ // CHECK4-NEXT:call void @__kmpc_end_critical(%struct.ident_t* @[[GLOB1]], i32 [[TMP7]], [8 x i32]* @"_gomp_critical_user_$var") // CHECK4-NEXT:br label [[OMP_CRITICAL_SYNC]] // CHECK4: omp.critical.sync: -// CHECK4-NEXT:call void @__kmpc_syncwarp(i32 [[TMP1]]) +// CHECK4-NEXT:call void @__kmpc_syncwarp(i64 [[TMP1]]) // CHECK4-NEXT:[[TMP9:%.*]] = add nsw i32 [[TMP4]], 1 // CHECK4-NEXT:store i32 [[TMP9]], i32* [[CRITICAL_COUNTER]], align 4 // CHECK4-NEXT:br label [[OMP_CRITICAL_LOOP]] @@ -1391,7 +1391,7 @@ int bar(int n){ // CHECK5-NEXT:store i32* [[DOTBOUND_TID_]], i32** [[DOTBOUND_TID__ADDR]], align 4 // CHECK5-NEXT:store i32* [[A]], i32** [[A_ADDR]], align 4 // 
CHECK5-NEXT:[[TMP0:%.*]] = load i32*, i32** [[A_ADDR]], align 4 -// CHECK5-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_warp_active_thread_mask() +// CHECK5-NEXT:[[TMP1:%.*]] = call i64 @__kmpc_warp_active_thread_mask() // CHECK5-NEXT:[[NVPTX_TID:%.*]] = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() // CHECK5-NEXT:[[NVPTX_NUM_THREADS:%.*]] = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x() // CHECK5-NEXT:store i32 0, i32* [[CRITICAL_COUNTER]], align 4 @@ -1414,7 +1414,7 @@ int bar(int n){ // CHECK5-NEXT:call void @__kmpc_end_critical(%struct.ident_t* @[[GLOB1]], i32 [[TMP7]], [8 x i32]* @"_gomp_critical_user_$var") // CHECK5-NEXT:br label [[OMP_CRITICAL_SYNC]] //
[clang] dbd7bad - [openmp] Annotate tmp variables with omp_thread_mem_alloc
Author: Jon Chesterfield Date: 2021-08-19T02:22:11+01:00 New Revision: dbd7bad9ad9bc32538e324417c23387bf4ac7747 URL: https://github.com/llvm/llvm-project/commit/dbd7bad9ad9bc32538e324417c23387bf4ac7747 DIFF: https://github.com/llvm/llvm-project/commit/dbd7bad9ad9bc32538e324417c23387bf4ac7747.diff LOG: [openmp] Annotate tmp variables with omp_thread_mem_alloc Fixes miscompile of calls into ocml. Bug 51445. The stack variable `double __tmp` is moved to dynamically allocated shared memory by CGOpenMPRuntimeGPU. This is usually fine, but when the variable is passed to a function that is explicitly annotated address_space(5) then allocating the variable off-stack leads to a miscompile in the back end, which cannot decide to move the variable back to the stack from shared. This could be fixed by removing the AS(5) annotation from the math library or by explicitly marking the variables as thread_mem_alloc. The cast to AS(5) is still a no-op once IR is reached. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107971 Added: clang/test/Headers/Inputs/include/omp.h Modified: clang/lib/Headers/__clang_hip_math.h Removed: diff --git a/clang/lib/Headers/__clang_hip_math.h b/clang/lib/Headers/__clang_hip_math.h index 9effaa18d3e8c..ef7e087b832ca 100644 --- a/clang/lib/Headers/__clang_hip_math.h +++ b/clang/lib/Headers/__clang_hip_math.h @@ -19,6 +19,9 @@ #endif #include #include +#ifdef __OPENMP_AMDGCN__ +#include +#endif #endif // !defined(__HIPCC_RTC__) #pragma push_macro("__DEVICE__") @@ -258,6 +261,9 @@ float fmodf(float __x, float __y) { return __ocml_fmod_f32(__x, __y); } __DEVICE__ float frexpf(float __x, int *__nptr) { int __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif float __r = __ocml_frexp_f32(__x, (__attribute__((address_space(5))) int *)&__tmp); *__nptr = __tmp; @@ -343,6 +349,9 @@ long int lroundf(float __x) { return __ocml_round_f32(__x); } __DEVICE__ float modff(float __x, float 
*__iptr) { float __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif float __r = __ocml_modf_f32(__x, (__attribute__((address_space(5))) float *)&__tmp); *__iptr = __tmp; @@ -423,6 +432,9 @@ float remainderf(float __x, float __y) { __DEVICE__ float remquof(float __x, float __y, int *__quo) { int __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif float __r = __ocml_remquo_f32( __x, __y, (__attribute__((address_space(5))) int *)&__tmp); *__quo = __tmp; @@ -479,6 +491,9 @@ __RETURN_TYPE __signbitf(float __x) { return __ocml_signbit_f32(__x); } __DEVICE__ void sincosf(float __x, float *__sinptr, float *__cosptr) { float __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif *__sinptr = __ocml_sincos_f32(__x, (__attribute__((address_space(5))) float *)&__tmp); *__cosptr = __tmp; @@ -487,6 +502,9 @@ void sincosf(float __x, float *__sinptr, float *__cosptr) { __DEVICE__ void sincospif(float __x, float *__sinptr, float *__cosptr) { float __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif *__sinptr = __ocml_sincospi_f32( __x, (__attribute__((address_space(5))) float *)&__tmp); *__cosptr = __tmp; @@ -799,6 +817,9 @@ double fmod(double __x, double __y) { return __ocml_fmod_f64(__x, __y); } __DEVICE__ double frexp(double __x, int *__nptr) { int __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif double __r = __ocml_frexp_f64(__x, (__attribute__((address_space(5))) int *)&__tmp); *__nptr = __tmp; @@ -883,6 +904,9 @@ long int lround(double __x) { return __ocml_round_f64(__x); } __DEVICE__ double modf(double __x, double *__iptr) { double __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif double __r = __ocml_modf_f64(__x, (__attribute__((address_space(5))) double *)&__tmp); *__iptr = 
__tmp; @@ -971,6 +995,9 @@ double remainder(double __x, double __y) { __DEVICE__ double remquo(double __x, double __y, int *__quo) { int __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif double __r = __ocml_remquo_f64( __x, __y, (__attribute__((address_space(5))) int *)&__tmp); *__quo = __tmp; @@ -1029,6 +1056,9 @@ double sin(double __x) { return __ocml_sin_f64(__x); } __DEVICE__ void sincos(double __x, double *__sinptr, double *__cosptr) { double __tmp; +#ifdef __OPENMP_AMDGCN__ +#pragma omp allocate(__tmp) allocator(omp_thread_mem_alloc) +#endif *__sinptr = __ocml_sincos_f64( __x, (__attribute__((address_space(5))) double *)&__tmp); *__cosptr = __tmp; @@ -
[clang] 77579b9 - [openmp][nfc] Replace OMPGridValues array with struct
Author: Jon Chesterfield Date: 2021-08-19T13:25:42+01:00 New Revision: 77579b99e9ce1638ca696fa7c3872ae8668d997d URL: https://github.com/llvm/llvm-project/commit/77579b99e9ce1638ca696fa7c3872ae8668d997d DIFF: https://github.com/llvm/llvm-project/commit/77579b99e9ce1638ca696fa7c3872ae8668d997d.diff LOG: [openmp][nfc] Replace OMPGridValues array with struct [nfc] Replaces enum indices into an array with a struct. Named the fields to match the enum, leaves memory layout and initialization unchanged. Motivation is to later safely remove dead fields and replace redundant ones with (compile time) computation. It should also be possible to factor some common fields into a base and introduce a gfx10 amdgpu instance with less duplication than the arrays of integers require. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108339 Added: Modified: clang/include/clang/Basic/TargetInfo.h clang/lib/Basic/Targets/AMDGPU.cpp clang/lib/Basic/Targets/NVPTX.cpp clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp clang/lib/CodeGen/CGOpenMPRuntimeGPU.h llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h openmp/libomptarget/plugins/amdgpu/src/rtl.cpp Removed: diff --git a/clang/include/clang/Basic/TargetInfo.h b/clang/include/clang/Basic/TargetInfo.h index 21289b0dfd04..ab855948b447 100644 --- a/clang/include/clang/Basic/TargetInfo.h +++ b/clang/include/clang/Basic/TargetInfo.h @@ -210,8 +210,8 @@ class TargetInfo : public virtual TransferrableTargetInfo, unsigned char RegParmMax, SSERegParmMax; TargetCXXABI TheCXXABI; const LangASMap *AddrSpaceMap; - const unsigned *GridValues = - nullptr; // Array of target-specific GPU grid values that must be + const llvm::omp::GV *GridValues = + nullptr; // target-specific GPU grid values that must be // consistent between host RTL (plugin), device RTL, and clang. 
mutable StringRef PlatformName; @@ -1410,10 +1410,10 @@ class TargetInfo : public virtual TransferrableTargetInfo, return LangAS::Default; } - /// Return a target-specific GPU grid value based on the GVIDX enum \p gv - unsigned getGridValue(llvm::omp::GVIDX gv) const { + /// Return a target-specific GPU grid values + const llvm::omp::GV &getGridValue() const { assert(GridValues != nullptr && "GridValues not initialized"); -return GridValues[gv]; +return *GridValues; } /// Retrieve the name of the platform as it is used in the diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index fac786dbcf9e..cebb19e7ccab 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -335,7 +335,7 @@ AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple &Triple, llvm::AMDGPU::getArchAttrR600(GPUKind)) { resetDataLayout(isAMDGCN(getTriple()) ? DataLayoutStringAMDGCN : DataLayoutStringR600); - GridValues = llvm::omp::AMDGPUGpuGridValues; + GridValues = &llvm::omp::AMDGPUGridValues; setAddressSpaceMap(Triple.getOS() == llvm::Triple::Mesa3D || !isAMDGCN(Triple)); diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp index 56f8a179db3c..d1a34e4a81c5 100644 --- a/clang/lib/Basic/Targets/NVPTX.cpp +++ b/clang/lib/Basic/Targets/NVPTX.cpp @@ -65,7 +65,7 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple, TLSSupported = false; VLASupported = false; AddrSpaceMap = &NVPTXAddrSpaceMap; - GridValues = llvm::omp::NVPTXGpuGridValues; + GridValues = &llvm::omp::NVPTXGridValues; UseAddrSpaceMapMangling = true; // Define available target features diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp index 33d4ab838af1..cac5faaa8d0f 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp @@ -20,6 +20,7 @@ #include "clang/AST/StmtVisitor.h" #include "clang/Basic/Cuda.h" #include 
"llvm/ADT/SmallPtrSet.h" +#include "llvm/Frontend/OpenMP/OMPGridValues.h" #include "llvm/IR/IntrinsicsAMDGPU.h" using namespace clang; @@ -35,7 +36,7 @@ CGOpenMPRuntimeAMDGCN::CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM) llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUWarpSize(CodeGenFunction &CGF) { CGBuilderTy &Bld = CGF.Builder; // return constant compile-time target-specific warp size - unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size); + unsigned WarpSize = CGF.getTarget().getGridValue().GV_Warp_Size; return Bld.getInt32(WarpSize); } diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index 63fecedc6fb7..b13d55994ef6 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
[clang] 33427fd - [libomptarget] Build DeviceRTL for amdgpu
Author: Jon Chesterfield Date: 2021-10-28T00:41:45+01:00 New Revision: 33427fdb7b52b79ce5e25b7e14e0f1a44d876bd2 URL: https://github.com/llvm/llvm-project/commit/33427fdb7b52b79ce5e25b7e14e0f1a44d876bd2 DIFF: https://github.com/llvm/llvm-project/commit/33427fdb7b52b79ce5e25b7e14e0f1a44d876bd2.diff LOG: [libomptarget] Build DeviceRTL for amdgpu Passes same tests as the current deviceRTL. Includes cmake change from D111987. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227 Added: Modified: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp openmp/libomptarget/DeviceRTL/CMakeLists.txt openmp/libomptarget/DeviceRTL/src/Configuration.cpp openmp/libomptarget/DeviceRTL/src/Synchronization.cpp openmp/libomptarget/plugins/amdgpu/CMakeLists.txt openmp/libomptarget/test/mapping/data_member_ref.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp openmp/libomptarget/test/mapping/delete_inf_refcount.c openmp/libomptarget/test/mapping/lambda_by_value.cpp openmp/libomptarget/test/mapping/ompx_hold/struct.c openmp/libomptarget/test/mapping/ptr_and_obj_motion.c openmp/libomptarget/test/mapping/reduction_implicit_map.cpp openmp/libomptarget/test/offloading/bug49021.cpp openmp/libomptarget/test/offloading/bug49334.cpp openmp/libomptarget/test/offloading/bug50022.cpp openmp/libomptarget/test/offloading/global_constructor.cpp openmp/libomptarget/test/offloading/host_as_target.c openmp/libomptarget/test/unified_shared_memory/api.c openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c openmp/libomptarget/test/unified_shared_memory/close_modifier.c openmp/libomptarget/test/unified_shared_memory/shared_update.c Removed: diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp index 5400e2617729..b138000f8cf2 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp @@ 
-252,7 +252,7 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions( std::string BitcodeSuffix; if (DriverArgs.hasFlag(options::OPT_fopenmp_target_new_runtime, options::OPT_fno_openmp_target_new_runtime, false)) -BitcodeSuffix = "new-amdgcn-" + GPUArch; +BitcodeSuffix = "new-amdgpu-" + GPUArch; else BitcodeSuffix = "amdgcn-" + GPUArch; diff --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt b/openmp/libomptarget/DeviceRTL/CMakeLists.txt index a4f9862fb09b..419c64d38116 100644 --- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt +++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt @@ -226,6 +226,5 @@ foreach(sm ${nvptx_sm_list}) endforeach() foreach(mcpu ${amdgpu_mcpus}) - # require D112227 or similar to enable the compilation for amdgpu - # compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa -D__AMDGCN__ -fvisibility=default -nogpulib) + compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa -D__AMDGCN__ -fvisibility=default -nogpulib) endforeach() diff --git a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp index 2b6f20fb1732..f7c61dc013cf 100644 --- a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp +++ b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp @@ -20,9 +20,9 @@ using namespace _OMP; #pragma omp declare target -extern uint32_t __omp_rtl_debug_kind; +extern uint32_t __omp_rtl_debug_kind; // defined by CGOpenMPRuntimeGPU -// TOOD: We want to change the name as soon as the old runtime is gone. +// TODO: We want to change the name as soon as the old runtime is gone. 
DeviceEnvironmentTy CONSTANT(omptarget_device_environment) __attribute__((used)); diff --git a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp index c77e766ae6ca..33e2194b25f3 100644 --- a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp +++ b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp @@ -68,8 +68,23 @@ uint64_t atomicAdd(uint64_t *Address, uint64_t Val, int Ordering) { ///{ #pragma omp begin declare variant match(device = {arch(amdgcn)}) -uint32_t atomicInc(uint32_t *Address, uint32_t Val, int Ordering) { - return __builtin_amdgcn_atomic_inc32(Address, Val, Ordering, ""); +uint32_t atomicInc(uint32_t *A, uint32_t V, int Ordering) { + // builtin_amdgcn_atomic_inc32 should expand to this switch when + // passed a runtime value, but does not do so yet. Workaround here. + switch (Ordering) { + default: +__builtin_unreachable(); + case __ATOMIC_RELAXED: +return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_RELAXED, ""); + case __ATOMIC_ACQUIRE: +return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_ACQUIRE, ""); + case __AT
[clang] 6c7b203 - Revert "[libomptarget] Build DeviceRTL for amdgpu"
Author: Jon Chesterfield Date: 2021-10-28T01:01:53+01:00 New Revision: 6c7b203d1d7000269215ab5b3d329ab03dc85e42 URL: https://github.com/llvm/llvm-project/commit/6c7b203d1d7000269215ab5b3d329ab03dc85e42 DIFF: https://github.com/llvm/llvm-project/commit/6c7b203d1d7000269215ab5b3d329ab03dc85e42.diff LOG: Revert "[libomptarget] Build DeviceRTL for amdgpu" - more tests failing on CI than failed locally when writing this patch This reverts commit 33427fdb7b52b79ce5e25b7e14e0f1a44d876bd2. Added: Modified: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp openmp/libomptarget/DeviceRTL/CMakeLists.txt openmp/libomptarget/DeviceRTL/src/Configuration.cpp openmp/libomptarget/DeviceRTL/src/Synchronization.cpp openmp/libomptarget/plugins/amdgpu/CMakeLists.txt openmp/libomptarget/test/mapping/data_member_ref.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp openmp/libomptarget/test/mapping/delete_inf_refcount.c openmp/libomptarget/test/mapping/lambda_by_value.cpp openmp/libomptarget/test/mapping/ompx_hold/struct.c openmp/libomptarget/test/mapping/ptr_and_obj_motion.c openmp/libomptarget/test/mapping/reduction_implicit_map.cpp openmp/libomptarget/test/offloading/bug49021.cpp openmp/libomptarget/test/offloading/bug49334.cpp openmp/libomptarget/test/offloading/bug50022.cpp openmp/libomptarget/test/offloading/global_constructor.cpp openmp/libomptarget/test/offloading/host_as_target.c openmp/libomptarget/test/unified_shared_memory/api.c openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c openmp/libomptarget/test/unified_shared_memory/close_modifier.c openmp/libomptarget/test/unified_shared_memory/shared_update.c Removed: diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp index b138000f8cf2..5400e2617729 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp @@ -252,7 +252,7 @@ 
void AMDGPUOpenMPToolChain::addClangTargetOptions( std::string BitcodeSuffix; if (DriverArgs.hasFlag(options::OPT_fopenmp_target_new_runtime, options::OPT_fno_openmp_target_new_runtime, false)) -BitcodeSuffix = "new-amdgpu-" + GPUArch; +BitcodeSuffix = "new-amdgcn-" + GPUArch; else BitcodeSuffix = "amdgcn-" + GPUArch; diff --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt b/openmp/libomptarget/DeviceRTL/CMakeLists.txt index 419c64d38116..a4f9862fb09b 100644 --- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt +++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt @@ -226,5 +226,6 @@ foreach(sm ${nvptx_sm_list}) endforeach() foreach(mcpu ${amdgpu_mcpus}) - compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa -D__AMDGCN__ -fvisibility=default -nogpulib) + # require D112227 or similar to enable the compilation for amdgpu + # compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa -D__AMDGCN__ -fvisibility=default -nogpulib) endforeach() diff --git a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp index f7c61dc013cf..2b6f20fb1732 100644 --- a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp +++ b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp @@ -20,9 +20,9 @@ using namespace _OMP; #pragma omp declare target -extern uint32_t __omp_rtl_debug_kind; // defined by CGOpenMPRuntimeGPU +extern uint32_t __omp_rtl_debug_kind; -// TODO: We want to change the name as soon as the old runtime is gone. +// TOOD: We want to change the name as soon as the old runtime is gone. 
DeviceEnvironmentTy CONSTANT(omptarget_device_environment) __attribute__((used)); diff --git a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp index 931dffcaa131..46e7701a4872 100644 --- a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp +++ b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp @@ -68,23 +68,8 @@ uint64_t atomicAdd(uint64_t *Address, uint64_t Val, int Ordering) { ///{ #pragma omp begin declare variant match(device = {arch(amdgcn)}) -uint32_t atomicInc(uint32_t *A, uint32_t V, int Ordering) { - // builtin_amdgcn_atomic_inc32 should expand to this switch when - // passed a runtime value, but does not do so yet. Workaround here. - switch (Ordering) { - default: -__builtin_unreachable(); - case __ATOMIC_RELAXED: -return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_RELAXED, ""); - case __ATOMIC_ACQUIRE: -return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_ACQUIRE, ""); - case __ATOMIC_RELEASE: -return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_RELEASE, ""); - case __ATOMIC_ACQ_REL: -return __builtin_amdgcn_atomic_inc32(A,
[clang] 4d50803 - [libomptarget] Build DeviceRTL for amdgpu
Author: Jon Chesterfield Date: 2021-10-28T12:34:01+01:00 New Revision: 4d50803ce49ce6b57c4865361c9ba0ad7063b7be URL: https://github.com/llvm/llvm-project/commit/4d50803ce49ce6b57c4865361c9ba0ad7063b7be DIFF: https://github.com/llvm/llvm-project/commit/4d50803ce49ce6b57c4865361c9ba0ad7063b7be.diff LOG: [libomptarget] Build DeviceRTL for amdgpu Passes same tests as the current deviceRTL. Includes cmake change from D111987. CI is showing a different set of pass/fails to local, committing this without the tests enabled by default while debugging that difference. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227 Added: Modified: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp openmp/libomptarget/DeviceRTL/CMakeLists.txt openmp/libomptarget/DeviceRTL/src/Configuration.cpp openmp/libomptarget/DeviceRTL/src/Synchronization.cpp openmp/libomptarget/test/mapping/data_member_ref.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp openmp/libomptarget/test/mapping/delete_inf_refcount.c openmp/libomptarget/test/mapping/lambda_by_value.cpp openmp/libomptarget/test/mapping/ompx_hold/struct.c openmp/libomptarget/test/mapping/ptr_and_obj_motion.c openmp/libomptarget/test/mapping/reduction_implicit_map.cpp openmp/libomptarget/test/offloading/bug49021.cpp openmp/libomptarget/test/offloading/bug49334.cpp openmp/libomptarget/test/offloading/bug50022.cpp openmp/libomptarget/test/offloading/global_constructor.cpp openmp/libomptarget/test/offloading/host_as_target.c openmp/libomptarget/test/unified_shared_memory/api.c openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c openmp/libomptarget/test/unified_shared_memory/close_modifier.c openmp/libomptarget/test/unified_shared_memory/shared_update.c Removed: diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp index 5400e26177291..b138000f8cf29 100644 --- 
a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp @@ -252,7 +252,7 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions( std::string BitcodeSuffix; if (DriverArgs.hasFlag(options::OPT_fopenmp_target_new_runtime, options::OPT_fno_openmp_target_new_runtime, false)) -BitcodeSuffix = "new-amdgcn-" + GPUArch; +BitcodeSuffix = "new-amdgpu-" + GPUArch; else BitcodeSuffix = "amdgcn-" + GPUArch; diff --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt b/openmp/libomptarget/DeviceRTL/CMakeLists.txt index a4f9862fb09b3..419c64d381168 100644 --- a/openmp/libomptarget/DeviceRTL/CMakeLists.txt +++ b/openmp/libomptarget/DeviceRTL/CMakeLists.txt @@ -226,6 +226,5 @@ foreach(sm ${nvptx_sm_list}) endforeach() foreach(mcpu ${amdgpu_mcpus}) - # require D112227 or similar to enable the compilation for amdgpu - # compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa -D__AMDGCN__ -fvisibility=default -nogpulib) + compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa -D__AMDGCN__ -fvisibility=default -nogpulib) endforeach() diff --git a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp index 2b6f20fb1732c..f7c61dc013cf1 100644 --- a/openmp/libomptarget/DeviceRTL/src/Configuration.cpp +++ b/openmp/libomptarget/DeviceRTL/src/Configuration.cpp @@ -20,9 +20,9 @@ using namespace _OMP; #pragma omp declare target -extern uint32_t __omp_rtl_debug_kind; +extern uint32_t __omp_rtl_debug_kind; // defined by CGOpenMPRuntimeGPU -// TOOD: We want to change the name as soon as the old runtime is gone. +// TODO: We want to change the name as soon as the old runtime is gone. 
DeviceEnvironmentTy CONSTANT(omptarget_device_environment) __attribute__((used)); diff --git a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp index d09461a016200..931dffcaa131e 100644 --- a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp +++ b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp @@ -68,8 +68,23 @@ uint64_t atomicAdd(uint64_t *Address, uint64_t Val, int Ordering) { ///{ #pragma omp begin declare variant match(device = {arch(amdgcn)}) -uint32_t atomicInc(uint32_t *Address, uint32_t Val, int Ordering) { - return __builtin_amdgcn_atomic_inc32(Address, Val, Ordering, ""); +uint32_t atomicInc(uint32_t *A, uint32_t V, int Ordering) { + // builtin_amdgcn_atomic_inc32 should expand to this switch when + // passed a runtime value, but does not do so yet. Workaround here. + switch (Ordering) { + default: +__builtin_unreachable(); + case __ATOMIC_RELAXED: +return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_RELAXED, ""); + case __ATOMI
[clang] 2c37ae6 - [nfc] Refactor CGGPUBuiltin to help review D112680
Author: Jon Chesterfield Date: 2021-11-08T15:00:08Z New Revision: 2c37ae6d14cf263724720f56fc34b4579a6e5c1c URL: https://github.com/llvm/llvm-project/commit/2c37ae6d14cf263724720f56fc34b4579a6e5c1c DIFF: https://github.com/llvm/llvm-project/commit/2c37ae6d14cf263724720f56fc34b4579a6e5c1c.diff LOG: [nfc] Refactor CGGPUBuiltin to help review D112680 Added: Modified: clang/lib/CodeGen/CGGPUBuiltin.cpp Removed: diff --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp b/clang/lib/CodeGen/CGGPUBuiltin.cpp index afbebd070c05..43192c587e26 100644 --- a/clang/lib/CodeGen/CGGPUBuiltin.cpp +++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp @@ -66,39 +66,22 @@ static llvm::Function *GetVprintfDeclaration(llvm::Module &M) { // // Note that by the time this function runs, E's args have already undergone the // standard C vararg promotion (short -> int, float -> double, etc.). -RValue -CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E, - ReturnValueSlot ReturnValue) { - assert(getTarget().getTriple().isNVPTX()); - assert(E->getBuiltinCallee() == Builtin::BIprintf); - assert(E->getNumArgs() >= 1); // printf always has at least one arg. - - const llvm::DataLayout &DL = CGM.getDataLayout(); - llvm::LLVMContext &Ctx = CGM.getLLVMContext(); - - CallArgList Args; - EmitCallArgs(Args, - E->getDirectCallee()->getType()->getAs(), - E->arguments(), E->getDirectCallee(), - /* ParamsToSkip = */ 0); - // We don't know how to emit non-scalar varargs. - if (llvm::any_of(llvm::drop_begin(Args), [&](const CallArg &A) { -return !A.getRValue(*this).isScalar(); - })) { -CGM.ErrorUnsupported(E, "non-scalar arg to printf"); -return RValue::get(llvm::ConstantInt::get(IntTy, 0)); - } +namespace { +llvm::Value *packArgsIntoNVPTXFormatBuffer(CodeGenFunction *CGF, + const CallArgList &Args) { + const llvm::DataLayout &DL = CGF->CGM.getDataLayout(); + llvm::LLVMContext &Ctx = CGF->CGM.getLLVMContext(); + CGBuilderTy &Builder = CGF->Builder; // Construct and fill the args buffer that we'll pass to vprintf. 
- llvm::Value *BufferPtr; if (Args.size() <= 1) { // If there are no args, pass a null pointer to vprintf. -BufferPtr = llvm::ConstantPointerNull::get(llvm::Type::getInt8PtrTy(Ctx)); +return llvm::ConstantPointerNull::get(llvm::Type::getInt8PtrTy(Ctx)); } else { llvm::SmallVector ArgTypes; for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) - ArgTypes.push_back(Args[I].getRValue(*this).getScalarVal()->getType()); + ArgTypes.push_back(Args[I].getRValue(*CGF).getScalarVal()->getType()); // Using llvm::StructType is correct only because printf doesn't accept // aggregates. If we had to handle aggregates here, we'd have to manually @@ -106,15 +89,40 @@ CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E, // that the alignment of the llvm type was the same as the alignment of the // clang type. llvm::Type *AllocaTy = llvm::StructType::create(ArgTypes, "printf_args"); -llvm::Value *Alloca = CreateTempAlloca(AllocaTy); +llvm::Value *Alloca = CGF->CreateTempAlloca(AllocaTy); for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) { llvm::Value *P = Builder.CreateStructGEP(AllocaTy, Alloca, I - 1); - llvm::Value *Arg = Args[I].getRValue(*this).getScalarVal(); + llvm::Value *Arg = Args[I].getRValue(*CGF).getScalarVal(); Builder.CreateAlignedStore(Arg, P, DL.getPrefTypeAlign(Arg->getType())); } -BufferPtr = Builder.CreatePointerCast(Alloca, llvm::Type::getInt8PtrTy(Ctx)); +return Builder.CreatePointerCast(Alloca, llvm::Type::getInt8PtrTy(Ctx)); } +} +} // namespace + +RValue +CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E, + ReturnValueSlot ReturnValue) { + assert(getTarget().getTriple().isNVPTX()); + assert(E->getBuiltinCallee() == Builtin::BIprintf); + assert(E->getNumArgs() >= 1); // printf always has at least one arg. + + CallArgList Args; + EmitCallArgs(Args, + E->getDirectCallee()->getType()->getAs(), + E->arguments(), E->getDirectCallee(), + /* ParamsToSkip = */ 0); + + // We don't know how to emit non-scalar varargs. 
+ if (llvm::any_of(llvm::drop_begin(Args), [&](const CallArg &A) { +return !A.getRValue(*this).isScalar(); + })) { +CGM.ErrorUnsupported(E, "non-scalar arg to printf"); +return RValue::get(llvm::ConstantInt::get(IntTy, 0)); + } + + llvm::Value *BufferPtr = packArgsIntoNVPTXFormatBuffer(this, Args); // Invoke vprintf and return. llvm::Function* VprintfFunc = GetVprintfDeclaration(CGM.getModule());
[clang] db81d8f - [OpenMP] Lower printf to __llvm_omp_vprintf
Author: Jon Chesterfield Date: 2021-11-08T18:38:00Z New Revision: db81d8f6c4d6c4f8dfaa036d6959528c9f14e7d7 URL: https://github.com/llvm/llvm-project/commit/db81d8f6c4d6c4f8dfaa036d6959528c9f14e7d7 DIFF: https://github.com/llvm/llvm-project/commit/db81d8f6c4d6c4f8dfaa036d6959528c9f14e7d7.diff LOG: [OpenMP] Lower printf to __llvm_omp_vprintf Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf` which takes the same const char*, void* arguments as cuda vprintf and also passes the size of the void* alloca which will be needed by a non-stub implementation of `__llvm_omp_vprintf` for amdgpu. This removes the amdgpu link error on any printf in a target region in favour of silently compiling code that doesn't print anything to stdout. The exact set of changes to check-openmp probably needs revision before commit Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112680 Added: Modified: clang/lib/CodeGen/CGBuiltin.cpp clang/lib/CodeGen/CGGPUBuiltin.cpp clang/lib/CodeGen/CodeGenFunction.h openmp/libomptarget/DeviceRTL/include/Debug.h openmp/libomptarget/DeviceRTL/src/Debug.cpp openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu openmp/libomptarget/test/mapping/data_member_ref.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp openmp/libomptarget/test/mapping/lambda_by_value.cpp openmp/libomptarget/test/mapping/ompx_hold/struct.c openmp/libomptarget/test/mapping/ptr_and_obj_motion.c openmp/libomptarget/test/mapping/reduction_implicit_map.cpp openmp/libomptarget/test/offloading/bug49021.cpp openmp/libomptarget/test/offloading/bug50022.cpp openmp/libomptarget/test/offloading/host_as_target.c openmp/libomptarget/test/unified_shared_memory/api.c openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c openmp/libomptarget/test/unified_shared_memory/close_modifier.c 
openmp/libomptarget/test/unified_shared_memory/shared_update.c Removed: diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index fab21e5b588a5..18e429cf3efd2 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -5106,11 +5106,16 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy())); } case Builtin::BIprintf: -if (getTarget().getTriple().isNVPTX()) - return EmitNVPTXDevicePrintfCallExpr(E, ReturnValue); -if (getTarget().getTriple().getArch() == Triple::amdgcn && -getLangOpts().HIP) - return EmitAMDGPUDevicePrintfCallExpr(E, ReturnValue); +if (getTarget().getTriple().isNVPTX() || +getTarget().getTriple().isAMDGCN()) { + if (getLangOpts().OpenMPIsDevice) +return EmitOpenMPDevicePrintfCallExpr(E); + if (getTarget().getTriple().isNVPTX()) +return EmitNVPTXDevicePrintfCallExpr(E); + if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP) +return EmitAMDGPUDevicePrintfCallExpr(E); +} + break; case Builtin::BI__builtin_canonicalize: case Builtin::BI__builtin_canonicalizef: diff --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp b/clang/lib/CodeGen/CGGPUBuiltin.cpp index 43192c587e262..fdd2fa18bb4a0 100644 --- a/clang/lib/CodeGen/CGGPUBuiltin.cpp +++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp @@ -21,13 +21,14 @@ using namespace clang; using namespace CodeGen; -static llvm::Function *GetVprintfDeclaration(llvm::Module &M) { +namespace { +llvm::Function *GetVprintfDeclaration(llvm::Module &M) { llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()), llvm::Type::getInt8PtrTy(M.getContext())}; llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get( llvm::Type::getInt32Ty(M.getContext()), ArgTypes, false); - if (auto* F = M.getFunction("vprintf")) { + if (auto *F = M.getFunction("vprintf")) { // Our CUDA system header declares vprintf with the right signature, so // nobody else should have been able to declare 
vprintf with a bogus // signature. @@ -41,6 +42,28 @@ static llvm::Function *GetVprintfDeclaration(llvm::Module &M) { VprintfFuncType, llvm::GlobalVariable::ExternalLinkage, "vprintf", &M); } +llvm::Function *GetOpenMPVprintfDeclaration(CodeGenModule &CGM) { + const char *Name = "__llvm_omp_vprintf"; + llvm::Module &M = CGM.getModule(); + llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()), +llvm::Type::getInt8PtrTy(M.getContext()), +llvm::Type::getInt32Ty(M.getContext())}; + llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get( + llvm::Type::getIn
[clang] 0fa45d6 - Revert "[OpenMP] Lower printf to __llvm_omp_vprintf"
Author: Jon Chesterfield Date: 2021-11-08T20:28:57Z New Revision: 0fa45d6d8067d71a8dccac7d942c53b5fd80e499 URL: https://github.com/llvm/llvm-project/commit/0fa45d6d8067d71a8dccac7d942c53b5fd80e499 DIFF: https://github.com/llvm/llvm-project/commit/0fa45d6d8067d71a8dccac7d942c53b5fd80e499.diff LOG: Revert "[OpenMP] Lower printf to __llvm_omp_vprintf" This reverts commit db81d8f6c4d6c4f8dfaa036d6959528c9f14e7d7. Added: Modified: clang/lib/CodeGen/CGBuiltin.cpp clang/lib/CodeGen/CGGPUBuiltin.cpp clang/lib/CodeGen/CodeGenFunction.h openmp/libomptarget/DeviceRTL/include/Debug.h openmp/libomptarget/DeviceRTL/src/Debug.cpp openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu openmp/libomptarget/test/mapping/data_member_ref.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp openmp/libomptarget/test/mapping/lambda_by_value.cpp openmp/libomptarget/test/mapping/ompx_hold/struct.c openmp/libomptarget/test/mapping/ptr_and_obj_motion.c openmp/libomptarget/test/mapping/reduction_implicit_map.cpp openmp/libomptarget/test/offloading/bug49021.cpp openmp/libomptarget/test/offloading/bug50022.cpp openmp/libomptarget/test/offloading/host_as_target.c openmp/libomptarget/test/unified_shared_memory/api.c openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c openmp/libomptarget/test/unified_shared_memory/close_modifier.c openmp/libomptarget/test/unified_shared_memory/shared_update.c Removed: diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 18e429cf3efd2..fab21e5b588a5 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -5106,16 +5106,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy())); } case Builtin::BIprintf: -if (getTarget().getTriple().isNVPTX() || 
-getTarget().getTriple().isAMDGCN()) { - if (getLangOpts().OpenMPIsDevice) -return EmitOpenMPDevicePrintfCallExpr(E); - if (getTarget().getTriple().isNVPTX()) -return EmitNVPTXDevicePrintfCallExpr(E); - if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP) -return EmitAMDGPUDevicePrintfCallExpr(E); -} - +if (getTarget().getTriple().isNVPTX()) + return EmitNVPTXDevicePrintfCallExpr(E, ReturnValue); +if (getTarget().getTriple().getArch() == Triple::amdgcn && +getLangOpts().HIP) + return EmitAMDGPUDevicePrintfCallExpr(E, ReturnValue); break; case Builtin::BI__builtin_canonicalize: case Builtin::BI__builtin_canonicalizef: diff --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp b/clang/lib/CodeGen/CGGPUBuiltin.cpp index fdd2fa18bb4a0..43192c587e262 100644 --- a/clang/lib/CodeGen/CGGPUBuiltin.cpp +++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp @@ -21,14 +21,13 @@ using namespace clang; using namespace CodeGen; -namespace { -llvm::Function *GetVprintfDeclaration(llvm::Module &M) { +static llvm::Function *GetVprintfDeclaration(llvm::Module &M) { llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()), llvm::Type::getInt8PtrTy(M.getContext())}; llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get( llvm::Type::getInt32Ty(M.getContext()), ArgTypes, false); - if (auto *F = M.getFunction("vprintf")) { + if (auto* F = M.getFunction("vprintf")) { // Our CUDA system header declares vprintf with the right signature, so // nobody else should have been able to declare vprintf with a bogus // signature. 
@@ -42,28 +41,6 @@ llvm::Function *GetVprintfDeclaration(llvm::Module &M) { VprintfFuncType, llvm::GlobalVariable::ExternalLinkage, "vprintf", &M); } -llvm::Function *GetOpenMPVprintfDeclaration(CodeGenModule &CGM) { - const char *Name = "__llvm_omp_vprintf"; - llvm::Module &M = CGM.getModule(); - llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()), -llvm::Type::getInt8PtrTy(M.getContext()), -llvm::Type::getInt32Ty(M.getContext())}; - llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get( - llvm::Type::getInt32Ty(M.getContext()), ArgTypes, false); - - if (auto *F = M.getFunction(Name)) { -if (F->getFunctionType() != VprintfFuncType) { - CGM.Error(SourceLocation(), -"Invalid type declaration for __llvm_omp_vprintf"); - return nullptr; -} -return F; - } - - return llvm::Function::Create( - VprintfFuncType, llvm::GlobalVariable::ExternalLinkage, Name, &M); -} - // Transforms a call to printf into a call to the NVPTX vprintf syscall (which // isn't particularly
[clang] 27177b8 - [OpenMP] Lower printf to __llvm_omp_vprintf
Author: Jon Chesterfield Date: 2021-11-10T15:30:56Z New Revision: 27177b82d4ca4451f288168fc1e06c0736afbdaf URL: https://github.com/llvm/llvm-project/commit/27177b82d4ca4451f288168fc1e06c0736afbdaf DIFF: https://github.com/llvm/llvm-project/commit/27177b82d4ca4451f288168fc1e06c0736afbdaf.diff LOG: [OpenMP] Lower printf to __llvm_omp_vprintf Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf` which takes the same const char*, void* arguments as cuda vprintf and also passes the size of the void* alloca which will be needed by a non-stub implementation of `__llvm_omp_vprintf` for amdgpu. This removes the amdgpu link error on any printf in a target region in favour of silently compiling code that doesn't print anything to stdout. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112680 Added: Modified: clang/lib/CodeGen/CGBuiltin.cpp clang/lib/CodeGen/CGGPUBuiltin.cpp clang/lib/CodeGen/CodeGenFunction.h clang/test/OpenMP/nvptx_target_printf_codegen.c openmp/libomptarget/DeviceRTL/include/Debug.h openmp/libomptarget/DeviceRTL/src/Debug.cpp openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu openmp/libomptarget/test/mapping/data_member_ref.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp openmp/libomptarget/test/mapping/lambda_by_value.cpp openmp/libomptarget/test/mapping/ompx_hold/struct.c openmp/libomptarget/test/mapping/ptr_and_obj_motion.c openmp/libomptarget/test/mapping/reduction_implicit_map.cpp openmp/libomptarget/test/offloading/bug49021.cpp openmp/libomptarget/test/offloading/bug50022.cpp openmp/libomptarget/test/offloading/host_as_target.c openmp/libomptarget/test/unified_shared_memory/api.c openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c openmp/libomptarget/test/unified_shared_memory/close_modifier.c 
openmp/libomptarget/test/unified_shared_memory/shared_update.c Removed: diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index fab21e5b588a5..18e429cf3efd2 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -5106,11 +5106,16 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy())); } case Builtin::BIprintf: -if (getTarget().getTriple().isNVPTX()) - return EmitNVPTXDevicePrintfCallExpr(E, ReturnValue); -if (getTarget().getTriple().getArch() == Triple::amdgcn && -getLangOpts().HIP) - return EmitAMDGPUDevicePrintfCallExpr(E, ReturnValue); +if (getTarget().getTriple().isNVPTX() || +getTarget().getTriple().isAMDGCN()) { + if (getLangOpts().OpenMPIsDevice) +return EmitOpenMPDevicePrintfCallExpr(E); + if (getTarget().getTriple().isNVPTX()) +return EmitNVPTXDevicePrintfCallExpr(E); + if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP) +return EmitAMDGPUDevicePrintfCallExpr(E); +} + break; case Builtin::BI__builtin_canonicalize: case Builtin::BI__builtin_canonicalizef: diff --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp b/clang/lib/CodeGen/CGGPUBuiltin.cpp index 43192c587e262..fdd2fa18bb4a0 100644 --- a/clang/lib/CodeGen/CGGPUBuiltin.cpp +++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp @@ -21,13 +21,14 @@ using namespace clang; using namespace CodeGen; -static llvm::Function *GetVprintfDeclaration(llvm::Module &M) { +namespace { +llvm::Function *GetVprintfDeclaration(llvm::Module &M) { llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()), llvm::Type::getInt8PtrTy(M.getContext())}; llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get( llvm::Type::getInt32Ty(M.getContext()), ArgTypes, false); - if (auto* F = M.getFunction("vprintf")) { + if (auto *F = M.getFunction("vprintf")) { // Our CUDA system header declares vprintf with the right signature, so // nobody else should have been able to declare 
vprintf with a bogus // signature. @@ -41,6 +42,28 @@ static llvm::Function *GetVprintfDeclaration(llvm::Module &M) { VprintfFuncType, llvm::GlobalVariable::ExternalLinkage, "vprintf", &M); } +llvm::Function *GetOpenMPVprintfDeclaration(CodeGenModule &CGM) { + const char *Name = "__llvm_omp_vprintf"; + llvm::Module &M = CGM.getModule(); + llvm::Type *ArgTypes[] = {llvm::Type::getInt8PtrTy(M.getContext()), +llvm::Type::getInt8PtrTy(M.getContext()), +llvm::Type::getInt32Ty(M.getContext())}; + llvm::FunctionType *VprintfFuncType = llvm::FunctionType::get( + llvm::Type::getInt32Ty(M.getContext()), ArgTy
[clang] 0e73832 - [openmp][amdgpu] Add comment warning that libm may be broken
Author: Jon Chesterfield Date: 2021-11-15T15:56:01Z New Revision: 0e738323a9c445e31b4e1b1dcb2beb19d6f103ef URL: https://github.com/llvm/llvm-project/commit/0e738323a9c445e31b4e1b1dcb2beb19d6f103ef DIFF: https://github.com/llvm/llvm-project/commit/0e738323a9c445e31b4e1b1dcb2beb19d6f103ef.diff LOG: [openmp][amdgpu] Add comment warning that libm may be broken Using llvm-link to add rocm device-libs probably doesn't work Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112639 Added: Modified: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp Removed: diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp index b138000f8cf2..863e2c597d53 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp @@ -106,6 +106,22 @@ const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand( } if (HasLibm) { + // This is not certain to work. The device libs added here, and passed to + // llvm-link, are missing attributes that they expect to be inserted when + // passed to mlink-builtin-bitcode. The amdgpu backend does not generate + // conservatively correct code when attributes are missing, so this may + // be the root cause of miscompilations. Passing via mlink-builtin-bitcode + // ultimately hits CodeGenModule::addDefaultFunctionDefinitionAttributes + // on each function, see D28538 for context. 
+ // Potential workarounds: + // - unconditionally link all of the device libs to every translation + //unit in clang via mlink-builtin-bitcode + // - build a libm bitcode file as part of the DeviceRTL and explicitly + //mlink-builtin-bitcode the rocm device libs components at build time + // - drop this llvm-link fork in favour of some calls into LLVM, chosen + //to do basically the same work as llvm-link but with that call first + // - write an opt pass that sets that on every function it sees and pipe + //the device-libs bitcode through that on the way to this llvm-link SmallVector BCLibs = AMDGPUOpenMPTC.getCommonDeviceLibNames(Args, SubArchName.str()); llvm::for_each(BCLibs, [&](StringRef BCFile) { ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] bfe4514 - [amdgpuarch] Delete stray hsa #include line
Author: Jon Chesterfield Date: 2023-01-25T21:40:35Z New Revision: bfe4514add5b7ab7e1f06248983a7162d734cffb URL: https://github.com/llvm/llvm-project/commit/bfe4514add5b7ab7e1f06248983a7162d734cffb DIFF: https://github.com/llvm/llvm-project/commit/bfe4514add5b7ab7e1f06248983a7162d734cffb.diff LOG: [amdgpuarch] Delete stray hsa #include line Added: Modified: clang/tools/amdgpu-arch/AMDGPUArch.cpp Removed: diff --git a/clang/tools/amdgpu-arch/AMDGPUArch.cpp b/clang/tools/amdgpu-arch/AMDGPUArch.cpp index 2fdd398c9c673..fbb084a2a1231 100644 --- a/clang/tools/amdgpu-arch/AMDGPUArch.cpp +++ b/clang/tools/amdgpu-arch/AMDGPUArch.cpp @@ -75,7 +75,6 @@ llvm::Error loadHSA() { #elif __has_include("hsa.h") #include "hsa.h" #endif -#include "hsa/hsa.h" #endif llvm::Error loadHSA() { return llvm::Error::success(); }
[clang] [LinkerWrapper] Fix resolution of weak symbols during LTO (PR #68215)
https://github.com/JonChesterfield approved this pull request. LinkerWrapper turning into a linker is kind of inevitable and not a very happy thing. One option would be to lean on lld for amdgpu and split out the nvptx stuff in the hopes that we eventually have an alternative to nvlink, but it seems moderately unlikely to come to pass and driving both amdgpu and nvptx down the same code path has merits. https://github.com/llvm/llvm-project/pull/68215
[clang] [OpenMP] Prevent AMDGPU from overriding visibility on DT_nohost variables (PR #68264)
https://github.com/JonChesterfield commented: This stuff looks very cuda/opencl specific. It's definitely surprising for C++ code. Do we need it for openmp? If not it seems better to guard the hack with visibility behind if (hip) https://github.com/llvm/llvm-project/pull/68264
[clang] [OpenMP] Prevent AMDGPU from overriding visibility on DT_nohost variables (PR #68264)
@@ -308,12 +308,13 @@ static bool requiresAMDGPUProtectedVisibility(const Decl *D, if (GV->getVisibility() != llvm::GlobalValue::HiddenVisibility) return false; - return D->hasAttr() || - (isa(D) && D->hasAttr()) || - (isa(D) && - (D->hasAttr() || D->hasAttr() || - cast(D)->getType()->isCUDADeviceBuiltinSurfaceType() || - cast(D)->getType()->isCUDADeviceBuiltinTextureType())); + return !D->hasAttr() && JonChesterfield wrote: is this a spurious whitespace change? https://github.com/llvm/llvm-project/pull/68264
[clang] c45eaea - [Clang] Undef attribute for global variables
Author: Jon Chesterfield Date: 2020-03-17T21:22:23Z New Revision: c45eaeabb77a926f4f1cf3c1e9311e9d66e0ee2a URL: https://github.com/llvm/llvm-project/commit/c45eaeabb77a926f4f1cf3c1e9311e9d66e0ee2a DIFF: https://github.com/llvm/llvm-project/commit/c45eaeabb77a926f4f1cf3c1e9311e9d66e0ee2a.diff LOG: [Clang] Undef attribute for global variables Summary: [Clang] Attribute to allow defining undef global variables Initializing global variables is very cheap on hosted implementations. The C semantics of zero initializing globals work very well there. It is not necessarily cheap on freestanding implementations. Where there is no loader available, code must be emitted near the start point to write the appropriate values into memory. At present, external variables can be declared in C++ and definitions provided in assembly (or IR) to achieve this effect. This patch provides an attribute in order to remove this reason for writing assembly for performance sensitive freestanding implementations. A close analogue in tree is LDS memory for amdgcn, where the kernel is responsible for initializing the memory after it starts executing on the gpu. Uninitialized variables in LDS are observably cheaper than zero initialized. Patch is loosely based on the cuda __shared__ and opencl __local variable implementation which also produces undef global variables.
Reviewers: kcc, rjmccall, rsmith, glider, vitalybuka, pcc, eugenis, vlad.tsyrklevich, jdoerfert, gregrodgers, jfb, aaron.ballman Reviewed By: rjmccall, aaron.ballman Subscribers: Anastasia, aaron.ballman, davidb, Quuxplusone, dexonsmith, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D74361 Added: clang/test/CodeGen/attr-loader-uninitialized.c clang/test/CodeGenCXX/attr-loader-uninitialized.cpp clang/test/Sema/attr-loader-uninitialized.c clang/test/Sema/attr-loader-uninitialized.cpp Modified: clang/include/clang/Basic/Attr.td clang/include/clang/Basic/AttrDocs.td clang/include/clang/Basic/DiagnosticSemaKinds.td clang/lib/AST/DeclBase.cpp clang/lib/CodeGen/CGDecl.cpp clang/lib/CodeGen/CodeGenModule.cpp clang/lib/Sema/SemaDecl.cpp clang/lib/Sema/SemaDeclAttr.cpp clang/test/Misc/pragma-attribute-supported-attributes-list.test Removed: diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td index 624995a2d572..a0d521d17d0f 100644 --- a/clang/include/clang/Basic/Attr.td +++ b/clang/include/clang/Basic/Attr.td @@ -3313,6 +3313,12 @@ def Uninitialized : InheritableAttr { let Documentation = [UninitializedDocs]; } +def LoaderUninitialized : Attr { + let Spellings = [Clang<"loader_uninitialized">]; + let Subjects = SubjectList<[GlobalVar]>; + let Documentation = [LoaderUninitializedDocs]; +} + def ObjCExternallyRetained : InheritableAttr { let LangOpts = [ObjCAutoRefCount]; let Spellings = [Clang<"objc_externally_retained">]; diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td index aea574995c8e..60496694200e 100644 --- a/clang/include/clang/Basic/AttrDocs.td +++ b/clang/include/clang/Basic/AttrDocs.td @@ -4358,6 +4358,29 @@ it rather documents the programmer's intent. 
}]; } +def LoaderUninitializedDocs : Documentation { + let Category = DocCatVariable; + let Content = [{ +The ``loader_uninitialized`` attribute can be placed on global variables to +indicate that the variable does not need to be zero initialized by the loader. +On most targets, zero-initialization does not incur any additional cost. +For example, most general purpose operating systems deliberately ensure +that all memory is properly initialized in order to avoid leaking privileged +information from the kernel or other programs. However, some targets +do not make this guarantee, and on these targets, avoiding an unnecessary +zero-initialization can have a significant impact on load times and/or code +size. + +A declaration with this attribute is a non-tentative definition just as if it +provided an initializer. Variables with this attribute are considered to be +uninitialized in the same sense as a local variable, and the programs must +write to them before reading from them. If the variable's type is a C++ class +type with a non-trivial default constructor, or an array thereof, this attribute +only suppresses the static zero-initialization of the variable, not the dynamic +initialization provided by executing the default constructor. + }]; +} + def CallbackDocs : Documentation { let Category = DocCatFunction; let Content = [{ diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index 7cb1eae9615b..f777e0ae4c81 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -5333,6 +5333,17 @@ def ext_aggregate_init_no
[clang] 1d19b15 - Fix arm build broken by D74361 by dropping align from filecheck pattern
Author: Jon Chesterfield Date: 2020-03-17T22:15:19Z New Revision: 1d19b153955a87bd0f83c8a6a072d69239f76d63 URL: https://github.com/llvm/llvm-project/commit/1d19b153955a87bd0f83c8a6a072d69239f76d63 DIFF: https://github.com/llvm/llvm-project/commit/1d19b153955a87bd0f83c8a6a072d69239f76d63.diff LOG: Fix arm build broken by D74361 by dropping align from filecheck pattern Added: Modified: clang/test/CodeGen/attr-loader-uninitialized.c clang/test/CodeGenCXX/attr-loader-uninitialized.cpp Removed: diff --git a/clang/test/CodeGen/attr-loader-uninitialized.c b/clang/test/CodeGen/attr-loader-uninitialized.c index c653d5ba3991..a7fa550fc26d 100644 --- a/clang/test/CodeGen/attr-loader-uninitialized.c +++ b/clang/test/CodeGen/attr-loader-uninitialized.c @@ -1,14 +1,14 @@ // RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s -// CHECK: @tentative_attr_first = global i32 undef, align 4 +// CHECK: @tentative_attr_first = global i32 undef int tentative_attr_first __attribute__((loader_uninitialized)); int tentative_attr_first; -// CHECK: @tentative_attr_second = global i32 undef, align 4 +// CHECK: @tentative_attr_second = global i32 undef int tentative_attr_second; int tentative_attr_second __attribute__((loader_uninitialized)); -// CHECK: @array = global [16 x float] undef, align 16 +// CHECK: @array = global [16 x float] undef float array[16] __attribute__((loader_uninitialized)); typedef struct @@ -17,8 +17,8 @@ typedef struct float y; } s; -// CHECK: @i = global %struct.s undef, align 4 +// CHECK: @i = global %struct.s undef s i __attribute__((loader_uninitialized)); -// CHECK: @private_extern_ok = hidden global i32 undef, align 4 +// CHECK: @private_extern_ok = hidden global i32 undef __private_extern__ int private_extern_ok __attribute__((loader_uninitialized)); diff --git a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp index ec9d8a54db78..3b401dcf4094 100644 --- a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp 
+++ b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp @@ -21,9 +21,9 @@ class trivial // CHECK: @ut = global %class.trivial undef trivial ut [[clang::loader_uninitialized]]; -// CHECK: @arr = global [32 x double] undef, align 16 +// CHECK: @arr = global [32 x double] undef double arr[32] __attribute__((loader_uninitialized)); // Defining as arr2[] [[clang..]] raises the error: attribute cannot be applied to types -// CHECK: @arr2 = global [4 x double] undef, align 16 +// CHECK: @arr2 = global [4 x double] undef double arr2 [[clang::loader_uninitialized]] [4];
[clang] cc691f3 - Disable loader-uninitialized tests on Windows
Author: Jon Chesterfield Date: 2020-03-17T23:33:12Z New Revision: cc691f3384c593849d3a5ab468d8e5ac6f707dab URL: https://github.com/llvm/llvm-project/commit/cc691f3384c593849d3a5ab468d8e5ac6f707dab DIFF: https://github.com/llvm/llvm-project/commit/cc691f3384c593849d3a5ab468d8e5ac6f707dab.diff LOG: Disable loader-uninitialized tests on Windows Added: Modified: clang/test/CodeGen/attr-loader-uninitialized.c clang/test/CodeGenCXX/attr-loader-uninitialized.cpp Removed: diff --git a/clang/test/CodeGen/attr-loader-uninitialized.c b/clang/test/CodeGen/attr-loader-uninitialized.c index a7fa550fc26d..9ff0c23d77c3 100644 --- a/clang/test/CodeGen/attr-loader-uninitialized.c +++ b/clang/test/CodeGen/attr-loader-uninitialized.c @@ -1,3 +1,4 @@ +// UNSUPPORTED: system-windows // RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s // CHECK: @tentative_attr_first = global i32 undef diff --git a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp index 3b401dcf4094..e82ae47e9f16 100644 --- a/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp +++ b/clang/test/CodeGenCXX/attr-loader-uninitialized.cpp @@ -1,3 +1,4 @@ +// UNSUPPORTED: system-windows // RUN: %clang_cc1 -emit-llvm -o - %s | FileCheck %s // CHECK: @defn = global i32 undef
[PATCH] clang-format: SpaceBeforeParens (Always) with overloaded operators
Hi, I believe this is being sent to the correct list. Please let me know if there is a better choice. The clang-format option SpaceBeforeParens "Always" does not insert a space before the opening parenthesis of an overloaded operator function. The attached patch against trunk resolves this. Kind regards, Jon Chesterfield SN Systems - Sony Computer Entertainment Group. Index: lib/Format/TokenAnnotator.cpp === --- lib/Format/TokenAnnotator.cpp (revision 244436) +++ lib/Format/TokenAnnotator.cpp (working copy) @@ -1997,7 +1997,7 @@ if (Right.isOneOf(TT_CtorInitializerColon, TT_ObjCBlockLParen)) return true; if (Right.is(TT_OverloadedOperatorLParen)) -return false; +return Style.SpaceBeforeParens == FormatStyle::SBPO_Always; if (Right.is(tok::colon)) { if (Line.First->isOneOf(tok::kw_case, tok::kw_default) || !Right.getNextNonComment() || Right.getNextNonComment()->is(tok::semi)) Index: unittests/Format/FormatTest.cpp === --- unittests/Format/FormatTest.cpp (revision 244436) +++ unittests/Format/FormatTest.cpp (working copy) @@ -8276,6 +8276,8 @@ verifyFormat("static_assert(sizeof(char) == 1, \"Impossible!\");", NoSpace); verifyFormat("int f() throw(Deprecated);", NoSpace); verifyFormat("typedef void (*cb)(int);", NoSpace); + verifyFormat("T A::operator()();",NoSpace); + verifyFormat("X A::operator++(T);",NoSpace); FormatStyle Space = getLLVMStyle(); Space.SpaceBeforeParens = FormatStyle::SBPO_Always; @@ -8321,6 +8323,8 @@ verifyFormat("static_assert (sizeof (char) == 1, \"Impossible!\");", Space); verifyFormat("int f () throw (Deprecated);", Space); verifyFormat("typedef void (*cb) (int);", Space); + verifyFormat("T A::operator() ();",Space); + verifyFormat("X A::operator++ (T);",Space); } TEST_F(FormatTest, ConfigurableSpacesInParentheses) { ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D11957: SpaceBeforeParens (Always) with overloaded operators
JonChesterfield created this revision. JonChesterfield added a subscriber: cfe-commits. JonChesterfield set the repository for this revision to rL LLVM. Herald added a subscriber: klimek. The clang-format option SpaceBeforeParens "Always" does not insert a space before the opening parenthesis of an overloaded operator function. The attached patch against trunk resolves this. Repository: rL LLVM http://reviews.llvm.org/D11957 Files: lib/Format/TokenAnnotator.cpp unittests/Format/FormatTest.cpp Index: unittests/Format/FormatTest.cpp === --- unittests/Format/FormatTest.cpp +++ unittests/Format/FormatTest.cpp @@ -8276,6 +8276,8 @@ verifyFormat("static_assert(sizeof(char) == 1, \"Impossible!\");", NoSpace); verifyFormat("int f() throw(Deprecated);", NoSpace); verifyFormat("typedef void (*cb)(int);", NoSpace); + verifyFormat("T A::operator()();", NoSpace); + verifyFormat("X A::operator++(T);", NoSpace); FormatStyle Space = getLLVMStyle(); Space.SpaceBeforeParens = FormatStyle::SBPO_Always; @@ -8321,6 +8323,8 @@ verifyFormat("static_assert (sizeof (char) == 1, \"Impossible!\");", Space); verifyFormat("int f () throw (Deprecated);", Space); verifyFormat("typedef void (*cb) (int);", Space); + verifyFormat("T A::operator() ();", Space); + verifyFormat("X A::operator++ (T);", Space); } TEST_F(FormatTest, ConfigurableSpacesInParentheses) { Index: lib/Format/TokenAnnotator.cpp === --- lib/Format/TokenAnnotator.cpp +++ lib/Format/TokenAnnotator.cpp @@ -1997,7 +1997,7 @@ if (Right.isOneOf(TT_CtorInitializerColon, TT_ObjCBlockLParen)) return true; if (Right.is(TT_OverloadedOperatorLParen)) -return false; +return Style.SpaceBeforeParens == FormatStyle::SBPO_Always; if (Right.is(tok::colon)) { if (Line.First->isOneOf(tok::kw_case, tok::kw_default) || !Right.getNextNonComment() || Right.getNextNonComment()->is(tok::semi)) Index: unittests/Format/FormatTest.cpp === --- unittests/Format/FormatTest.cpp +++ unittests/Format/FormatTest.cpp @@ -8276,6 +8276,8 @@ 
verifyFormat("static_assert(sizeof(char) == 1, \"Impossible!\");", NoSpace); verifyFormat("int f() throw(Deprecated);", NoSpace); verifyFormat("typedef void (*cb)(int);", NoSpace); + verifyFormat("T A::operator()();", NoSpace); + verifyFormat("X A::operator++(T);", NoSpace); FormatStyle Space = getLLVMStyle(); Space.SpaceBeforeParens = FormatStyle::SBPO_Always; @@ -8321,6 +8323,8 @@ verifyFormat("static_assert (sizeof (char) == 1, \"Impossible!\");", Space); verifyFormat("int f () throw (Deprecated);", Space); verifyFormat("typedef void (*cb) (int);", Space); + verifyFormat("T A::operator() ();", Space); + verifyFormat("X A::operator++ (T);", Space); } TEST_F(FormatTest, ConfigurableSpacesInParentheses) { Index: lib/Format/TokenAnnotator.cpp === --- lib/Format/TokenAnnotator.cpp +++ lib/Format/TokenAnnotator.cpp @@ -1997,7 +1997,7 @@ if (Right.isOneOf(TT_CtorInitializerColon, TT_ObjCBlockLParen)) return true; if (Right.is(TT_OverloadedOperatorLParen)) -return false; +return Style.SpaceBeforeParens == FormatStyle::SBPO_Always; if (Right.is(tok::colon)) { if (Line.First->isOneOf(tok::kw_case, tok::kw_default) || !Right.getNextNonComment() || Right.getNextNonComment()->is(tok::semi)) ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
Re: [PATCH] D11957: SpaceBeforeParens (Always) with overloaded operators
JonChesterfield added a comment. Thanks. I have no commit access. Repository: rL LLVM http://reviews.llvm.org/D11957
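For context on the thread above, the option the patch extends is driven from a `.clang-format` file. A minimal configuration sketch (not taken from the patch itself), with the operator spellings matching the test expectations added in D11957:

```
# .clang-format — minimal sketch, assumed configuration
BasedOnStyle: LLVM
SpaceBeforeParens: Always

# Before the patch, overloaded operators were exempt even with Always:
#   T A::operator()();
# After the patch they match the rest of the style:
#   T A::operator() ();
```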
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the variadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite a call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g.
AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. 
+// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
JonChesterfield wrote:

> I don't really like the whole "sufficiently simple function" thing. It seems fragile. You should be able to just take an arbitrary internal varargs function, rewrite its signature to take a va_list argument, rewrite calls to va_start to make a copy of that va_list, and rewrite the callers to construct that va_list. If that function turns out to be inlinable, great; if not, you haven't really lost anything.

Yes, you can and I do. That's patch 2 of the series, numbered 1 in the list.

> (Rewriting the signature of a function is complicated in its own way because you need to allocate a new Function, then transplant the original function's body into it. But it's not uncharted territory: we should be able to refactor code out of llvm/lib/Transforms/IPO/ArgumentPromotion.cpp.)
>
> Do we have a testing plan for this? Messing up calling convention stuff tends to lead to extremely subtle bugs.

And this is why it's a separate patch. The rewrite-call-instruction is the target dependent bit, in this patch for an initial target. The rewrite-function is (almost) target agnostic and involves a surprisingly large amount of bookkeeping. We have duplicated code in ArgumentPromotion, dead argument removal, the function cloning in the Attributor, and probably elsewhere.

https://github.com/llvm/llvm-project/pull/81058
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
JonChesterfield wrote:

> High level question: Does this patch eliminate the variadic call edge, or, does it perform inlining on very special variadic function definitions? I thought the former but `isFunctionInlinable` sufficiently confused me.

This patch will rewrite calls to a variadic function into calls to a function taking a va_list. Later patches expand that to cover the rest of the cases. Note that calls to variadic function pointers cannot generally be rewritten without permitting ABI changes, which is what I plan to do for nvptx and amdgpu.

https://github.com/llvm/llvm-project/pull/81058
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
JonChesterfield wrote:

> Not sure if this means isFunctionInlinable will go away in the followup patch, or if you plan to rewrite functions in a way that satisfies isFunctionInlinable. I think the end state should be that all functions go down the same codepath, not conditionally do something different based on whether they're "simple". I guess I don't have a strong preference for how you get there, though.

The logic I've got at present (which includes the ABI rewriting) is

```C++
bool usefulToSplit = splitFunctions() && (!F->isDeclaration() || rewriteABI());

// F may already be a single basic block calling a known function
// that takes a va_list, in which case it doesn't need to be split.
Function *Equivalent = isFunctionInlinable(M, F);

if (usefulToSplit && !Equivalent) {
  Equivalent = DeriveInlinableVariadicFunctionPair(M, *F);
  assert(Equivalent);
  assert(isFunctionInlinable(M, *F)); // branch doesn't do this presently but it could do
  changed = true;
  functionToInliningTarget[F] = Equivalent;
}
```

I'm not especially attached to the specific control flow. The two transforms - inlining/call-rewrite and splitting an entry block off the variadic function so that it can be inlined - are genuinely orthogonal, and that is a really good property to preserve. It took some thought to get to that structure. If we ignore that design and run functions through the block splitting unnecessarily, we win a combinatorial increase in required testing, a decrease in emitted code quality (spurious extra functions), and an inability to pattern match on fprintf->vfprintf style code that happens to be in the application already. We would get to delete the isFunctionInlinable predicate. The independent transform pipeline pattern is more important than the no-special-case-branching heuristic. If it helps, view it as two complementary transforms where the one is skipped when it would be a no-op.
Related - if there's an objection to landing this as an inactive pass (only exercised by test code) we can put it into an optimisation pipeline immediately; it'll still remove some real world variadic calls even if the later patches don't make it. https://github.com/llvm/llvm-project/pull/81058
[clang] [Clang] Add 'CLANG_ALLOW_IMPLICIT_RPATH' to enable toolchain use of -rpath (PR #82004)
JonChesterfield wrote: Enabling it by default, with no cmake option, and letting Fedora carry their own patch is my preferred solution. The Siemens dev working on gcc amdgpu offloading told me at a conference that they set rpath on the executable, but I haven't checked their implementation. His attitude was that programs should be able to run without setting environment variables. https://github.com/llvm/llvm-project/pull/82004
[clang] [Clang] Add 'CLANG_ALLOW_IMPLICIT_RPATH' to enable toolchain use of -rpath (PR #82004)
JonChesterfield wrote:

> IMHO I prefer to ask/request users to do the right thing.

One of the drawbacks to asking users to do the "right thing" is that it goes something like:

- you must use global state to tell the compiler where the compiler libraries are
- you should do this using clang configuration files, which are like commandline flags only different
- the flag is called rpath, which means runpath, but used to mean rpath, because of this historic context from glibc
- plus you need to work out what the (multiple) openmp libraries are called and where they are
- and some of them are bitcode, found using this different mechanism
- and if you don't get it totally right, things won't work

which means we've chosen "right" to mean "trivially convenient for compiler developers who don't like changing things", which is not likely to be what the user had in mind. They should look at this pile of spurious complexity and conclude they don't want to be an openmp user after all.

We really don't have a good answer to "why can't my compiler find its own libraries?", since the best we've got is a reference to a build script Fedora deploys for reasons that they don't really go into in their docs, and where I still haven't managed to find the script itself to reverse engineer what exactly it's rejecting.

I'm still happy with a heuristic that amounts to "if the compiler libs are being installed under /usr or /lib, assume the system can find them, and otherwise set rpath on the executable", especially if it's tied to a command line flag or cmake control which lets the user override whichever default we picked for them.

https://github.com/llvm/llvm-project/pull/82004
[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)
@@ -50,31 +50,9 @@ function(collect_object_file_deps target result)
   endif()
 endfunction(collect_object_file_deps)
 
-# A rule to build a library from a collection of entrypoint objects.
-# Usage:
-#     add_entrypoint_library(
-#       DEPENDS
-#     )
-#
-# NOTE: If one wants an entrypoint to be available in a library, then they will
-# have to list the entrypoint target explicitly in the DEPENDS list. Implicit
-# entrypoint dependencies will not be added to the library.
-function(add_entrypoint_library target_name)
-  cmake_parse_arguments(
-    "ENTRYPOINT_LIBRARY"
-    "" # No optional arguments
-    "" # No single value arguments
-    "DEPENDS" # Multi-value arguments
-    ${ARGN}
-  )
-  if(NOT ENTRYPOINT_LIBRARY_DEPENDS)
-    message(FATAL_ERROR "'add_entrypoint_library' target requires a DEPENDS list "
-                        "of 'add_entrypoint_object' targets.")
-  endif()
-
-  get_fq_deps_list(fq_deps_list ${ENTRYPOINT_LIBRARY_DEPENDS})
+function(get_all_object_file_deps result fq_deps_list)

JonChesterfield wrote: This looks like factoring some existing code into a function - if you landed that refactor without changing what the code does, I think it would make this diff much more legible as the current and new code could align. https://github.com/llvm/llvm-project/pull/81921
[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)
JonChesterfield wrote: One large patch may be necessary - is it also necessary to interleave reordering files with changing the contents? It makes the GUI diff tool we're using here essentially useless. If the moving of code between files and factoring into functions were a separate commit, we'd have a much better chance of seeing what you've changed vs what you've moved. https://github.com/llvm/llvm-project/pull/81921
[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)
JonChesterfield wrote: OK, worked through this patch now. The noise is substantial but it's an improvement on what we have - the overall impression is that the cmake was originally very special cased for GPUs and now treats them very similarly to other targets, with some careful footwork around compiling the loaders for the host. This is also a step towards making the GPU parts of the cmake easier to edit for non-GPU people, which is a strong win. Solid effort working through this, Joseph. Let's ship it. https://github.com/llvm/llvm-project/pull/81921
[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)
https://github.com/JonChesterfield approved this pull request. https://github.com/llvm/llvm-project/pull/81921
[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)
https://github.com/JonChesterfield commented: Stalled on https://github.com/llvm/llvm-project/pull/81557, trying to remove the approve mark as otherwise I'll forget about this. https://github.com/llvm/llvm-project/pull/81921
[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)
https://github.com/JonChesterfield dismissed https://github.com/llvm/llvm-project/pull/81921
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp --------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the variadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite calls to the variadic function into
+// some code to construct a target-specific value to use for the va_list and
+// a call into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted into
+// that form. Create a new internal function taking a va_list where the original
+// took a ... parameter. Move the blocks across. Create a new block containing a
+// va_start that calls into the new function. This is nearly target independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular target.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// Calling convention for passing the va_list object, same as it would be in C.
+// aarch64 uses byval.
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+      : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+    return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+                                AllocaInst * /*va_list*/, Value * /*buffer*/) {
+    // Function needs to be implemented if valist is on the stack
+    assert(!valistOnStack());
+    __builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+    return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
JonChesterfield wrote: Ah OK, so split every variadic definition and let the inliner sort it out afterwards. Yes, I'm good with that. Tests either get messier or add a call to the inliner. Will update the PR correspondingly, solid simplification, thanks! Discard the combinatorial testing comment - I misunderstood the structure you had in mind. https://github.com/llvm/llvm-project/pull/81058
[clang] [llvm] [mlir] [openmp] [OpenMP] Remove `register_requires` global constructor (PR #80460)
https://github.com/JonChesterfield approved this pull request. I like this a lot, thank you. https://github.com/llvm/llvm-project/pull/80460
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
JonChesterfield wrote: This PR is invalid. First, the alignment on the eight byte pointer is supposed to be four. Increasing it to 8 makes things worse. Second, I can't see any support for the claim that the code is incrementing by the alignment of the value, as opposed to the size. The frame is set up as a struct instance with explicit padding by Int8Tys and the calculation there is correct. The va_arg increment is done in CodeGen::emitVoidPtrVAArg, where DirectSize is ValueInfo.Width, aligned to the 4 byte slot size, then stored. It does not increment the iterator by the alignment of the type. The lowering pass is doing exactly what was intended. https://github.com/llvm/llvm-project/pull/96370
[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)
@@ -0,0 +1,77 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck %s
+
+extern void varargs_simple(int, ...);
+
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[C:%.*]] = alloca i8, align 1
+// CHECK-NEXT:    [[S:%.*]] = alloca i16, align 2
+// CHECK-NEXT:    [[I:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[L:%.*]] = alloca i64, align 8
+// CHECK-NEXT:    [[F:%.*]] = alloca float, align 4
+// CHECK-NEXT:    [[D:%.*]] = alloca double, align 8
+// CHECK-NEXT:    [[A:%.*]] = alloca [[STRUCT_ANON:%.*]], align 4
+// CHECK-NEXT:    [[V:%.*]] = alloca <4 x i32>, align 16
+// CHECK-NEXT:    store i8 1, ptr [[C]], align 1
+// CHECK-NEXT:    store i16 1, ptr [[S]], align 2
+// CHECK-NEXT:    store i32 1, ptr [[I]], align 4
+// CHECK-NEXT:    store i64 1, ptr [[L]], align 8
+// CHECK-NEXT:    store float 1.000000e+00, ptr [[F]], align 4
+// CHECK-NEXT:    store double 1.000000e+00, ptr [[D]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[C]], align 1
+// CHECK-NEXT:    [[CONV:%.*]] = sext i8 [[TMP0]] to i32

JonChesterfield wrote: C promotes them to i32. C has a lot of rules around vararg type promotion that have not aged brilliantly. If you want an i8 or i16, put it in a struct. C doesn't say anything about promoting that, and amdgpu will pass it inlined into the struct, i.e. with no overhead. I believe nvptx will do likewise. https://github.com/llvm/llvm-project/pull/96369
[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)
@@ -942,6 +942,36 @@ struct Amdgpu final : public VariadicABIInfo {
   }
 };
 
+struct NVPTX final : public VariadicABIInfo {
+
+  bool enableForTarget() override { return true; }
+
+  bool vaListPassedInSSARegister() override { return true; }
+
+  Type *vaListType(LLVMContext &Ctx) override {
+    return PointerType::getUnqual(Ctx);
+  }
+
+  Type *vaListParameterType(Module &M) override {
+    return PointerType::getUnqual(M.getContext());
+  }
+
+  Value *initializeVaList(Module &M, LLVMContext &Ctx, IRBuilder<> &Builder,
+                          AllocaInst *, Value *Buffer) override {
+    return Builder.CreateAddrSpaceCast(Buffer, vaListParameterType(M));
+  }
+
+  VAArgSlotInfo slotInfo(const DataLayout &DL, Type *Parameter) override {
+    // NVPTX expects natural alignment in all cases. The variadic call ABI will
+    // handle promoting types to their appropriate size and alignment.
+    const unsigned MinAlign = 1;
+    Align A = DL.getABITypeAlign(Parameter);

JonChesterfield wrote: Can getABITypeAlign return 0? Does nvptx actually expect natural alignment? That would be inconsistent with the slot size of four, which strongly suggests everything is passed with at least four byte alignment. https://github.com/llvm/llvm-project/pull/96369
[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)
https://github.com/JonChesterfield requested changes to this pull request. The amdgpu patch is incorrect, see https://github.com/llvm/llvm-project/pull/96370/ The nvptx lowering looks dubious - values smaller than slot size should be passed with the same alignment as the slot and presently aren't. A struct containing i8, i16 or half should be miscompiled on nvptx as written. No comment on the libc part. https://github.com/llvm/llvm-project/pull/96369
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
https://github.com/JonChesterfield requested changes to this pull request. Patch should not land. Need to know what bug this was trying to address to guess at what the right fix would be. https://github.com/llvm/llvm-project/pull/96370
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
JonChesterfield wrote: Ah yes, libc code doing the equivalent of va_arg assuming natural alignment when the underlying buffer is a packed struct with fields padded to four bytes would not work. That would be "fixed" by changing the compiler to match the assumption made by libc, but it seems much better for libc to do the misaligned load instead. Then there's no wavesize*align-padding stack space burned at runtime. https://github.com/llvm/llvm-project/pull/96370
[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)
JonChesterfield wrote: I've passed some types to nvcc on godbolt and tried to decode the results. It looks like it's passing everything with natural alignment, flattened, with total disregard for the minimum slot size premise clang uses. https://github.com/llvm/llvm-project/pull/96369
[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)
@@ -215,7 +219,10 @@ void NVPTXABIInfo::computeInfo(CGFunctionInfo &FI) const {
 RValue NVPTXABIInfo::EmitVAArg(CodeGenFunction &CGF, Address VAListAddr,
                                QualType Ty, AggValueSlot Slot) const {
-  llvm_unreachable("NVPTX does not support varargs");
+  return emitVoidPtrVAArg(CGF, VAListAddr, Ty, /*IsIndirect=*/false,
+                          getContext().getTypeInfoInChars(Ty),
+                          CharUnits::fromQuantity(4),

JonChesterfield wrote: Error is here - this says slots shall be at least four bytes in size, but nvcc looks happy to pass struct {char} right next to other things, so we're looking for `CharUnits::fromQuantity(1)`. https://github.com/llvm/llvm-project/pull/96369
[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)
@@ -942,6 +942,36 @@ struct Amdgpu final : public VariadicABIInfo {
   }
 };
 
+struct NVPTX final : public VariadicABIInfo {
+
+  bool enableForTarget() override { return true; }
+
+  bool vaListPassedInSSARegister() override { return true; }
+
+  Type *vaListType(LLVMContext &Ctx) override {
+    return PointerType::getUnqual(Ctx);
+  }
+
+  Type *vaListParameterType(Module &M) override {
+    return PointerType::getUnqual(M.getContext());
+  }
+
+  Value *initializeVaList(Module &M, LLVMContext &Ctx, IRBuilder<> &Builder,
+                          AllocaInst *, Value *Buffer) override {
+    return Builder.CreateAddrSpaceCast(Buffer, vaListParameterType(M));
+  }
+
+  VAArgSlotInfo slotInfo(const DataLayout &DL, Type *Parameter) override {
+    // NVPTX expects natural alignment in all cases. The variadic call ABI will
+    // handle promoting types to their appropriate size and alignment.
+    const unsigned MinAlign = 1;
+    Align A = DL.getABITypeAlign(Parameter);

JonChesterfield wrote: I suspect Matt means `Align MinAlign = 1;` instead of `unsigned MinAlign =`. https://github.com/llvm/llvm-project/pull/96369
[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)
https://github.com/JonChesterfield edited https://github.com/llvm/llvm-project/pull/96369
[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)
@@ -942,6 +942,36 @@ struct Amdgpu final : public VariadicABIInfo {
   }
 };
 
+struct NVPTX final : public VariadicABIInfo {
+
+  bool enableForTarget() override { return true; }
+
+  bool vaListPassedInSSARegister() override { return true; }
+
+  Type *vaListType(LLVMContext &Ctx) override {
+    return PointerType::getUnqual(Ctx);
+  }
+
+  Type *vaListParameterType(Module &M) override {
+    return PointerType::getUnqual(M.getContext());
+  }
+
+  Value *initializeVaList(Module &M, LLVMContext &Ctx, IRBuilder<> &Builder,
+                          AllocaInst *, Value *Buffer) override {
+    return Builder.CreateAddrSpaceCast(Buffer, vaListParameterType(M));
+  }
+
+  VAArgSlotInfo slotInfo(const DataLayout &DL, Type *Parameter) override {
+    // NVPTX expects natural alignment in all cases. The variadic call ABI will
+    // handle promoting types to their appropriate size and alignment.
+    const unsigned MinAlign = 1;
+    Align A = DL.getABITypeAlign(Parameter);
+    if (A < MinAlign)
+      A = Align(MinAlign);
+    return {A, false};
+  }

JonChesterfield wrote: Are you really looking for `if (A < 1) A = 1;` here? Align has max/min functions which would be clearer. https://github.com/llvm/llvm-project/pull/96369
[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)
@@ -54,7 +54,34 @@ class MockArgList {
   }
 
   template <class T> LIBC_INLINE T next_var() {
-    ++arg_counter;
+    arg_counter++;
+    return T(arg_counter);
+  }
+
+  size_t read_count() const { return arg_counter; }
+};
+
+// Used by the GPU implementation to parse how many bytes need to be read from
+// the variadic argument buffer.
+class DummyArgList {
+  size_t arg_counter = 0;
+
+public:
+  LIBC_INLINE DummyArgList() = default;
+  LIBC_INLINE DummyArgList(va_list) { ; }
+  LIBC_INLINE DummyArgList(DummyArgList &other) {
+    arg_counter = other.arg_counter;
+  }
+  LIBC_INLINE ~DummyArgList() = default;
+
+  LIBC_INLINE DummyArgList &operator=(DummyArgList &rhs) {
+    arg_counter = rhs.arg_counter;
+    return *this;
+  }
+
+  template <class T> LIBC_INLINE T next_var() {
+    arg_counter =
+        ((arg_counter + alignof(T) - 1) / alignof(T)) * alignof(T) + sizeof(T);
+    return T(arg_counter);

JonChesterfield wrote: Maybe split this into separate increment-by-size and round-for-alignment operations. There might be helper functions for the rounding which would be clearer than reading the division / multiplication and trying to pattern match on what alignment code usually looks like (e.g. I expected to see masking off bits, so tripped over this). https://github.com/llvm/llvm-project/pull/96369
[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)
@@ -116,8 +116,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo {
   }
 
   BuiltinVaListKind getBuiltinVaListKind() const override {
-    // FIXME: implement
-    return TargetInfo::CharPtrBuiltinVaList;
+    return TargetInfo::VoidPtrBuiltinVaList;

JonChesterfield wrote: These should be the same as far as codegen is concerned I think, in which case we should probably leave it unchanged. Or is there a reason to change it I'm missing? https://github.com/llvm/llvm-project/pull/96369
[clang] [llvm] [Sanitizer] Make sanitizer passes idempotent (PR #99439)
JonChesterfield wrote: Sanitizer passes setting a "no sanitizer" magic variable is backwards. If this behaviour is the way to go, have clang set a "needs_asan_lowering" or whatever and have the corresponding pass remove it. It shouldn't be necessary to emit ever increasing lists of pass and target specific cruft in the IR to avoid miscompilation. The opposite way round is much better - compile correctly when the flag is missing, and add the ad hoc metadata to switch on non-default behaviour. https://github.com/llvm/llvm-project/pull/99439