[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
arsenm wrote:
> Think it would be useful to put that on functions in the wrapper headers that definitely aren't convergent? E.g. getting a thread id.

You could, but it's trivially inferable in those cases anyway.

https://github.com/llvm/llvm-project/pull/111076
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
https://github.com/arsenm approved this pull request.

https://github.com/llvm/llvm-project/pull/111076
[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
@@ -4106,9 +4106,10 @@ bool CompilerInvocation::ParseLangArgs(LangOptions &Opts, ArgList &Args,
   Opts.Blocks = Args.hasArg(OPT_fblocks) ||
                 (Opts.OpenCL && Opts.OpenCLVersion == 200);
-  Opts.ConvergentFunctions = Args.hasArg(OPT_fconvergent_functions) ||
-                             Opts.OpenCL || (Opts.CUDA && Opts.CUDAIsDevice) ||
-                             Opts.SYCLIsDevice || Opts.HLSL;
+  Opts.ConvergentFunctions = Args.hasFlag(
+      OPT_fconvergent_functions, OPT_fno_convergent_functions,
+      Opts.OpenMPIsTargetDevice || T.isAMDGPU() || T.isNVPTX() || Opts.OpenCL ||
+      Opts.CUDAIsDevice || Opts.SYCLIsDevice || Opts.HLSL);

arsenm wrote:
Sort all the language checks together, before the target list. We probably should have a hasConvergentOperations() predicate somewhere.

https://github.com/llvm/llvm-project/pull/111076
[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
arsenm wrote:
> -fno-convergent-functions to opt-out if you want to test broken behavior.

You may legitimately know there are no convergent functions in the TU. We also have the noconvergent source attribute now for this.

https://github.com/llvm/llvm-project/pull/111076
[clang] [Clang] Implement resource directory headers for common GPU intrinsics (PR #110179)
@@ -0,0 +1,187 @@
+//===-- amdgpuintrin.h - AMDGPU intrinsic functions -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __AMDGPUINTRIN_H
+#define __AMDGPUINTRIN_H
+
+#ifndef __AMDGPU__
+#error "This file is intended for AMDGPU targets or offloading to AMDGPU"
+#endif
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#if defined(__HIP__) || defined(__CUDA__)
+#define _DEFAULT_ATTRS __attribute__((device)) __attribute__((always_inline))
+#else
+#define _DEFAULT_ATTRS __attribute__((always_inline))
+#endif
+
+#pragma omp begin declare target device_type(nohost)
+#pragma omp begin declare variant match(device = {arch(amdgcn)})
+
+// Type aliases to the address spaces used by the AMDGPU backend.
+#define _private __attribute__((opencl_private))
+#define _constant __attribute__((opencl_constant))
+#define _local __attribute__((opencl_local))
+#define _global __attribute__((opencl_global))
+
+// Attribute to declare a function as a kernel.
+#define _kernel __attribute__((amdgpu_kernel, visibility("protected")))
+
+// Returns the number of workgroups in the 'x' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_x() {
+  return __builtin_amdgcn_grid_size_x() / __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workgroups in the 'y' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_y() {
+  return __builtin_amdgcn_grid_size_y() / __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workgroups in the 'z' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_z() {
+  return __builtin_amdgcn_grid_size_z() / __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the total number of workgroups in the grid.
+_DEFAULT_ATTRS static inline uint64_t _get_num_blocks() {
+  return _get_num_blocks_x() * _get_num_blocks_y() * _get_num_blocks_z();
+}
+
+// Returns the 'x' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_x() {
+  return __builtin_amdgcn_workgroup_id_x();
+}
+
+// Returns the 'y' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_y() {
+  return __builtin_amdgcn_workgroup_id_y();
+}
+
+// Returns the 'z' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_z() {
+  return __builtin_amdgcn_workgroup_id_z();
+}
+
+// Returns the absolute id of the AMD workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_block_id() {
+  return _get_block_id_x() + _get_num_blocks_x() * _get_block_id_y() +
+         _get_num_blocks_x() * _get_num_blocks_y() * _get_block_id_z();
+}
+
+// Returns the number of workitems in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_x() {
+  return __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workitems in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_y() {
+  return __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workitems in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_z() {
+  return __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the total number of workitems in the workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_num_threads() {
+  return _get_num_threads_x() * _get_num_threads_y() * _get_num_threads_z();
+}
+
+// Returns the 'x' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_x() {
+  return __builtin_amdgcn_workitem_id_x();
+}
+
+// Returns the 'y' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_y() {
+  return __builtin_amdgcn_workitem_id_y();
+}
+
+// Returns the 'z' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_z() {
+  return __builtin_amdgcn_workitem_id_z();
+}
+
+// Returns the absolute id of the thread in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_thread_id() {
+  return _get_thread_id_x() + _get_num_threads_x() * _get_thread_id_y() +
+         _get_num_threads_x() * _get_num_threads_y() * _get_thread_id_z();
+}
+
+// Returns the size of an AMD wavefront, either 32 or 64 depending on hardware
+// and compilation options.
+_DEFAULT_ATTRS static inline uint32_t _get_lane_size() {
+  return __builtin_amdgcn_wavefrontsize();
+}
+
+// Returns the id of the thread inside of an AMD wavefront executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t _get_lane_id() {
+  return __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u));
+}
+
+// Returns the bit-mask of active threads in
[clang] [Clang] Implement resource directory headers for common GPU intrinsics (PR #110179)
@@ -0,0 +1,187 @@
[same amdgpuintrin.h hunk as quoted above]
+// Returns the id of the thread inside of an AMD wavefront executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t _get_lane_id() {

arsenm wrote:
We should really just rip out the convergent source attribute. We should only have noconvergent.
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_shared(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))))))
+    return std::pair(Ptr, AddressSpace::CrossWorkgroup);
+
+  return std::pair(nullptr, UINT32_MAX);
+}

arsenm wrote:
This is the fancy stuff that should go into a follow-up patch to add assume support.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
[same hunk as quoted above]
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {

arsenm wrote:
Move to separate change; not sure this is necessarily valid for spirv.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -178,6 +266,9 @@ void SPIRVPassConfig::addIRPasses() {
     addPass(createSPIRVStructurizerPass());
   }
 
+  if (TM.getOptLevel() > CodeGenOptLevel::None)
+    addPass(createInferAddressSpacesPass(AddressSpace::Generic));

arsenm wrote:
Not sure why this is both a pass parameter to InferAddressSpaces and a TTI hook.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
[same hunk as quoted above]
+      m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+              m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                  m_Deferred(Ptr))))))

arsenm wrote:
Shouldn't be looking at the amdgcn intrinsics? Surely spirv has its own operations for this?

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
[same hunk as quoted above]
+bool SPIRVTargetMachine::isNoopAddrSpaceCast(unsigned SrcAS,
+                                             unsigned DestAS) const {
+  if (SrcAS != AddressSpace::Generic && SrcAS != AddressSpace::CrossWorkgroup)
+    return false;
+  return DestAS == AddressSpace::Generic ||
+         DestAS == AddressSpace::CrossWorkgroup;
+}

arsenm wrote:
This is separate; I don't think InferAddressSpaces relies on this.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

arsenm wrote:
You don't need to duplicate all of these tests. You just need some basic samples showing that the target is implemented; the full set is testing pass mechanics, which can be done on any target.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [mlir] [NFC][TableGen] Change `Record::getSuperClasses` to use const Record* (PR #110845)
https://github.com/arsenm approved this pull request.

https://github.com/llvm/llvm-project/pull/110845
[clang] [llvm] [mlir] [TableGen] Change `DefInit::Def` to a const Record pointer (PR #110747)
https://github.com/arsenm approved this pull request.

https://github.com/llvm/llvm-project/pull/110747
[clang] [llvm] [mlir] [TableGen] Change `DefInit::Def` to a const Record pointer (PR #110747)
@@ -1660,7 +1660,7 @@ class Record {
   // this record.
   SmallVector Locs;
   SmallVector ForwardDeclarationLocs;
-  SmallVector ReferenceLocs;
+  mutable SmallVector ReferenceLocs;

arsenm wrote:
You have the const_cast on the addition, so this is unnecessary?

https://github.com/llvm/llvm-project/pull/110747
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:
The codegen prepare behavior is still backend code to be tested. You can just run codegenprepare as a standalone pass too (usually such a test would have separate llc and opt RUN lines).

https://github.com/llvm/llvm-project/pull/110695
[clang] [flang] [llvm] [mlir] Make Ownership of MachineModuleInfo in Its Wrapper Pass External (PR #110443)
@@ -0,0 +1,102 @@
+//===-- LLVMTargetMachineC.cpp --------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the LLVM-C part of TargetMachine.h that directly
+// depends on the CodeGen library.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm-c/Core.h"
+#include "llvm-c/TargetMachine.h"
+#include "llvm/CodeGen/MachineModuleInfo.h"
+#include "llvm/IR/LegacyPassManager.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Target/TargetMachine.h"
+
+using namespace llvm;
+
+static TargetMachine *unwrap(LLVMTargetMachineRef P) {
+  return reinterpret_cast<TargetMachine *>(P);
+}
+
+static Target *unwrap(LLVMTargetRef P) {
+  return reinterpret_cast<Target *>(P);
+}
+
+static LLVMTargetMachineRef wrap(const TargetMachine *P) {
+  return reinterpret_cast<LLVMTargetMachineRef>(const_cast<TargetMachine *>(P));
+}
+
+static LLVMTargetRef wrap(const Target *P) {
+  return reinterpret_cast<LLVMTargetRef>(const_cast<Target *>(P));
+}
+
+static LLVMBool LLVMTargetMachineEmit(LLVMTargetMachineRef T, LLVMModuleRef M,
+                                      raw_pwrite_stream &OS,
+                                      LLVMCodeGenFileType codegen,
+                                      char **ErrorMessage) {
+  TargetMachine *TM = unwrap(T);
+  Module *Mod = unwrap(M);
+
+  legacy::PassManager pass;
+  MachineModuleInfo MMI(static_cast<LLVMTargetMachine *>(TM));
+
+  std::string error;
+
+  Mod->setDataLayout(TM->createDataLayout());
+
+  CodeGenFileType ft;
+  switch (codegen) {
+  case LLVMAssemblyFile:
+    ft = CodeGenFileType::AssemblyFile;
+    break;
+  default:
+    ft = CodeGenFileType::ObjectFile;
+    break;
+  }
+  if (TM->addPassesToEmitFile(pass, MMI, OS, nullptr, ft)) {
+    error = "TargetMachine can't emit a file of this type";
+    *ErrorMessage = strdup(error.c_str());
+    return true;
+  }
+
+  pass.run(*Mod);
+
+  OS.flush();
+  return false;
+}
+
+LLVMBool LLVMTargetMachineEmitToFile(LLVMTargetMachineRef T, LLVMModuleRef M,
+                                     const char *Filename,
+                                     LLVMCodeGenFileType codegen,
+                                     char **ErrorMessage) {
+  std::error_code EC;
+  raw_fd_ostream dest(Filename, EC, sys::fs::OF_None);
+  if (EC) {
+    *ErrorMessage = strdup(EC.message().c_str());
+    return true;
+  }
+  bool Result = LLVMTargetMachineEmit(T, M, dest, codegen, ErrorMessage);
+  dest.flush();
+  return Result;
+}
+
+LLVMBool LLVMTargetMachineEmitToMemoryBuffer(LLVMTargetMachineRef T,
+                                             LLVMModuleRef M,
+                                             LLVMCodeGenFileType codegen,
+                                             char **ErrorMessage,
+                                             LLVMMemoryBufferRef *OutMemBuf) {
+  SmallString<0> CodeString;
+  raw_svector_ostream OStream(CodeString);
+  bool Result = LLVMTargetMachineEmit(T, M, OStream, codegen, ErrorMessage);
+
+  StringRef Data = OStream.str();
+  *OutMemBuf =
+      LLVMCreateMemoryBufferWithMemoryRangeCopy(Data.data(), Data.size(), "");
+  return Result;
+}

arsenm wrote:
Missing newline at end of file.

https://github.com/llvm/llvm-project/pull/110443
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
arsenm wrote:
> with the PR pulled in (on top of LLVM's HEAD [aadfba9](https://github.com/llvm/llvm-project/commit/aadfba9b2aa107f9cada2fd9bcbe612cbf560650)), the compilation command is: `clang++ -cl-std=CL2.0 -emit-llvm -c -x cl -g0 --target=spir -Xclang -finclude-default-header -O2 test.cl`. The output LLVM IR after the optimizations is:

You want spirv, not spir.

> note bitcast to i128 with the following truncation to i96 - those types aren't part of the datalayout, yet some optimization generated them. So something has to be done with it and changing the datalayout is not enough.

Any pass is allowed to introduce any IR type. This field is a pure optimization hint. It is not required to do anything, and places no restrictions on any pass.

> > This does not mean arbitrary integer bitwidths do not work. The n field is weird; it's more of an optimization hint.
>
> And I can imagine that we would want to not only be able to emit 4-bit integers in the frontend, but also allow optimization passes to emit them.

Just because there's an extension doesn't mean it's desirable to use them. On real targets, they'll end up codegenning in wider types anyway.

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
arsenm wrote:
> 1. Usually (or at least AFAIK) optimization passes won't consider datalayout automatically,

The datalayout is a widely used global constant. There's no option of "not considering it".

> Do you plan to go over LLVM passes adding this check?

There's nothing new to do here. This has always existed.

> 2. Some existing and future extensions might allow extra bit widths for integers.

This does not mean arbitrary integer bitwidths do not work. The n field is weird; it's more of an optimization hint.

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:
> Right but it's relying on a non-guaranteed maybe-optimisation firing, as far as I can tell.

The point is to test that the optimization does work. The codegen pipeline is a bunch of intertwined IR passes on top of core codegen, and they need to cooperate.

https://github.com/llvm/llvm-project/pull/110695
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote:
> > I would like to avoid adding additional special properties to AS0, or defining the flat concept.
>
> How can we add a new specification w/o defining it?

By not defining it in terms of flat addressing. Just make it the undesirable address space.

https://github.com/llvm/llvm-project/pull/108786
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -54,14 +54,14 @@ static std::string computeDataLayout(const Triple &TT) {
   // memory model used for graphics: PhysicalStorageBuffer64. But it shouldn't
   // mean anything.
   if (Arch == Triple::spirv32)
-    return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-"
-           "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+    return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-"
+           "v256:256-v512:512-v1024:1024-n8:16:32:64-G1";
   if (TT.getVendor() == Triple::VendorType::AMD &&
       TT.getOS() == Triple::OSType::AMDHSA)
-    return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
-           "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0";
-  return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
-         "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+    return "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-"
+           "v512:512-v1024:1024-n32:64-S32-G1-P4-A0";

arsenm wrote:
AMDGPU sets S32 now, which isn't wrong. But the rest of codegen assumes 16-byte alignment by default.

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:
That is not the nature of this kind of test.

https://github.com/llvm/llvm-project/pull/110695
[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110198 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [IR] Allow fast math flags on calls with homogeneous FP struct types (PR #110506)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110506 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@ -; This test aims to check ability to support "Arithmetic with Overflow" intrinsics arsenm wrote: This one is testing codegenprepare as part of the normal codegen pipeline, so this one is fine. The other case was a full optimization pipeline + codegen, which are more far removed https://github.com/llvm/llvm-project/pull/110695 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@ -; This test aims to check ability to support "Arithmetic with Overflow" intrinsics arsenm wrote: Not sure what the problem is with this test, but it's already covered by another? https://github.com/llvm/llvm-project/pull/110695 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -54,14 +54,14 @@ static std::string computeDataLayout(const Triple &TT) { // memory model used for graphics: PhysicalStorageBuffer64. But it shouldn't // mean anything. if (Arch == Triple::spirv32) -return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-" - "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1"; +return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-" + "v256:256-v512:512-v1024:1024-n8:16:32:64-G1"; if (TT.getVendor() == Triple::VendorType::AMD && TT.getOS() == Triple::OSType::AMDHSA) -return "e-i64:64-v16:16-v24:32-v32:32-v48:64-" - "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0"; - return "e-i64:64-v16:16-v24:32-v32:32-v48:64-" - "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1"; +return "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-" + "v512:512-v1024:1024-n32:64-S32-G1-P4-A0"; arsenm wrote: The stack alignment should be 16 bytes (S128), but that's not mentioned in the description. Do this separately? I'm pretty sure this is wrong for the amdgcn triples too https://github.com/llvm/llvm-project/pull/110695 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [IR] Allow fast math flags on calls with homogeneous FP struct types (PR #110506)
@@ -1122,6 +1122,26 @@ define void @fastMathFlagsForArrayCalls([2 x float] %f, [2 x double] %d1, [2 x < ret void } +declare { float, float } @fmf_struct_f32() +declare { double, double } @fmf_struct_f64() +declare { <4 x double>, <4 x double> } @fmf_struct_v4f64() + +; CHECK-LABEL: fastMathFlagsForStructCalls( +define void @fastMathFlagsForStructCalls({ float, float } %f, { double, double } %d1, { <4 x double>, <4 x double> } %d2) { + %call.fast = call fast { float, float } @fmf_struct_f32() + ; CHECK: %call.fast = call fast { float, float } @fmf_struct_f32() + + ; Throw in some other attributes to make sure those stay in the right places. + + %call.nsz.arcp = notail call nsz arcp { double, double } @fmf_struct_f64() + ; CHECK: %call.nsz.arcp = notail call nsz arcp { double, double } @fmf_struct_f64() + + %call.nnan.ninf = tail call nnan ninf fastcc { <4 x double>, <4 x double> } @fmf_struct_v4f64() + ; CHECK: %call.nnan.ninf = tail call nnan ninf fastcc { <4 x double>, <4 x double> } @fmf_struct_v4f64() + arsenm wrote: Can you also add a test with nofpclass attributes on the return / argument? The intent was it would be allowed for the same types as FPMathOperator https://github.com/llvm/llvm-project/pull/110506 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [llvm] [mlir] Make Ownership of MachineModuleInfo in Its Wrapper Pass External (PR #110443)
arsenm wrote: > * Move the MC emission functions in `TargetMachine` to `LLVMTargetMachine`. > With the changes in this PR, we explicitly assume in both > `addPassesToEmitFile` and `addPassesToEmitMC` that the `TargetMachine` is an > `LLVMTargetMachine`; Hence it does not make sense for these functions to be > present in the `TargetMachine` interface. Was this already implicitly assumed? IIRC there was some layering reason why this is the way it was. There were previous attempts to merge these before, which were abandoned: https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html https://reviews.llvm.org/D38482 https://reviews.llvm.org/D38489 https://github.com/llvm/llvm-project/pull/110443 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [LLVM][TableGen] Change SeachableTableEmitter to use const RecordKeeper (PR #110032)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110032 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
arsenm wrote: > With the constrained intrinsics the default is safe because optimizations > don't recognize the constrained intrinsic and thus don't know how to optimize > it. If we instead rely on the strictfp attribute then we'll need possibly > thousands of checks for this attribute, we'll need everyone going forward to > remember to check for it, and we'll have no way to verify that this rule is > being followed. The current state already requires you to check this for any library calls. Not sure any wide audit of those ever happened. I don't see a better alternative to cover those, plus the full set of target intrinsics. https://github.com/llvm/llvm-project/pull/109798 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [LLVM][TableGen] Change SeachableTableEmitter to use const RecordKeeper (PR #110032)
@@ -1556,7 +1557,7 @@ class RecordVal { bool IsUsed = false; /// Reference locations to this record value. - SmallVector<SMLoc> ReferenceLocs; + mutable SmallVector<SMLoc> ReferenceLocs; arsenm wrote: Is this removed in later patches? https://github.com/llvm/llvm-project/pull/110032 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)
@@ -273,6 +273,74 @@ void test_builtin_elementwise_min(int i, short s, double d, float4 v, int3 iv, u // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was '_Complex float')}} } +void test_builtin_elementwise_maximum(int i, short s, float f, double d, float4 v, int3 iv, unsigned3 uv, int *p) { + i = __builtin_elementwise_maximum(p, d); + // expected-error@-1 {{arguments are of different types ('int *' vs 'double')}} + + struct Foo foo = __builtin_elementwise_maximum(d, d); + // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'double'}} + + i = __builtin_elementwise_maximum(i); + // expected-error@-1 {{too few arguments to function call, expected 2, have 1}} + + i = __builtin_elementwise_maximum(); + // expected-error@-1 {{too few arguments to function call, expected 2, have 0}} + + i = __builtin_elementwise_maximum(i, i, i); + // expected-error@-1 {{too many arguments to function call, expected 2, have 3}} + + i = __builtin_elementwise_maximum(v, iv); + // expected-error@-1 {{arguments are of different types ('float4' (vector of 4 'float' values) vs 'int3' (vector of 3 'int' values))}} + + i = __builtin_elementwise_maximum(uv, iv); + // expected-error@-1 {{arguments are of different types ('unsigned3' (vector of 3 'unsigned int' values) vs 'int3' (vector of 3 'int' values))}} + + d = __builtin_elementwise_maximum(d, f); + + v = __builtin_elementwise_maximum(v, v); + + int A[10]; + A = __builtin_elementwise_maximum(A, A); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was 'int *')}} + + _Complex float c1, c2; + c1 = __builtin_elementwise_maximum(c1, c2); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was '_Complex float')}} +} + +void test_builtin_elementwise_minimum(int i, short s, float f, double d, float4 v, int3 iv, unsigned3 uv, int *p) { + i = __builtin_elementwise_minimum(p, d); + // expected-error@-1 {{arguments are of different types ('int *' vs 'double')}}
+ + struct Foo foo = __builtin_elementwise_minimum(d, d); + // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'double'}} + + i = __builtin_elementwise_minimum(i); + // expected-error@-1 {{too few arguments to function call, expected 2, have 1}} + + i = __builtin_elementwise_minimum(); + // expected-error@-1 {{too few arguments to function call, expected 2, have 0}} + + i = __builtin_elementwise_minimum(i, i, i); + // expected-error@-1 {{too many arguments to function call, expected 2, have 3}} + + i = __builtin_elementwise_minimum(v, iv); + // expected-error@-1 {{arguments are of different types ('float4' (vector of 4 'float' values) vs 'int3' (vector of 3 'int' values))}} + + i = __builtin_elementwise_minimum(uv, iv); + // expected-error@-1 {{arguments are of different types ('unsigned3' (vector of 3 'unsigned int' values) vs 'int3' (vector of 3 'int' values))}} + + d = __builtin_elementwise_minimum(f, d); + + int A[10]; + A = __builtin_elementwise_minimum(A, A); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was 'int *')}} arsenm wrote: The codegen assumes this is only floating point, so the integer part of the message is wrong. Also missing tests using 2 arguments with only integer / vector of integer https://github.com/llvm/llvm-project/pull/110198 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)
@@ -706,6 +706,12 @@ Unless specified otherwise operation(±0) = ±0 and operation(±infinity) = ±infinity representable values for the signed/unsigned integer type. T __builtin_elementwise_sub_sat(T x, T y) return the difference of x and y, clamped to the range of representable values for the signed/unsigned integer type. [supported element types column: integer types] + T __builtin_elementwise_maximum(T x, T y) return x or y, whichever is larger. If exactly one argument is + a NaN, return the other argument. If both arguments are NaNs, [supported element types column: integer and floating point types] arsenm wrote: This doesn't fully explain the semantics, and I'd like to avoid trying to re-explain all the details in every instance of this. Can you just point this to some other description of the semantics? https://github.com/llvm/llvm-project/pull/110198 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [cuda][[HIP] `__constant__` should imply constant (PR #110182)
arsenm wrote: If it's not legal for it to be marked as constant, it's also not legal to use constant address space https://github.com/llvm/llvm-project/pull/110182 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote: > Both in InferAddressSpaces, and in Attributor, you don't really care about > whether a flat address-space exists. Right, this is more of an undesirable address space. Optimizations don't need to know anything about its behavior beyond that. > In reply to your question above re whether this is a DL or a Target property, > I don't have a strong opinion there (it appears @shiltian and @arsenm might). I don't really like putting this in the DataLayout. My original idea was to move it to TargetMachine, but we want to avoid the dependence on CodeGen. The DataLayout is just the other place we have that defines module level target information. The simple solution is just have a switch over the target architecture in Attributor. > I do believe that this is a necessary bit of query-able information, > especially from a Clang, for correctness reasons (more on that below). I don't think this buys frontends much. Clang still needs to understand the full language address space -> target address space mapping. This would just allow populating one entry generically > Ah, this is part of the challenge - we do indeed assume that 0 is flat, but > Targets aren't bound by LangRef to use 0 to denote flat (and some, like SPIR > / SPIR-V) do not As I mentioned above, SPIRV can just work its way out of this problem for its IR. SPIR's only reason for existence is bitcode compatibility, so doing anything with there will be quite a lot of work which will never realistically happen. > I'm fine with adding the enforcement in LLVM that AS0 needs to be the flat > AS, if a target has it, but the definition of a flat AS still needs to be > set. If we do that, how will SPIR/SPIR-V work? > This is the most generic wording I can come up with so far. Happy to hear > more feedbacks. I would like to avoid adding additional special properties to AS0, or defining the flat concept. 
https://github.com/llvm/llvm-project/pull/108786 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
@@ -579,7 +579,7 @@ static StringRef computeDataLayout(const Triple &TT) { "-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-" "v32:32-v48:64-v96:" "128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-" - "G1-ni:7:8:9"; + "G1-ni:7:8:9-T0"; arsenm wrote: No, but yes. We probably should just define 0 to be the flat address space and take the same numbers as amdgcn. Flat will just be unsupported in codegen (but theoretically someone could go implement software tagged pointers) https://github.com/llvm/llvm-project/pull/108786 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote: > Just to clarify, does this mean any two non-flat address space pointers > _cannot_ alias? This should change nothing about aliasing. The IR assumption is any address space may alias any other https://github.com/llvm/llvm-project/pull/108786 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote: > There are targets that use a different integer to denote flat (e.g. see SPIR > & SPIR-V). Whilst I know that there are objections to that, the fact remains > that they had historical reason (wanted to make legacy OCL convention that > the default is private work, and given that IR defaults to 0 this was an > easy, if possibly costly, way out; The SPIRV IR would be better off changing its numbers around like we did in AMDGPU ages ago. The only concern would be bitcode compatibility, but given it's still an "experimental target" that shouldn't be an issue. > AMDGPU also borks this for legacy OCL reasons, which has been a source of > pain). This is only a broken in-clang hack, the backend IR always uses the correct address space https://github.com/llvm/llvm-project/pull/108786 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
@@ -66,12 +66,12 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple, HasFloat16 = true; if (TargetPointerWidth == 32) -resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64"); +resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64-T0"); arsenm wrote: It is https://github.com/llvm/llvm-project/pull/108786 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
https://github.com/arsenm approved this pull request. I think we need more thought about how the ABI for this will work, but we need to start somewhere https://github.com/llvm/llvm-project/pull/102913 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
arsenm wrote: > If we can't keep the constrained semantics and near-100% guarantee that no > new exceptions will be introduced then operand bundles are not a replacement > for the constrained intrinsics. We would still need a call / function attribute to indicate strictfp calls, and such calls would then be annotatable with bundles to relax the assumptions. The default would always have to be the most conservative assumption https://github.com/llvm/llvm-project/pull/109798 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/94647 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes { indicatePessimisticFixpoint(); return; } + +for (Instruction &I : instructions(F)) { + if (isa<AddrSpaceCastInst>(I) && arsenm wrote: Simple example, where the cast is still directly the operand. It could be further nested inside another constant expression https://github.com/llvm/llvm-project/pull/94647 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes { indicatePessimisticFixpoint(); return; } + +for (Instruction &I : instructions(F)) { + if (isa<AddrSpaceCastInst>(I) && arsenm wrote: 5->3 is an illegal address space cast, but the round trip cast can fold away. You don't want the cast back to the original address space. https://github.com/llvm/llvm-project/pull/94647 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
arsenm wrote: Also it's silly that we need to do bitcode autoupgrade of "experimental" intrinsics, but x86 started shipping with strictfp enabled in production before they graduated. We might as well drop the experimental bit then https://github.com/llvm/llvm-project/pull/109798 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
@@ -357,6 +357,9 @@ class IRBuilderBase { void setConstrainedFPCallAttr(CallBase *I) { I->addFnAttr(Attribute::StrictFP); +MemoryEffects ME = MemoryEffects::inaccessibleMemOnly(); arsenm wrote: It shouldn't be necessary to touch the attributes. The set of intrinsic attributes is fixed (callsite attributes are another thing, but generally should be droppable here) https://github.com/llvm/llvm-project/pull/109798 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
@@ -78,15 +78,15 @@ void MCResourceInfo::finalize(MCContext &OutContext) { } MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) { - return OutContext.getOrCreateSymbol("max_num_vgpr"); + return OutContext.getOrCreateSymbol("amdgcn.max_num_vgpr"); arsenm wrote: We're usually using amdgpu instead of amdgcn in new fields https://github.com/llvm/llvm-project/pull/102913 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Use std::optional::value_or (NFC) (PR #109894)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/109894 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
arsenm wrote: Superseded by #108853 https://github.com/llvm/llvm-project/pull/107598 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/107598 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][codegen] Don't mark "int" TBAA on FP libcalls with indirect args (PR #108853)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/108853 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
arsenm wrote: > If we already have per-function metadata, I'm wondering how difficult it > would be to put this handling in the linker. AFAIK there's already handling > for `call-graph-profile` which can inform the linker of the call-graph, so we > could potentially just walk that graph, find the diameter of the register > usage and then emit it in the final HSA metadata. There would still be the > issue of LDS usage, but we could probably just state that LDS used by a > kernel outside the current TU doesn't work for starters. That would be the ultimate goal. We need to think harder about what the final ABI looks like, instead of creating a new symbol for every individual field https://github.com/llvm/llvm-project/pull/102913 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][codegen] Don't mark "int" TBAA on FP libcalls with indirect args (PR #108853)
https://github.com/arsenm approved this pull request. LGTM, but like I mentioned on #107598, it would be good if there was a test that requires the argument check, and the return check isn't sufficient https://github.com/llvm/llvm-project/pull/108853 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libclc] [libclc] use default paths with find_program when possible (PR #105969)
arsenm wrote: > So it should be built along with the core of LLVM? Also, we package LLVM per > version per subproject. Yes, it should be built along with the core (but doesn't need to ship in the same package as the core). https://github.com/llvm/llvm-project/pull/105969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libclc] [libclc] use default paths with find_program when possible (PR #105969)
arsenm wrote: > Nixpkgs has no intention of moving away from standalone builds. I encourage you to acquire that intention. IMO libclc should not support the standalone build, and this should be version locked to the exact compiler commit. It's compiler data, not a real library https://github.com/llvm/llvm-project/pull/105969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libclc] [libclc] use default paths with find_program when possible (PR #105969)
https://github.com/arsenm commented: The nix build should probably migrate to using the non-standalone build https://github.com/llvm/llvm-project/pull/105969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libclc] [libclc] use default paths with find_program when possible (PR #105969)
@@ -55,7 +55,7 @@ if( LIBCLC_STANDALONE_BUILD OR CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DI # Import required tools if( NOT EXISTS ${LIBCLC_CUSTOM_LLVM_TOOLS_BINARY_DIR} ) foreach( tool IN ITEMS clang llvm-as llvm-link opt ) - find_program( LLVM_TOOL_${tool} ${tool} PATHS ${LLVM_TOOLS_BINARY_DIR} NO_DEFAULT_PATH ) +find_program( LLVM_TOOL_${tool} ${tool} PATHS ${LLVM_TOOLS_BINARY_DIR} ) arsenm wrote: Why does this need to find any binary? Can't it just use the imported targets from the find_package(LLVM) above? https://github.com/llvm/llvm-project/pull/105969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libclc] [libclc] use default paths with find_program when possible (PR #105969)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/105969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [llvm] [mlir] Make MMIWP not have ownership over MMI + Make MMI Only Use an External MCContext (PR #105541)
arsenm wrote: > @aeubanks @arsenm after looking into this in more detail, I realized that the > `getContext` method of `MMI` is heavily used in the `AsmPrinter` to create > symbols. Also not having it makes it harder for the `MMI` to create machine > functions using `getOrCreateMachineFunction`. The AsmPrinter is just an ordinary ModulePass. The initialization can just set a MMI member? https://github.com/llvm/llvm-project/pull/105541 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Support] Add scaling support in `indent` (PR #109478)
@@ -774,18 +774,27 @@ class buffer_unique_ostream : public raw_svector_ostream { // you can use // OS << indent(6) << "more stuff"; // which has better ergonomics (and clang-formats better as well). +// +// If indentation is always in increments of a fixed value, you can use Scale +// to set that value once. So indent(1, 2) will add 2 spaces and +// indent(1, 2) + 1 will add 4 spaces. struct indent { - unsigned NumSpaces; - - explicit indent(unsigned NumSpaces) : NumSpaces(NumSpaces) {} - void operator+=(unsigned N) { NumSpaces += N; } - void operator-=(unsigned N) { NumSpaces -= N; } - indent operator+(unsigned N) const { return indent(NumSpaces + N); } - indent operator-(unsigned N) const { return indent(NumSpaces - N); } + // Indentation is represented as `NumIndents` steps of size `Scale` each. + unsigned NumIndents; + unsigned Scale; + + explicit indent(unsigned NumIndents, unsigned Scale = 1) : NumIndents(NumIndents), Scale(Scale) {} + + // These arithmetic operators preserve scale. + void operator+=(unsigned N) { NumIndents += N; } + void operator-=(unsigned N) { NumIndents -= N; } + indent operator+(unsigned N) const { return indent(NumIndents + N, Scale); } + indent operator-(unsigned N) const { return indent(NumIndents - N, Scale); } arsenm wrote: I'm surprised there's no guard against underflow here https://github.com/llvm/llvm-project/pull/109478 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Support] Add scaling support in `indent` (PR #109478)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/109478 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [llvm] [mlir] Make MMIWP not have ownership over MMI + Make MMI Only Use an External MCContext (PR #105541)
arsenm wrote: > @aeubanks It's not impossible to separate them completely. `MCContext` is > needed during initialization and finalization of the > `MachineModuleInfoWrapperPass` (and its new pass manager variant) to set the > diagnostics handler. > > In theory, you can just pass the context to the wrapper pass instead. @arsenm > any thoughts on this? The MachineModuleInfo is the container for all the MachineFunctions (which do hold a reference to the MCContext), so it kind of makes sense to keep it there. But it does look like it should be simple to remove the reference here. So I would say it's better to just remove it https://github.com/llvm/llvm-project/pull/105541 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lldb] [llvm] [mlir] [APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (PR #80309)
@@ -1806,7 +1806,7 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, // instructions to perform VALU adds with immediates or inline literals. unsigned NumLiterals = !TII->isInlineConstant(APInt(32, COffsetVal & 0xffffffff)) + - !TII->isInlineConstant(APInt(32, COffsetVal >> 32)); + !TII->isInlineConstant(APInt(32, uint64_t(COffsetVal) >> 32)); arsenm wrote: These should probably just use Lo_32/Hi_32 https://github.com/llvm/llvm-project/pull/80309 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lldb] [llvm] [mlir] [APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (PR #80309)
@@ -4377,7 +4377,7 @@ AMDGPUInstructionSelector::selectGlobalSAddr(MachineOperand &Root) const { // instructions to perform VALU adds with immediates or inline literals. unsigned NumLiterals = !TII.isInlineConstant(APInt(32, ConstOffset & 0xffffffff)) + -!TII.isInlineConstant(APInt(32, ConstOffset >> 32)); +!TII.isInlineConstant(APInt(32, uint64_t(ConstOffset) >> 32)); arsenm wrote: These should probably just use Lo_32/Hi_32 https://github.com/llvm/llvm-project/pull/80309 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][codegen] Don't mark "int" TBAA on FP libcalls with indirect args (PR #108853)
@@ -690,23 +690,46 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD, const CallExpr *E, llvm::Constant *calleeValue) { CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E); CGCallee callee = CGCallee::forDirect(calleeValue, GlobalDecl(FD)); + llvm::CallBase *callOrInvoke = nullptr; + CGFunctionInfo const *FnInfo = nullptr; RValue Call = - CGF.EmitCall(E->getCallee()->getType(), callee, E, ReturnValueSlot()); + CGF.EmitCall(E->getCallee()->getType(), callee, E, ReturnValueSlot(), + /*Chain=*/nullptr, &callOrInvoke, &FnInfo); if (unsigned BuiltinID = FD->getBuiltinID()) { // Check whether a FP math builtin function, such as BI__builtin_expf ASTContext &Context = CGF.getContext(); bool ConstWithoutErrnoAndExceptions = Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID); + +auto isDirectOrIgnore = [&](ABIArgInfo const &info) { + // For a non-aggregate types direct/extend means the type will be used + // directly (or a sign/zero extension of it) on the call (not a + // input/output pointer). + return info.isDirect() || info.isExtend() || info.isIgnore(); +}; + +// Before annotating this libcall with "int" TBAA metadata check all +// arguments/results are passed directly. On some targets, types such as +// "long double" are passed indirectly via a pointer, and annotating the +// call with "int" TBAA metadata will lead to set up for those arguments +// being incorrectly optimized out. +bool ReturnAndAllArgumentsDirect = +isDirectOrIgnore(FnInfo->getReturnInfo()) && +llvm::all_of(FnInfo->arguments(), + [&](CGFunctionInfoArgInfo const &ArgInfo) { + return isDirectOrIgnore(ArgInfo.info); + }); arsenm wrote: Make a predicate function and call at the end of the if expression below https://github.com/llvm/llvm-project/pull/108853 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -678,6 +690,37 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
     return !A.checkForAllCallLikeInstructions(DoesNotRetrieve, *this,
                                               UsedAssumedInformation);
   }
+
+  // Returns true if FlatScratchInit is needed, i.e., no-flat-scratch-init is
+  // not to be set.
+  bool needFlatScratchInit(Attributor &A) {
+    assert(isAssumed(FLAT_SCRATCH_INIT)); // only called if the bit is still set
+
+    // This is called on each callee; false means callee shouldn't have
+    // no-flat-scratch-init.
+    auto CheckForNoFlatScratchInit = [&](Instruction &I) {
+      const auto &CB = cast<CallBase>(I);

arsenm wrote:

I would hope checkForAllCallLikeInstructions would have a CallBase typed argument to begin with

https://github.com/llvm/llvm-project/pull/94647
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -678,6 +690,37 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
     return !A.checkForAllCallLikeInstructions(DoesNotRetrieve, *this,
                                               UsedAssumedInformation);
   }
+
+  // Returns true if FlatScratchInit is needed, i.e., no-flat-scratch-init is
+  // not to be set.
+  bool needFlatScratchInit(Attributor &A) {
+    assert(isAssumed(FLAT_SCRATCH_INIT)); // only called if the bit is still set
+
+    // This is called on each callee; false means callee shouldn't have
+    // no-flat-scratch-init.
+    auto CheckForNoFlatScratchInit = [&](Instruction &I) {
+      const auto &CB = cast<CallBase>(I);
+      const Function *Callee = CB.getCalledFunction();
+
+      // Callee == 0 for inline asm or indirect call with known callees.
+      // In the latter case, updateImpl() already checked the callees and we
+      // know their FLAT_SCRATCH_INIT bit is set.
+      // If function has indirect call with unknown callees, the bit is
+      // already removed in updateImpl() and execution won't reach here.
+      if (!Callee)
+        return true;
+      else

arsenm wrote:

No else after return

https://github.com/llvm/llvm-project/pull/94647
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
       indicatePessimisticFixpoint();
       return;
     }
+
+    for (Instruction &I : instructions(F)) {
+      if (isa<AddrSpaceCastInst>(I) &&

arsenm wrote:

For a nightmare of an edge case, addrspacecasts from private to flat can exist somewhere in constant expressions. For now, as long as addrspace(5) globals are forbidden, this would only be valid with literal addresses. I'm not sure how defined we should consider that case.

But if you follow along with the queue pointer handling, it will work. It already has to handle the 3->0 case in constant expressions

https://github.com/llvm/llvm-project/pull/94647
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)
arsenm wrote: Replaced by #108596 https://github.com/llvm/llvm-project/pull/92809 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/92809 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
@@ -40,12 +42,19 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
   AMDGPUResourceUsageAnalysis *ResourceUsage;

+  MCResourceInfo RI;
+
   SIProgramInfo CurrentProgramInfo;

   std::unique_ptr<AMDGPU::HSAMD::MetadataStreamer> HSAMetadataStream;

   MCCodeEmitter *DumpCodeInstEmitter = nullptr;

+  // validateMCResourceInfo cannot recompute parts of the occupancy as it does
+  // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+  // really want to track and validate the occupancy.

arsenm wrote:

I don't see why this is the case

https://github.com/llvm/llvm-project/pull/102913
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
@@ -0,0 +1,225 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK,
+                                    MCContext &OutContext) {
+  auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
+    return OutContext.getOrCreateSymbol(FuncName + Twine(Suffix));
+  };
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return GOCS(".num_vgpr");
+  case RIK_NumAGPR:
+    return GOCS(".num_agpr");
+  case RIK_NumSGPR:
+    return GOCS(".numbered_sgpr");
+  case RIK_PrivateSegSize:
+    return GOCS(".private_seg_size");
+  case RIK_UsesVCC:
+    return GOCS(".uses_vcc");
+  case RIK_UsesFlatScratch:
+    return GOCS(".uses_flat_scratch");
+  case RIK_HasDynSizedStack:
+    return GOCS(".has_dyn_sized_stack");
+  case RIK_HasRecursion:
+    return GOCS(".has_recursion");
+  case RIK_HasIndirectCall:
+    return GOCS(".has_indirect_call");
+  }
+  llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+                                            ResourceInfoKind RIK,
+                                            MCContext &Ctx) {
+  return MCSymbolRefExpr::create(getSymbol(FuncName, RIK, Ctx), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs(MCContext &OutContext) {
+  // Assign expression to get the max register use to the max_num_Xgpr symbol.
+  MCSymbol *MaxVGPRSym = getMaxVGPRSymbol(OutContext);
+  MCSymbol *MaxAGPRSym = getMaxAGPRSymbol(OutContext);
+  MCSymbol *MaxSGPRSym = getMaxSGPRSymbol(OutContext);
+
+  auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
+    const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+    Sym->setVariableValue(MaxExpr);
+  };
+
+  assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+  assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+  assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::finalize(MCContext &OutContext) {
+  assert(!Finalized && "Cannot finalize ResourceInfo again.");
+  Finalized = true;
+  assignMaxRegs(OutContext);
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}
+
+void MCResourceInfo::assignResourceInfoExpr(
+    int64_t LocalValue, ResourceInfoKind RIK, AMDGPUMCExpr::VariantKind Kind,
+    const MachineFunction &MF, const SmallVectorImpl<const Function *> &Callees,
+    MCContext &OutContext) {
+  const MCConstantExpr *LocalConstExpr =
+      MCConstantExpr::create(LocalValue, OutContext);
+  const MCExpr *SymVal = LocalConstExpr;
+  if (!Callees.empty()) {
+    SmallVector<const MCExpr *, 8> ArgExprs;
+    // Avoid recursive symbol assignment.
+    SmallPtrSet<const Function *, 8> Seen;
+    ArgExprs.push_back(LocalConstExpr);
+    const Function &F = MF.getFunction();
+    Seen.insert(&F);
+
+    for (const Function *Callee : Callees) {
+      if (Seen.contains(Callee))
+        continue;
+      Seen.insert(Callee);

arsenm wrote:

Should combine these by seeing if the insert succeeded

https://github.com/llvm/llvm-project/pull/102913
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
@@ -0,0 +1,225 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK,
+                                    MCContext &OutContext) {
+  auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
+    return OutContext.getOrCreateSymbol(FuncName + Twine(Suffix));
+  };
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return GOCS(".num_vgpr");
+  case RIK_NumAGPR:
+    return GOCS(".num_agpr");
+  case RIK_NumSGPR:
+    return GOCS(".numbered_sgpr");
+  case RIK_PrivateSegSize:
+    return GOCS(".private_seg_size");
+  case RIK_UsesVCC:
+    return GOCS(".uses_vcc");
+  case RIK_UsesFlatScratch:
+    return GOCS(".uses_flat_scratch");
+  case RIK_HasDynSizedStack:
+    return GOCS(".has_dyn_sized_stack");
+  case RIK_HasRecursion:
+    return GOCS(".has_recursion");
+  case RIK_HasIndirectCall:
+    return GOCS(".has_indirect_call");
+  }
+  llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+                                            ResourceInfoKind RIK,
+                                            MCContext &Ctx) {
+  return MCSymbolRefExpr::create(getSymbol(FuncName, RIK, Ctx), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs(MCContext &OutContext) {
+  // Assign expression to get the max register use to the max_num_Xgpr symbol.
+  MCSymbol *MaxVGPRSym = getMaxVGPRSymbol(OutContext);
+  MCSymbol *MaxAGPRSym = getMaxAGPRSymbol(OutContext);
+  MCSymbol *MaxSGPRSym = getMaxSGPRSymbol(OutContext);
+
+  auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
+    const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+    Sym->setVariableValue(MaxExpr);
+  };
+
+  assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+  assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+  assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::finalize(MCContext &OutContext) {
+  assert(!Finalized && "Cannot finalize ResourceInfo again.");
+  Finalized = true;
+  assignMaxRegs(OutContext);
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}

arsenm wrote:

I wonder if these should be placed in a custom section. In any case, we will eventually need custom linker logic to deal with this

https://github.com/llvm/llvm-project/pull/102913
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -0,0 +1,31 @@
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple x86_64-unknown-unknown -o - %s | FileCheck %s -check-prefixes=CHECK
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple x86_64-pc-win64 -o - %s | FileCheck %s -check-prefixes=CHECK
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple i686-unknown-unknown -o - %s | FileCheck %s -check-prefixes=CHECK
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple powerpc-unknown-unknown -o - %s | FileCheck %s -check-prefixes=CHECK-PPC
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple armv7-none-linux-gnueabi -o - %s | FileCheck %s -check-prefixes=CHECK-TBAA,TBAA
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple armv7-none-linux-gnueabihf -o - %s | FileCheck %s -check-prefixes=CHECK-ARM,TBAA
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple thumbv7k-apple-watchos2.0 -o - -target-abi aapcs16 %s | FileCheck %s -check-prefixes=CHECK-THUMB,TBAA
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple aarch64-unknown-unknown -o - %s | FileCheck %s -check-prefixes=CHECK-AARCH,TBAA
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple spir -o - %s | FileCheck %s -check-prefixes=CHECK-SPIR
+
+_Complex long double foo() {
+  _Complex long double cld;
+  long double v2 = __builtin_cargl(cld);
+  _Complex long double tmp = v2 * cld;
+  return tmp;
+}
+// CHECK: tail call x86_fp80 @cargl(ptr noundef nonnull byval({ {{.*}}, {{.*}} })

arsenm wrote:

Should also test a case with a complex return type, and 0 complex arguments? Is there such a builtin?

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -686,6 +686,20 @@ static Value *EmitSignBit(CodeGenFunction &CGF, Value *V) {
   return CGF.Builder.CreateICmpSLT(V, Zero);
 }

+static bool hasPointerArgsOrPointerReturnType(const Value *V) {
+  if (const CallBase *CB = dyn_cast<CallBase>(V)) {
+    for (const Value *A : CB->args()) {

arsenm wrote:

could use any_of

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -699,9 +713,12 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD,
     bool ConstWithoutErrnoAndExceptions =
         Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID);
     // Restrict to target with errno, for example, MacOS doesn't set errno.
-    // TODO: Support builtin function with complex type returned, eg: cacosh
+    bool CallWithPointerArgsOrPointerReturnType =
+        Call.isScalar() && Call.getScalarVal() &&
+        hasPointerArgsOrPointerReturnType(Call.getScalarVal());

arsenm wrote:

Predicate function called in the if?

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -686,6 +686,20 @@ static Value *EmitSignBit(CodeGenFunction &CGF, Value *V) {
   return CGF.Builder.CreateICmpSLT(V, Zero);
 }

+static bool hasPointerArgsOrPointerReturnType(const Value *V) {
+  if (const CallBase *CB = dyn_cast<CallBase>(V)) {
+    for (const Value *A : CB->args()) {
+      if (A->getType()->isPointerTy()) {
+        return true;
+      }
+    }
+    if (CB->getFunctionType()->getReturnType()->isPointerTy()) {
+      return true;
+    }

arsenm wrote:

Don't need braces

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)
@@ -251,6 +251,24 @@ SPIRV::MemorySemantics::MemorySemantics getMemSemantics(AtomicOrdering Ord) {
   llvm_unreachable(nullptr);
 }

+SPIRV::Scope::Scope getMemScope(const LLVMContext &Ctx, SyncScope::ID ID) {
+  SmallVector<StringRef> SSNs;
+  Ctx.getSyncScopeNames(SSNs);
+
+  StringRef MemScope = SSNs[ID];
+  if (MemScope.empty() || MemScope == "all_svm_devices")

arsenm wrote:

Hard disagree, we do not want aliases. System = "" = all_svm_devices

https://github.com/llvm/llvm-project/pull/106429
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -699,9 +699,20 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD,
     bool ConstWithoutErrnoAndExceptions =
         Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID);
     // Restrict to target with errno, for example, MacOS doesn't set errno.
-    // TODO: Support builtin function with complex type returned, eg: cacosh
+    bool CallWithPointerArgsOrPointerReturnType = false;
+    if (Call.isScalar() && Call.getScalarVal()) {
+      if (CallBase *CB = dyn_cast<CallBase>(Call.getScalarVal())) {
+        for (Value *A : CB->args())
+          if (A->getType()->isPointerTy())
+            CallWithPointerArgsOrPointerReturnType = true;
+        CallWithPointerArgsOrPointerReturnType =
+            CallWithPointerArgsOrPointerReturnType ||
+            CB->getFunctionType()->getReturnType()->isPointerTy();
+      }
+    }

arsenm wrote:

Turn this into a predicate function

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -699,9 +699,20 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD,
     bool ConstWithoutErrnoAndExceptions =
         Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID);
     // Restrict to target with errno, for example, MacOS doesn't set errno.
-    // TODO: Support builtin function with complex type returned, eg: cacosh
+    bool CallWithPointerArgsOrPointerReturnType = false;
+    if (Call.isScalar() && Call.getScalarVal()) {
+      if (CallBase *CB = dyn_cast<CallBase>(Call.getScalarVal())) {
+        for (Value *A : CB->args())
+          if (A->getType()->isPointerTy())
+            CallWithPointerArgsOrPointerReturnType = true;

arsenm wrote:

Should probably be looking at the source signature, not the IR?

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)
@@ -251,6 +251,24 @@ SPIRV::MemorySemantics::MemorySemantics getMemSemantics(AtomicOrdering Ord) {
   llvm_unreachable(nullptr);
 }

+SPIRV::Scope::Scope getMemScope(const LLVMContext &Ctx, SyncScope::ID ID) {
+  SmallVector<StringRef> SSNs;
+  Ctx.getSyncScopeNames(SSNs);
+
+  StringRef MemScope = SSNs[ID];
+  if (MemScope.empty() || MemScope == "all_svm_devices")

arsenm wrote:

Just avoid all_svm_devices altogether? It's the same as the default / ""

https://github.com/llvm/llvm-project/pull/106429
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)
@@ -58,7 +58,35 @@ class SPIRVTargetCodeGenInfo : public CommonSPIRTargetCodeGenInfo {
   SPIRVTargetCodeGenInfo(CodeGen::CodeGenTypes &CGT)
       : CommonSPIRTargetCodeGenInfo(std::make_unique<SPIRVABIInfo>(CGT)) {}
   void setCUDAKernelCallingConvention(const FunctionType *&FT) const override;
+  llvm::SyncScope::ID getLLVMSyncScopeID(const LangOptions &LangOpts,
+                                         SyncScope Scope,
+                                         llvm::AtomicOrdering Ordering,
+                                         llvm::LLVMContext &Ctx) const override;
 };
+
+inline StringRef mapClangSyncScopeToLLVM(SyncScope Scope) {
+  switch (Scope) {
+  case SyncScope::HIPSingleThread:
+  case SyncScope::SingleScope:
+    return "singlethread";
+  case SyncScope::HIPWavefront:
+  case SyncScope::OpenCLSubGroup:
+  case SyncScope::WavefrontScope:
+    return "subgroup";
+  case SyncScope::HIPWorkgroup:
+  case SyncScope::OpenCLWorkGroup:
+  case SyncScope::WorkgroupScope:
+    return "workgroup";
+  case SyncScope::HIPAgent:
+  case SyncScope::OpenCLDevice:
+  case SyncScope::DeviceScope:
+    return "device";
+  case SyncScope::SystemScope:
+  case SyncScope::HIPSystem:
+  case SyncScope::OpenCLAllSVMDevices:
+    return "all_svm_devices";

arsenm wrote:

On the naming point, this preferably would use the names directly from the SPIRV spec

https://github.com/llvm/llvm-project/pull/106429
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)
@@ -766,8 +766,19 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
   // LLVM atomic instructions always have synch scope. If clang atomic
   // expression has no scope operand, use default LLVM synch scope.
   if (!ScopeModel) {
+    llvm::SyncScope::ID SS;
+    if (CGF.getLangOpts().OpenCL)
+      // OpenCL approach is: "The functions that do not have memory_scope
+      // argument have the same semantics as the corresponding functions with
+      // the memory_scope argument set to memory_scope_device." See ref.:
+      // https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#atomic-functions
+      SS = CGF.getTargetHooks().getLLVMSyncScopeID(CGF.getLangOpts(),
+                                                   SyncScope::OpenCLDevice,
+                                                   Order, CGF.getLLVMContext());
+    else
+      SS = CGF.getLLVMContext().getOrInsertSyncScopeID("");

arsenm wrote:

Don't need to query this, this can just be llvm::SyncScope::System

https://github.com/llvm/llvm-project/pull/106429
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [TBAA] Emit "omnipotent char" for intrinsics with type cast (PR #107793)
arsenm wrote:

> Hi, @paulwalker-arm, ACLE allows users to do instruction-level development, but mixing intrinsic and regular C code may break some of the rules set by the compiler.

The rules are still there. You can always use a union or copy to avoid violating the rules. I don't think it makes sense to special case any intrinsics

https://github.com/llvm/llvm-project/pull/107793
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89051 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [TableGen] Change SetTheory set/vec to use const Record * (PR #107692)
https://github.com/arsenm approved this pull request. lgtm assuming the const_cast goes away in a subsequent change https://github.com/llvm/llvm-project/pull/107692 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [TBAA] Emit "omnipotent char" for intrinsics with type cast (PR #107793)
arsenm wrote: I don't understand this. The code is a strict aliasing violation, so why should clang work around it? https://github.com/llvm/llvm-project/pull/107793 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Remove 3-element vector load and store special handling (PR #104661)
@@ -45,7 +45,7 @@ void test3(packedfloat3 *p) {
   *p = (packedfloat3) { 3.2f, 2.3f, 0.1f };
 }
 // CHECK: @test3(
-// CHECK: store <4 x float> {{.*}}, align 4
+// CHECK: store <3 x float> {{.*}}, align 4

arsenm wrote:

I'd expect this to be in terms of type, not size but yes.

https://github.com/llvm/llvm-project/pull/104661
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] a291fe5 - clang/AMDGPU: Update test message order
Author: Matt Arsenault
Date: 2024-09-06T21:18:41+04:00
New Revision: a291fe5ed44fa37493d038c78ff4d73135fd85a9

URL: https://github.com/llvm/llvm-project/commit/a291fe5ed44fa37493d038c78ff4d73135fd85a9
DIFF: https://github.com/llvm/llvm-project/commit/a291fe5ed44fa37493d038c78ff4d73135fd85a9.diff

LOG: clang/AMDGPU: Update test message order

Order of atomic expansion remarks is backwards since
100d9b89947bb1d42af20010bb594fa4c02542fc

Added: 

Modified: 
    clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl
    clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl

Removed: 

diff --git a/clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl b/clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl
index d23005e018f359..72027eda4571da 100644
--- a/clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl
@@ -26,10 +26,11 @@ typedef enum memory_scope {
 #endif
 } memory_scope;

-// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at workgroup-one-as memory scope [-Rpass=atomic-expand]
-// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at agent-one-as memory scope [-Rpass=atomic-expand]
-// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at one-as memory scope [-Rpass=atomic-expand]
 // REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at wavefront-one-as memory scope [-Rpass=atomic-expand]
+// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at one-as memory scope [-Rpass=atomic-expand]
+// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at agent-one-as memory scope [-Rpass=atomic-expand]
+// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at workgroup-one-as memory scope [-Rpass=atomic-expand]
+
 // GFX90A-CAS-LABEL: @atomic_cas
 // GFX90A-CAS: atomicrmw fadd ptr addrspace(1) {{.*}} syncscope("workgroup-one-as") monotonic
 // GFX90A-CAS: atomicrmw fadd ptr addrspace(1) {{.*}} syncscope("agent-one-as") monotonic

diff --git a/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl b/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl
index 80ad9b4df8f64f..7d684bc185a58d 100644
--- a/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl
@@ -27,9 +27,10 @@ typedef enum memory_scope {
 #endif
 } memory_scope;

-// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope workgroup-one-as due to an unsafe request. [-Rpass=si-lower]
-// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope agent-one-as due to an unsafe request. [-Rpass=si-lower]
 // GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope wavefront-one-as due to an unsafe request. [-Rpass=si-lower]
+// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope agent-one-as due to an unsafe request. [-Rpass=si-lower]
+// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope workgroup-one-as due to an unsafe request. [-Rpass=si-lower]
+
 // GFX90A-HW-REMARK: global_atomic_add_f32 v0, v[0:1], v2, off glc
 // GFX90A-HW-REMARK: global_atomic_add_f32 v0, v[0:1], v2, off glc
 // GFX90A-HW-REMARK: global_atomic_add_f32 v0, v[0:1], v2, off glc

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
arsenm wrote:

> > The vector tests should still be added
>
> sorry. if i remove the change of the vector. i have to remove the testcase. because, for the current code convert between vector type of half and bfloat16, it has a bug. And it will be Assert "Invalid cast!""

OK, LGTM with the else before return fixed. Can you handle the vector case in a follow up?

https://github.com/llvm/llvm-project/pull/89051
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -global-isel=0 -march=amdgcn -mcpu=gfx1200 < %s | FileCheck --check-prefix=GCN %s
+; RUN: llc -global-isel=1 -march=amdgcn -mcpu=gfx1200 < %s | FileCheck --check-prefix=GCN %s
+
+declare void @llvm.amdgcn.s.buffer.prefetch.data(ptr addrspace(8) %rsrc, i32 %offset, i32 %len)
+
+define amdgpu_ps void @buffer_prefetch_data_imm_offset_sgpr_len(ptr addrspace(8) inreg %rsrc, i32 inreg %len) {
+; GCN-LABEL: buffer_prefetch_data_imm_offset_sgpr_len:
+; GCN:       ; %bb.0: ; %entry
+; GCN-NEXT:    s_buffer_prefetch_data s[0:3], 0x80, s4, 0
+; GCN-NEXT:    s_endpgm
+entry:
+  tail call void @llvm.amdgcn.s.buffer.prefetch.data(ptr addrspace(8) inreg %rsrc, i32 128, i32 %len)
+  ret void
+}
+
+define amdgpu_ps void @buffer_prefetch_data_imm_offset_imm_len(ptr addrspace(8) inreg %rsrc) {
+; GCN-LABEL: buffer_prefetch_data_imm_offset_imm_len:
+; GCN:       ; %bb.0: ; %entry
+; GCN-NEXT:    s_buffer_prefetch_data s[0:3], 0x0, null, 31
+; GCN-NEXT:    s_endpgm
+entry:
+  tail call void @llvm.amdgcn.s.buffer.prefetch.data(ptr addrspace(8) inreg %rsrc, i32 0, i32 31)
+  ret void
+}
+
+define amdgpu_ps void @buffer_prefetch_data_imm_offset_vgpr_len(ptr addrspace(8) inreg %rsrc, i32 %len) {
+; GCN-LABEL: buffer_prefetch_data_imm_offset_vgpr_len:
+; GCN:       ; %bb.0: ; %entry
+; GCN-NEXT:    v_readfirstlane_b32 s4, v0
+; GCN-NEXT:    s_buffer_prefetch_data s[0:3], 0x80, s4, 0
+; GCN-NEXT:    s_endpgm
+entry:
+  tail call void @llvm.amdgcn.s.buffer.prefetch.data(ptr addrspace(8) inreg %rsrc, i32 128, i32 %len)
+  ret void
+}

arsenm wrote:

Test the behavior with VGPR rsrc

https://github.com/llvm/llvm-project/pull/107293
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)
https://github.com/arsenm approved this pull request. I think the parent needs some revision for global/flat/infer handling https://github.com/llvm/llvm-project/pull/107293 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)
@@ -9934,6 +9934,12 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
     auto NewMI = DAG.getMachineNode(Opc, DL, Op->getVTList(), Ops);
     return SDValue(NewMI, 0);
   }
+  case Intrinsic::amdgcn_s_prefetch_data: {
+    // For non-global address space preserve the chain and remove the call.
+    if (!AMDGPU::isFlatGlobalAddrSpace(cast<MemSDNode>(Op)->getAddressSpace()))
+      return Op.getOperand(0);
+    return Op;

arsenm wrote:

Infer can just not do the change if the resulting address space isn't global

https://github.com/llvm/llvm-project/pull/107293
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)
@@ -9934,6 +9934,12 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
     auto NewMI = DAG.getMachineNode(Opc, DL, Op->getVTList(), Ops);
     return SDValue(NewMI, 0);
   }
+  case Intrinsic::amdgcn_s_prefetch_data: {
+    // For non-global address space preserve the chain and remove the call.
+    if (!AMDGPU::isFlatGlobalAddrSpace(cast<MemSDNode>(Op)->getAddressSpace()))
+      return Op.getOperand(0);
+    return Op;

arsenm wrote:

I'd expect the private/local cases to be an error. Also collectFlatAddressOperands should handle this, to get the flat->global optimization in InferAddressSpaces

https://github.com/llvm/llvm-project/pull/107293
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
https://github.com/arsenm commented: The vector tests should still be added https://github.com/llvm/llvm-project/pull/89051 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
@@ -1431,9 +1431,13 @@ Value *ScalarExprEmitter::EmitScalarCast(Value *Src, QualType SrcType,
     return Builder.CreateFPToUI(Src, DstTy, "conv");
   }

-  if (DstElementTy->getTypeID() < SrcElementTy->getTypeID())
+  if ((DstElementTy->is16bitFPTy() && SrcElementTy->is16bitFPTy())) {
+    Value *FloatVal = Builder.CreateFPExt(Src, Builder.getFloatTy(), "conv");
+    return Builder.CreateFPTrunc(FloatVal, DstTy, "conv");
+  } else if (DstElementTy->getTypeID() < SrcElementTy->getTypeID())

arsenm wrote:

Still needs to be resolved. Also should probably get a comment

https://github.com/llvm/llvm-project/pull/89051
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
arsenm wrote:

> ok, you mean, i remove the vector testcase for this patch. and just save the scalar testcase?

No, keep the tests. Only keep the scalar behavior change. The previous revision was essentially correct and minimal

https://github.com/llvm/llvm-project/pull/89051
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits